Async in C#, .NET, and Unity: Allocation and state machine builders
While helping with a little something that uses Unity, I fell down the rabbit hole that is async/await support in Unity. Historically, Unity used generators (known as coroutines in Unity’s world) to support asynchronous computation spanning multiple frames. In 2017 they added initial support for async/await, but without any meaningful library support and with potential performance pitfalls (hello GC, how are you?). To be fair, at that time async had performance implications even in mainland .NET (Core), mainly around allocations, which (un)fortunately aren’t anywhere near as problematic for (mostly) throughput-oriented .NET Core apps as they can be for near-real-time applications like Unity games.
Luckily for .NET, with the release of .NET Core 2.1 in 2018 a lot of those issues got solved and allocations were decreased substantially. But what was the change actually about? And how does it relate to Unity and/or 3rd party Unity focused async/await libraries such as UniTask or UnityAsync? Let’s find out.
I’ll assume some (relatively deep) knowledge about async/await. If you’re not sure you have it, be sure to check this awesome blog-post about the topic.
State machine rewrite:
When you write an async method, Roslyn rewrites it into a method that does the following. As this rewrite is done by the compiler, it happens regardless of your runtime, be it full framework, .NET Core 2.1, or Unity.
- Compiler synthesizes an IAsyncStateMachine struct containing the original implementation of the method cut into a state machine (as its MoveNext(..) method) and locals lifted as fields.
- Compiler generated IAsyncStateMachine struct is initialized (stateMachine) with:
  - This pointer.
  - Parameters.
  - Newly initialized XYZMethodBuilder (methodBuilder) struct corresponding to the Task-like type the async method returns.
- The methodBuilder is retrieved out of the stateMachine.
- methodBuilder.Start(ref stateMachine).
  - Runs the stateMachine.MoveNext(..).
  - Get awaiter out of the awaited expression.
  - If completed synchronously -> methodBuilder.SetResult(..), done.
    - Allocates methodBuilder.Task if it doesn’t exist already, sets its result (1).
  - If not completed -> methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine)
- return methodBuilder.Task.
  - Allocates methodBuilder.Task if it doesn’t exist already (1).
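To make the steps above concrete, here is a hand-written approximation of what the compiler might generate for `static async Task<int> AddOneAsync(Task<int> source) => await source + 1;`. It is a sketch, not the exact Roslyn output: the names (HandWrittenAsync, AddOneStateMachine, State, Builder, Source) are made up for readability, and the real generated code uses unspeakable names and a few more details.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class HandWrittenAsync
{
    // The "stub" that replaces the original async method body.
    public static Task<int> AddOneAsync(Task<int> source)
    {
        var stateMachine = new AddOneStateMachine
        {
            Source = source,                                // parameter lifted to a field
            Builder = AsyncTaskMethodBuilder<int>.Create(), // builder picked by the return type
            State = -1,                                     // -1 = not started / running
        };
        stateMachine.Builder.Start(ref stateMachine);       // runs MoveNext() synchronously
        return stateMachine.Builder.Task;                   // creates the Task if needed
    }

    private struct AddOneStateMachine : IAsyncStateMachine
    {
        public int State;
        public AsyncTaskMethodBuilder<int> Builder;
        public Task<int> Source;
        private TaskAwaiter<int> _awaiter;

        public void MoveNext()
        {
            int result;
            try
            {
                TaskAwaiter<int> awaiter;
                if (State == 0)
                {
                    // Resuming after the await: pick the awaiter back up.
                    awaiter = _awaiter;
                    _awaiter = default(TaskAwaiter<int>);
                    State = -1;
                }
                else
                {
                    awaiter = Source.GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        // Not completed: remember where we are and register the continuation.
                        State = 0;
                        _awaiter = awaiter;
                        Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                        return;
                    }
                }
                result = awaiter.GetResult() + 1;
            }
            catch (Exception e)
            {
                State = -2;
                Builder.SetException(e);
                return;
            }
            State = -2;                 // -2 = finished
            Builder.SetResult(result);
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine)
        {
            Builder.SetStateMachine(stateMachine);
        }
    }
}
```

Calling `HandWrittenAsync.AddOneAsync(Task.FromResult(41))` takes the synchronous path and never reaches AwaitUnsafeOnCompleted, while passing a not-yet-completed task goes through the continuation registration discussed next.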
Individual runtimes can then differ in what they do in the methodBuilder.AwaitUnsafeOnCompleted(..) method. That is the method that actually registers the continuation, and therefore where all the important bits happen.
Old .NET Core, Unity, full framework:
The most allocate-y of the three I’m going to talk about is .NET prior to Core 2.1 and old Unity (honestly not sure how much that’s still the case by default).
When methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine) runs:
- Allocates this.Task if it doesn’t exist already.
- Allocates this.Runner (runner) if it doesn’t exist already, initializes it with:
  - Boxed version of stateMachine (that actually contains this methodBuilder as a field).
  - Delegate to stateMachine.MoveNext(..) proxy that lives on the runner (cachedDelegate).
  - Current execution context.
- Registers the runner.cachedDelegate on awaiter.UnsafeOnCompleted(..).
Therefore several allocations happen:
- The stateMachine is boxed to be stored in the runner.
- Capturing execution context can/might allocate.
- cachedDelegate needs to be allocated, it’s tied to execution context so it might have to be re-allocated if it changes.
- Task needs to be allocated.
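A rough way to picture where those allocations come from is the toy model below. It is not the framework’s actual runner: BoxedRunner, CountingStateMachine and BoxedRunnerDemo are hypothetical names, and execution context capture is left out entirely. It only shows the boxing of the struct and the delegate that points at the runner.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical, simplified stand-in for the old-style runner.
public sealed class BoxedRunner
{
    private readonly IAsyncStateMachine _boxedStateMachine; // struct stored behind the interface = boxed
    public readonly Action CachedDelegate;                  // delegate handed to the awaiter

    public BoxedRunner(IAsyncStateMachine boxedStateMachine)
    {
        _boxedStateMachine = boxedStateMachine;
        CachedDelegate = Run;                               // allocates a delegate object
    }

    private void Run()
    {
        _boxedStateMachine.MoveNext();
    }
}

// Stand-in for a compiler generated state machine struct.
public struct CountingStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() { Steps++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class BoxedRunnerDemo
{
    public static (int Original, int Boxed) Run()
    {
        var stateMachine = new CountingStateMachine();
        IAsyncStateMachine boxed = stateMachine;  // allocation: boxing copies the struct to the heap
        var runner = new BoxedRunner(boxed);      // allocations: the runner and its delegate
        runner.CachedDelegate();
        runner.CachedDelegate();
        // Only the boxed copy was advanced; the stack copy is untouched.
        return (stateMachine.Steps, ((CountingStateMachine)boxed).Steps);
    }
}
```

Running `BoxedRunnerDemo.Run()` shows that the heap copy becomes the one that matters after boxing, which is why the boxed state machine also has to carry the methodBuilder with it.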
Modern .NET Core, (possibly?) Unity:
.NET Core 2.1 improves on this situation quite a bit:
Instead of a plain Task, a special derived type, AsyncStateMachineBox, is used; its strongly typed stateMachine field enables substantially lower overhead.
- Allocates this.Task as AsyncStateMachineBox if it doesn’t exist already, initializes it with:
  - Strongly typed version of stateMachine (that actually contains this methodBuilder as a field) without boxing.
  - Immutable (it is in .NET Core) version of Execution context.
- Special cases all awaiters it knows about and passes stateMachine directly without the need for delegate.
  - Optionally allocates one immutable delegate to stateMachine.MoveNext(..) if needed by an unknown awaiter.
Therefore we only allocate two things at worst and never re-allocate on multiple awaits within the same method. Quite a big improvement.
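The idea of the strongly typed box can be sketched as follows. This is a simplified illustration under my own assumptions, not the runtime’s code: the real box also is the Task and carries the execution context, while the hypothetical StateMachineBox here only shows the generic field (no boxing) and the lazily created, reusable delegate. IStateMachineBox, TickStateMachine and StateMachineBoxDemo are made-up names.

```csharp
using System;
using System.Runtime.CompilerServices;

// What a "known" awaiter would call directly, without any delegate.
public interface IStateMachineBox
{
    void MoveNext();
    Action MoveNextAction { get; }
}

// Hypothetical, simplified box: one heap object, state machine stored in a typed field.
public sealed class StateMachineBox<TStateMachine> : IStateMachineBox
    where TStateMachine : IAsyncStateMachine
{
    public TStateMachine StateMachine;   // generic field, so the struct is never boxed
    private Action _moveNextAction;      // only created when somebody asks for it

    public void MoveNext()
    {
        StateMachine.MoveNext();         // constrained call, mutates the field in place
    }

    public Action MoveNextAction
    {
        get { return _moveNextAction ?? (_moveNextAction = MoveNext); }
    }
}

// Stand-in for a compiler generated state machine struct.
public struct TickStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() { Steps++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class StateMachineBoxDemo
{
    public static int Run()
    {
        var box = new StateMachineBox<TickStateMachine>(); // the single allocation
        box.StateMachine = new TickStateMachine();         // copied into the field, no boxing

        IStateMachineBox known = box;
        known.MoveNext();                  // "known awaiter" path: no delegate involved

        Action unknown = box.MoveNextAction; // "unknown awaiter" path: delegate allocated once
        unknown();

        return box.StateMachine.Steps;
    }
}
```

Because the delegate is cached on the box, asking for `MoveNextAction` again on a later await in the same method hands back the same instance instead of allocating a new one.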
UniTask:
UniTask, an async/await library for Unity, takes a slightly different approach but goes even further.
Instead of Task it uses UniTask, which is a value type (similar to .NET’s ValueTask, which I didn’t have space to talk about). Instead of Runner it uses a heavily pooled (rarely newly allocated) RunnerPromise with a strongly typed stateMachine field.
- Initializes UniTask if it doesn’t exist already with:
  - Gets runnerPromise (~runner in .NET) from object pool and initializes it with:
    - Strongly typed copy of the stateMachine.
- Registers an on-allocation-created delegate to runnerPromise.Run() that calls into this.stateMachine.MoveNext(..) on awaiter.UnsafeOnCompleted(..).
Thus, it can avoid all allocations in most situations. The delegate is allocated only once per lifetime of the runnerPromise and since those are pooled new ones are created rarely. The pooling brings some limitations in terms of robustness (can’t await it twice, …) but - after all - everything is a tradeoff in software engineering.
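The pooling trick can be illustrated with a small sketch. To be clear, this is not UniTask’s implementation: PooledRunner, FrameStateMachine and PooledRunnerDemo are hypothetical, the pool is a plain non-thread-safe Stack, and there is no result or exception handling. It only demonstrates that the delegate is created once per runner and that a returned runner gets reused instead of reallocated.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical pooled runner; one pool per concrete state machine type.
public sealed class PooledRunner<TStateMachine>
    where TStateMachine : IAsyncStateMachine
{
    private static readonly Stack<PooledRunner<TStateMachine>> Pool =
        new Stack<PooledRunner<TStateMachine>>();

    public TStateMachine StateMachine;        // strongly typed copy, no boxing
    public readonly Action MoveNextDelegate;  // created once, when the runner is allocated

    private PooledRunner()
    {
        MoveNextDelegate = Run;
    }

    public static PooledRunner<TStateMachine> Rent(ref TStateMachine stateMachine)
    {
        // Allocate only when the pool is empty.
        var runner = Pool.Count > 0 ? Pool.Pop() : new PooledRunner<TStateMachine>();
        runner.StateMachine = stateMachine;
        return runner;
    }

    public void Return()
    {
        StateMachine = default(TStateMachine); // drop references held by the old state machine
        Pool.Push(this);
    }

    private void Run()
    {
        StateMachine.MoveNext();
    }
}

// Stand-in for a compiler generated state machine struct.
public struct FrameStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() { Steps++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class PooledRunnerDemo
{
    public static bool Run()
    {
        var first = new FrameStateMachine();
        var runner = PooledRunner<FrameStateMachine>.Rent(ref first);
        var firstDelegate = runner.MoveNextDelegate;
        firstDelegate();                      // what the awaiter would invoke on completion
        runner.Return();                      // the async method finished, recycle the runner

        var second = new FrameStateMachine();
        var reused = PooledRunner<FrameStateMachine>.Rent(ref second);

        // Same runner object, same delegate, fresh state machine.
        return ReferenceEquals(runner, reused)
            && ReferenceEquals(firstDelegate, reused.MoveNextDelegate)
            && reused.StateMachine.Steps == 0;
    }
}
```

The sketch also hints at the robustness tradeoff: anyone still holding `runner` after `Return()` is now looking at an object that belongs to a different async call, which is the kind of hazard behind the "can’t await it twice" restriction.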
