Async in C#, .NET, and Unity: Allocation and state machine builders
While helping with a little something that uses Unity, I fell down the rabbit hole of how async/await is supported in Unity. Historically, Unity used generators (known as coroutines in Unity’s world) to support asynchronous computation spanning multiple frames. In 2017 they added initial support for async/await, but without any meaningful library support and with potential performance pitfalls (hello GC, how are you?). To be fair, at that time async had performance implications even in mainland .NET (Core), mainly around allocations, which (un)fortunately aren’t anywhere near as problematic for (mostly) throughput-oriented .NET Core apps as they can be for near-real-time applications like Unity games.
Luckily for .NET, with the release of .NET Core 2.1 in 2018 a lot of those issues got solved and allocations were decreased substantially. But what was the change actually about? And how does it relate to Unity and/or 3rd party Unity-focused async/await libraries such as UniTask or UnityAsync? Let’s find out.
I’ll assume some (relatively deep) knowledge about async/await. If you’re not sure you have it, be sure to check this awesome blog-post about the topic.
State machine rewrite:
When you write an async method, Roslyn rewrites it into a method that does the following. As this rewrite is done by the compiler, it happens regardless of your runtime, be it full framework, .NET Core 2.1, or Unity.
- Compiler synthesizes an IAsyncStateMachine struct containing the original implementation of the method cut into a state machine (as its MoveNext(..) method) and locals lifted as fields.
- Compiler generated IAsyncStateMachine struct is initialized (stateMachine) with:
  - This pointer.
  - Parameters.
  - Newly initialized XYZMethodBuilder (methodBuilder) struct corresponding to the Task-like return type of the method.
- The methodBuilder is retrieved out of the stateMachine.
- methodBuilder.Start(ref stateMachine).
  - Runs the stateMachine.MoveNext(..).
  - Get awaiter out of the awaited expression.
  - If completed synchronously -> methodBuilder.SetResult(..), done.
    - Allocates methodBuilder.Task if it doesn’t exist already, sets its result (1).
  - If not completed -> methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine)
- return methodBuilder.Task.
  - Allocates methodBuilder.Task if it doesn’t exist already (1).
Individual runtimes can then differ in what they do in the methodBuilder.AwaitUnsafeOnCompleted(..) method, which actually does the continuation registration and is therefore where all the important bits happen.
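To make the rewrite steps above a bit more concrete, here is a hand-written approximation of what the compiler produces for a one-line async method. The method AddOneAsync, the struct AddOneStateMachine and its field names are made up for illustration (the real generated names are unspeakable compiler identifiers), and the real output has a few more details, so treat it as a sketch rather than the exact Roslyn output. The builder calls themselves (Create, Start, AwaitUnsafeOnCompleted, SetResult, SetException, Task) are the public AsyncTaskMethodBuilder<TResult> API.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class RewriteSketch
{
    // Original method, for reference:
    //   static async Task<int> AddOneAsync(Task<int> source) => await source + 1;

    // Roughly what ends up in its place (all names here are illustrative).
    public static Task<int> AddOneAsync(Task<int> source)
    {
        var stateMachine = new AddOneStateMachine
        {
            Source = source,                                // parameter lifted to a field
            Builder = AsyncTaskMethodBuilder<int>.Create(), // freshly initialized builder
            State = -1,                                     // "not started yet"
        };
        stateMachine.Builder.Start(ref stateMachine);       // runs MoveNext() synchronously
        return stateMachine.Builder.Task;                   // Task is created lazily if needed
    }

    private struct AddOneStateMachine : IAsyncStateMachine
    {
        public int State;
        public AsyncTaskMethodBuilder<int> Builder;
        public Task<int> Source;
        private TaskAwaiter<int> _awaiter;

        public void MoveNext()
        {
            int result;
            try
            {
                TaskAwaiter<int> awaiter;
                if (State == 0)
                {
                    // Resuming after the await: pick the stored awaiter back up.
                    awaiter = _awaiter;
                    _awaiter = default(TaskAwaiter<int>);
                    State = -1;
                }
                else
                {
                    awaiter = Source.GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        // Not completed: remember the position, register the continuation.
                        State = 0;
                        _awaiter = awaiter;
                        Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                        return;
                    }
                }
                result = awaiter.GetResult() + 1;
            }
            catch (Exception e)
            {
                State = -2;
                Builder.SetException(e);
                return;
            }
            State = -2;
            Builder.SetResult(result); // completes the Task handed out to the caller
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine)
        {
            Builder.SetStateMachine(stateMachine);
        }
    }
}
```

Calling RewriteSketch.AddOneAsync(Task.FromResult(41)) never reaches AwaitUnsafeOnCompleted, while passing a not-yet-completed task takes the suspension path, which is the part the following sections are about.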
Old .NET Core, Unity, full framework:
The most allocate-y of the three I’m going to talk about is .NET prior to Core 2.1 and old Unity (honestly not sure how much that’s still the case by default).
When methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine) runs:
- Allocates this.Task if it doesn’t exist already.
- Allocates this.Runner (runner) if it doesn’t exist already, initializes it with:
  - Boxed version of stateMachine (that actually contains this methodBuilder as a field).
  - Delegate to stateMachine.MoveNext(..) proxy that lives on the runner (cachedDelegate).
  - Current execution context.
- Registers the runner.cachedDelegate on awaiter.UnsafeOnCompleted(..).
Therefore several allocations happen:
- The stateMachine is boxed to be stored in the runner.
- Capturing execution context can/might allocate.
- cachedDelegate needs to be allocated, it’s tied to execution context so it might have to be re-allocated if it changes.
- Task needs to be allocated.
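The shape of those allocations can be modeled in a few lines. This is a simplified sketch, not the runtime's code: LegacyRunner, CountingStateMachine and LegacyRunnerDemo are names I made up, and the real runner does more bookkeeping. It only shows where the box, the delegate and the captured context come from.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Simplified model of the older approach. LegacyRunner and its members are
// illustrative names, not the runtime's actual types.
public sealed class LegacyRunner
{
    public readonly IAsyncStateMachine BoxedStateMachine; // a struct stored here is boxed
    public readonly Action CachedDelegate;                // separate delegate allocation
    private readonly ExecutionContext _context;           // capturing may allocate too

    public LegacyRunner(IAsyncStateMachine stateMachine)
    {
        BoxedStateMachine = stateMachine;
        _context = ExecutionContext.Capture();
        CachedDelegate = Run;
    }

    private void Run()
    {
        if (_context == null)
        {
            BoxedStateMachine.MoveNext(); // context flow suppressed: call directly
            return;
        }
        // CreateCopy() keeps this usable more than once on runtimes where a
        // captured context may only be passed to Run a single time.
        ExecutionContext.Run(
            _context.CreateCopy(),
            state => ((IAsyncStateMachine)state).MoveNext(),
            BoxedStateMachine);
    }
}

public struct CountingStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() { Steps++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class LegacyRunnerDemo
{
    // Returns (steps seen by the local struct, steps seen by the boxed copy).
    public static (int Local, int Boxed) Run()
    {
        var stateMachine = new CountingStateMachine();
        var runner = new LegacyRunner(stateMachine); // boxing happens at this call
        runner.CachedDelegate();
        runner.CachedDelegate();
        var boxed = (CountingStateMachine)runner.BoxedStateMachine;
        return (stateMachine.Steps, boxed.Steps);
    }
}
```

LegacyRunnerDemo.Run() shows that only the boxed copy advances, which is why the real builder has to hand the box back to the state machine through SetStateMachine(..).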
Modern .NET Core, (possibly?) Unity:
.NET Core 2.1 improves on this situation quite a bit:
Instead of Task a special derived type AsyncStateMachineBox is used that enables substantially lower overhead with strongly typed stateMachine field.
- Allocates this.Task as AsyncStateMachineBox if it doesn’t exist already, initializes it with:
  - Strongly typed version of stateMachine (that actually contains this methodBuilder as a field) without boxing.
  - Immutable (it is in .NET Core) version of Execution context.
- Special cases all awaiters it knows about and passes stateMachine directly without the need for delegate.
  - Optionally allocates one immutable delegate to stateMachine.MoveNext(..) if needed by an unknown awaiter.
Therefore we only allocate two things at worst and never re-allocate on multiple awaits within the same method. Quite a big improvement.
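A rough way to picture the strongly typed box is a generic class that holds the state machine struct inline. The sketch below is an assumption-laden stand-in: StateMachineBoxSketch, IStateMachineBoxSketch and TickStateMachine are hypothetical names, and unlike the real internal type it neither derives from Task nor carries the execution context. It only illustrates the two paths: known awaiters get the box itself, unknown ones get a single lazily created delegate.

```csharp
using System;
using System.Runtime.CompilerServices;

// Illustrative stand-ins only; the runtime's real box type is internal,
// derives from the task type and also stores the execution context.
public interface IStateMachineBoxSketch
{
    void MoveNext();
}

public sealed class StateMachineBoxSketch<TStateMachine> : IStateMachineBoxSketch
    where TStateMachine : struct, IAsyncStateMachine
{
    public TStateMachine StateMachine; // lives inline in this object, no extra box
    private Action _moveNextAction;    // created lazily, at most once

    // Awaiters the builder knows about can be handed the box itself.
    public void MoveNext() { StateMachine.MoveNext(); }

    // Unknown awaiters only accept an Action, so one delegate is allocated on demand.
    public Action MoveNextAction
    {
        get { return _moveNextAction ?? (_moveNextAction = MoveNext); }
    }
}

public struct TickStateMachine : IAsyncStateMachine
{
    public int Ticks;
    public void MoveNext() { Ticks++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class BoxSketchDemo
{
    public static int Run()
    {
        var box = new StateMachineBoxSketch<TickStateMachine>();
        IStateMachineBoxSketch known = box;  // path for awaiters known to the builder
        known.MoveNext();
        Action unknown = box.MoveNextAction; // path for an arbitrary awaiter
        unknown();
        unknown();
        return box.StateMachine.Ticks;       // all three calls hit the same inline struct
    }
}
```

Because the generic field is constrained to IAsyncStateMachine, the MoveNext() call mutates the struct in place, and repeated reads of MoveNextAction return the same delegate instance.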
UniTask:
UniTask, an async/await library for Unity, takes a slightly different approach but goes even further.
Instead of Task it uses UniTask which is a value type (similar to .NET’s ValueTask I didn’t have space to talk about). Instead of Runner it uses heavily pooled (rarely newly allocated) RunnerPromise with strongly typed stateMachine field.
- Initializes UniTask if it doesn’t exist already with:
- Gets runnerPromise (~runner in .NET) from object pool and initializes it with:
  - Strongly typed copy of the stateMachine.
- Registers a delegate, created when the runnerPromise is allocated, to runnerPromise.Run() (which calls into this.stateMachine.MoveNext(..)) on awaiter.UnsafeOnCompleted(..).
Thus, it can avoid all allocations in most situations. The delegate is allocated only once per lifetime of the runnerPromise and since those are pooled new ones are created rarely. The pooling brings some limitations in terms of robustness (can’t await it twice, …) but - after all - everything is a tradeoff in software engineering.
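The pooling idea can be sketched independently of UniTask. The code below is not UniTask's implementation: PooledRunner, FrameStateMachine and PooledRunnerDemo are hypothetical names, the pool is a plain non-thread-safe Stack, and result and exception handling are left out entirely. It only shows why the delegate is allocated once per runner and why a recycled runner makes awaiting twice unsafe.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical pooled runner in the spirit of the description above.
// Not thread-safe and not UniTask's actual code.
public sealed class PooledRunner<TStateMachine>
    where TStateMachine : struct, IAsyncStateMachine
{
    private static readonly Stack<PooledRunner<TStateMachine>> Pool =
        new Stack<PooledRunner<TStateMachine>>();

    public static int Created;               // counts real allocations, for the demo

    public TStateMachine StateMachine;       // strongly typed copy, no boxing
    public readonly Action MoveNextDelegate; // allocated once, together with the runner

    private PooledRunner()
    {
        Created++;
        MoveNextDelegate = Run;
    }

    public static PooledRunner<TStateMachine> Rent(ref TStateMachine stateMachine)
    {
        var runner = Pool.Count > 0 ? Pool.Pop() : new PooledRunner<TStateMachine>();
        runner.StateMachine = stateMachine;
        return runner;
    }

    public void Return()
    {
        // After this point the runner may serve a different async call, so a
        // continuation kept from the previous use would drive the wrong state machine.
        StateMachine = default(TStateMachine);
        Pool.Push(this);
    }

    private void Run() { StateMachine.MoveNext(); }
}

public struct FrameStateMachine : IAsyncStateMachine
{
    public int Frames;
    public void MoveNext() { Frames++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class PooledRunnerDemo
{
    // Returns how many runners were really allocated, or -1 if the delegate changed.
    public static int Run()
    {
        var first = new FrameStateMachine();
        var runner = PooledRunner<FrameStateMachine>.Rent(ref first);
        Action continuation = runner.MoveNextDelegate;
        continuation();
        runner.Return();

        var second = new FrameStateMachine();
        var reused = PooledRunner<FrameStateMachine>.Rent(ref second);
        bool sameDelegate = object.ReferenceEquals(continuation, reused.MoveNextDelegate);
        reused.Return();
        return sameDelegate ? PooledRunner<FrameStateMachine>.Created : -1;
    }
}
```

In PooledRunnerDemo.Run() the second Rent(..) hands back the same runner and the same delegate, so two async calls cost a single runner allocation in this model.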