Async in C#, .NET, and Unity: Allocation and state machine builders

#swe

While helping with a little something that uses Unity, I came across the rabbit hole that is async/await support in Unity. Historically, Unity used generators (known as coroutines in Unity’s world) to support asynchronous computation spanning multiple frames. In 2017 they added initial support for async/await, but without any meaningful library support and with potential performance pitfalls (hello GC, how are you?). To be fair, at that time async had performance implications even in mainland .NET (Core), mainly around allocations, which (un)fortunately aren’t anywhere near as problematic for (mostly) throughput-oriented .NET Core apps as they can be for near-real-time applications like Unity games.

Luckily for .NET, with the release of .NET Core 2.1 in 2018 a lot of those issues got solved and allocations decreased substantially. But what was the change actually about? And how does it relate to Unity and to third-party, Unity-focused async/await libraries such as UniTask or UnityAsync? Let’s find out.

I’ll assume some (relatively deep) knowledge about async/await. If you’re not sure you have it, be sure to check this awesome blog-post about the topic.

State machine rewrite:

When you write an async method, Roslyn rewrites it into a method that does the following. As this rewrite is done by the compiler, it happens regardless of your runtime, be it full framework, .NET Core 2.1, or Unity.

1. The compiler synthesizes an IAsyncStateMachine struct (a class in debug builds) containing the original implementation of the method cut into a state machine (as its MoveNext(..) method), with locals lifted to fields.
2. The compiler-generated IAsyncStateMachine struct (stateMachine) is initialized with:
  - The this pointer.
  - Parameters.
  - A newly initialized XYZMethodBuilder (methodBuilder) struct corresponding to the Task-like type the async method returns.
3. The methodBuilder is retrieved out of the stateMachine.
4. methodBuilder.Start(ref stateMachine) is called, which:
  1. Runs stateMachine.MoveNext(..).
  2. Gets the awaiter out of the awaited expression.
  3. If it completed synchronously -> methodBuilder.SetResult(..), done.
    - Allocates methodBuilder.Task if it doesn’t exist already and sets its result (¹).
  4. If it has not completed -> methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine).
5. methodBuilder.Task is returned.
  - Allocates methodBuilder.Task if it doesn’t exist already (¹).
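
To make the steps above concrete, here is a hand-written approximation of what the rewrite could look like for `async Task<int> AddOneAsync(Task<int> source) => await source + 1;`. It is a sketch, not the exact compiler output: the names `HandWrittenAsync`, `AddOneStateMachine`, `State`, `Builder`, and `Source` are made up for readability (Roslyn emits unspeakable names), while the `AsyncTaskMethodBuilder<int>` calls are the real builder API.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class HandWrittenAsync
{
    // The "stub" method: sets up the state machine and kicks it off.
    public static Task<int> AddOneAsync(Task<int> source)
    {
        var stateMachine = new AddOneStateMachine
        {
            Source = source,                                 // parameter lifted to a field
            Builder = AsyncTaskMethodBuilder<int>.Create(),  // builder matching the Task<int> return type
            State = -1,                                      // -1 = not started / running
        };
        stateMachine.Builder.Start(ref stateMachine);        // runs MoveNext() synchronously
        return stateMachine.Builder.Task;                    // task is created lazily if needed
    }

    private struct AddOneStateMachine : IAsyncStateMachine
    {
        public int State;
        public AsyncTaskMethodBuilder<int> Builder;
        public Task<int> Source;
        private TaskAwaiter<int> _awaiter;

        public void MoveNext()
        {
            int result;
            try
            {
                TaskAwaiter<int> awaiter;
                if (State != 0)
                {
                    // First entry: get the awaiter out of the awaited expression.
                    awaiter = Source.GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        // Not completed: remember where we are and register the continuation.
                        State = 0;
                        _awaiter = awaiter;
                        Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                        return;
                    }
                }
                else
                {
                    // Resumed by the continuation: restore the awaiter.
                    awaiter = _awaiter;
                    _awaiter = default;
                    State = -1;
                }
                result = awaiter.GetResult() + 1;
            }
            catch (Exception e)
            {
                State = -2;
                Builder.SetException(e);
                return;
            }
            State = -2;                 // -2 = finished
            Builder.SetResult(result);
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine) =>
            Builder.SetStateMachine(stateMachine);
    }
}
```

Calling `HandWrittenAsync.AddOneAsync(Task.FromResult(41))` takes the synchronous path (step 4.3) and the returned task already holds 42. Passing the task of a not-yet-completed `TaskCompletionSource<int>` takes the `AwaitUnsafeOnCompleted` path (step 4.4) instead.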

Individual runtimes can then differ in what they do in the methodBuilder.AwaitUnsafeOnCompleted(..) method, which actually does the continuation registration and is therefore where all the important bits happen.

Old .NET Core, Unity, full framework:

The most allocate-y of the three I’m going to talk about is .NET prior to Core 2.1 and old Unity (honestly not sure how much that’s still the case by default).

When methodBuilder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine) runs:

Allocates this.Task if it doesn’t exist already.
Allocates this.Runner (runner) if it doesn’t exist already, initializes it with:
- Boxed version of stateMachine (that actually contains this methodBuilder as a field).
- Delegate to stateMachine.MoveNext(..) proxy that lives on the runner (cachedDelegate).
- Current execution context.
Registers the runner.cachedDelegate on awaiter.UnsafeOnCompleted(..).

Therefore several allocations happen:

The stateMachine is boxed to be stored in the runner.
Capturing execution context can/might allocate.
cachedDelegate needs to be allocated; it’s tied to the execution context, so it might have to be re-allocated if the context changes.
Task needs to be allocated.
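
The boxing and delegate allocations can be illustrated with a small model. This is a simplified sketch, not the framework's code: `MoveNextRunner`, `CountingStateMachine`, and `Demo` are hypothetical names, and the execution context handling is left out entirely. It only shows that storing a struct state machine behind the `IAsyncStateMachine` interface copies it to the heap, and that the continuation needs its own delegate object.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LegacyRunnerSketch
{
    // Hypothetical stand-in for the runner: holds the BOXED state machine
    // plus a delegate that the awaiter can invoke later.
    public sealed class MoveNextRunner
    {
        private readonly IAsyncStateMachine _boxedStateMachine; // heap copy of the struct
        public readonly Action CachedDelegate;                  // a second heap allocation

        public MoveNextRunner(IAsyncStateMachine boxedStateMachine)
        {
            _boxedStateMachine = boxedStateMachine;
            CachedDelegate = Run; // method group conversion allocates the delegate
        }

        private void Run() => _boxedStateMachine.MoveNext();
    }

    // Toy state machine that only counts how often it was advanced.
    public struct CountingStateMachine : IAsyncStateMachine
    {
        public int Steps;
        public void MoveNext() => Steps++;
        public void SetStateMachine(IAsyncStateMachine stateMachine) { }
    }

    public static (int original, int boxed) Demo()
    {
        var stateMachine = new CountingStateMachine();
        IAsyncStateMachine boxed = stateMachine;   // boxing: allocates and copies the struct
        var runner = new MoveNextRunner(boxed);    // allocates the runner and its delegate

        runner.CachedDelegate();
        runner.CachedDelegate();

        // The stack copy never moved; only the boxed copy was advanced.
        return (stateMachine.Steps, ((CountingStateMachine)boxed).Steps);
    }
}
```

`LegacyRunnerSketch.Demo()` returns `(0, 2)`: the original struct on the stack is untouched while the boxed copy was advanced twice, which is why everything after the first suspension has to live on the heap.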

Modern .NET Core, (possibly?) Unity:

.NET Core 2.1 improves on this situation quite a bit:

Instead of a plain Task, a special derived type, AsyncStateMachineBox, is used; its strongly typed stateMachine field enables substantially lower overhead.

Allocates this.Task as AsyncStateMachineBox if it doesn’t exist already, initializes it with:
- Strongly typed version of stateMachine (that actually contains this methodBuilder as a field) without boxing.
- Immutable (it is in .NET Core) version of Execution context.
Special-cases all awaiters it knows about and passes stateMachine directly, without the need for a delegate.
- Optionally allocates one immutable delegate to stateMachine.MoveNext(..) if needed by an unknown awaiter.

Therefore we only allocate two things at worst and never re-allocate on multiple awaits within the same method. Quite a big improvement.
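
The core trick, a box that is generic over the state machine type, can be sketched as follows. This is a hypothetical simplification: `StateMachineBox`, `CountingStateMachine`, and `Demo` are illustrative names, and the runtime's real box additionally derives from `Task<TResult>`, which is omitted here to keep the sketch self-contained.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public static class TypedBoxSketch
{
    // One heap object that stores the state machine BY VALUE in a typed field,
    // so no separate boxing of the struct is needed.
    public sealed class StateMachineBox<TStateMachine>
        where TStateMachine : IAsyncStateMachine
    {
        public TStateMachine StateMachine;   // strongly typed, lives inside the box
        public ExecutionContext Context;     // captured once per box
        private Action _moveNextAction;      // only created if some awaiter needs a delegate

        // Lazily created and then reused for every later await in the same method.
        public Action MoveNextAction => _moveNextAction ??= MoveNext;

        // Awaiters that know about the box can call this directly, no delegate involved.
        public void MoveNext() => StateMachine.MoveNext();
    }

    // Toy state machine that only counts how often it was advanced.
    public struct CountingStateMachine : IAsyncStateMachine
    {
        public int Steps;
        public void MoveNext() => Steps++;
        public void SetStateMachine(IAsyncStateMachine stateMachine) { }
    }

    public static (int steps, bool sameDelegate) Demo()
    {
        var box = new StateMachineBox<CountingStateMachine>();
        box.StateMachine = new CountingStateMachine();
        box.Context = ExecutionContext.Capture();

        box.MoveNext();                        // "known awaiter" path: direct call
        Action forUnknownAwaiter = box.MoveNextAction;
        forUnknownAwaiter();                   // "unknown awaiter" path: via the cached delegate

        return (box.StateMachine.Steps,
                ReferenceEquals(forUnknownAwaiter, box.MoveNextAction));
    }
}
```

`TypedBoxSketch.Demo()` returns `(2, true)`: both paths advance the same state machine stored inside the box, and asking for the delegate a second time hands back the same instance rather than allocating a new one.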

UniTask:

UniTask, an async/await library for Unity, takes a slightly different approach but goes even further.

Instead of Task it uses UniTask, which is a value type (similar to .NET’s ValueTask, which I didn’t have space to talk about). Instead of Runner it uses a heavily pooled (rarely newly allocated) RunnerPromise with a strongly typed stateMachine field.

Initializes UniTask if it doesn’t exist already with:
Gets runnerPromise (~runner in .NET) from object pool and initializes it with:
- Strongly typed copy of the stateMachine.
Registers, via awaiter.UnsafeOnCompleted(..), a delegate to runnerPromise.Run() that calls into this.stateMachine.MoveNext(..). The delegate is created once, when the runnerPromise itself is allocated.

Thus, it can avoid all allocations in most situations. The delegate is allocated only once per lifetime of the runnerPromise and since those are pooled new ones are created rarely. The pooling brings some limitations in terms of robustness (can’t await it twice, …) but - after all - everything is a tradeoff in software engineering.
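
The pooling idea can be sketched in a few lines. This is not UniTask's actual implementation: `PooledRunner`, `Rent`, `Return`, `Created`, and `CountingStateMachine` are hypothetical names, and the pool here is a plain, non-thread-safe `Stack<T>` chosen only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class PooledRunnerSketch
{
    public sealed class PooledRunner<TStateMachine>
        where TStateMachine : IAsyncStateMachine
    {
        // Assumption: single-threaded use; a real pool would need to be thread-safe.
        private static readonly Stack<PooledRunner<TStateMachine>> Pool =
            new Stack<PooledRunner<TStateMachine>>();

        public static int Created;               // counts real allocations, for the demo only

        public TStateMachine StateMachine;       // strongly typed copy, no boxing
        public readonly Action MoveNextDelegate; // allocated once per runner lifetime

        private PooledRunner()
        {
            Created++;
            MoveNextDelegate = Run;
        }

        public static PooledRunner<TStateMachine> Rent(ref TStateMachine stateMachine)
        {
            var runner = Pool.Count > 0 ? Pool.Pop() : new PooledRunner<TStateMachine>();
            runner.StateMachine = stateMachine;
            return runner;
        }

        // After this call the runner may be handed to someone else,
        // which is why a pooled task must not be awaited twice.
        public void Return()
        {
            StateMachine = default; // drop any references the state machine held
            Pool.Push(this);
        }

        private void Run() => StateMachine.MoveNext();
    }

    // Toy state machine that only counts how often it was advanced.
    public struct CountingStateMachine : IAsyncStateMachine
    {
        public int Steps;
        public void MoveNext() => Steps++;
        public void SetStateMachine(IAsyncStateMachine stateMachine) { }
    }

    public static (int created, bool delegateReused, int steps) Demo()
    {
        var first = new CountingStateMachine();
        var runner = PooledRunner<CountingStateMachine>.Rent(ref first);
        Action firstDelegate = runner.MoveNextDelegate;
        firstDelegate();
        runner.Return();

        var second = new CountingStateMachine();
        var reused = PooledRunner<CountingStateMachine>.Rent(ref second);
        reused.MoveNextDelegate();
        int steps = reused.StateMachine.Steps;
        bool sameDelegate = ReferenceEquals(firstDelegate, reused.MoveNextDelegate);
        reused.Return();

        return (PooledRunner<CountingStateMachine>.Created, sameDelegate, steps);
    }
}
```

`PooledRunnerSketch.Demo()` returns `(1, true, 1)`: two "async calls" were served by a single runner allocation and a single delegate, and the second call started from a fresh copy of its own state machine.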

Big thanks to these two blog posts.

¹ Not synthesized by Roslyn, but this will happen in the majority of implementations, so I left it here.

Written by Petr Houška on Jul 7, 2020