Windows with C++ - Coroutines in Visual C++ 2015

By Kenny Kerr | October 2015

I first learned about coroutines in C++ back in 2012 and wrote about the ideas in a series of articles here in MSDN Magazine. I explored a lightweight form of cooperative multitasking that emulated coroutines by playing clever tricks with switch statements. I then discussed some efforts to improve the efficiency and composability of asynchronous systems with proposed extensions to promises and futures. Finally, I covered some challenges that exist even with a futuristic vision of futures, as well as a proposal for something called resumable functions. I would encourage you to read these if you’re interested in some of the challenges and history related to elegant concurrency in C++:

“Lightweight Cooperative Multitasking with C++” (msdn.microsoft.com/magazine/jj553509)
“The Pursuit of Efficient and Composable Asynchronous Systems” (msdn.microsoft.com/magazine/jj618294)
“Back to the Future with Resumable Functions” (msdn.microsoft.com/magazine/jj658968)

Much of that writing was theoretical because I didn’t have a compiler that implemented any of those ideas and had to emulate them in various ways. And then Visual Studio 2015 shipped earlier this year. This edition of Visual C++ includes an experimental compiler option called /await that unlocks an implementation of coroutines directly supported by the compiler. No more hacks, macros or other magic. This is the real thing, albeit experimental and as yet unsanctioned by the C++ committee. And it’s not just syntactic sugar in the compiler front end, like what you find with the C# yield keyword and async methods. The C++ implementation includes a deep engineering investment in the compiler back end that offers an incredibly scalable implementation. Indeed, it goes well beyond what you might find if the compiler front end simply provided a more convenient syntax for working with promises and futures or even the Concurrency Runtime task class. So let’s revisit the topic and see what this looks like today. A lot has changed since 2012, so I’ll begin with a brief recap to illustrate where we’ve come from and where we are before looking at some more specific examples and practical uses.

I concluded the aforementioned series with a compelling example for resumable functions, so I’ll start there. Imagine a pair of resources for reading from a file and writing to a network connection:

struct File
{
  unsigned Read(void * buffer, unsigned size);
};
struct Network
{
  void Write(void const * buffer, unsigned size);
};

You can use your imagination to fill in the rest, but this is fairly representative of what traditional synchronous I/O might look like. File’s Read method will attempt to read data from the current file position into the buffer up to a maximum size and will return the actual number of bytes copied. If the return value is less than the requested size, it typically means that the end of the file has been reached. The Network class models a typical connection-oriented protocol such as TCP or a Windows named pipe. The Write method copies a specific number of bytes to the networking stack. A typical synchronous copy operation is very easy to imagine, but I’ll help you out with Figure 1 so that you have a frame of reference.

Figure 1 Synchronous Copy Operation

File file = // Open file
Network network = // Open connection
uint8_t buffer[4096];
while (unsigned const actual = file.Read(buffer, sizeof(buffer)))
{
  network.Write(buffer, actual);
}

As long as the Read method returns some value greater than zero, the resulting bytes are copied from the intermediate buffer to the network using the Write method. This is the kind of code that any reasonable programmer would have no trouble understanding, regardless of their background. Naturally, Windows provides services that can offload this kind of operation entirely into the kernel to avoid the repeated transitions between user mode and kernel mode, but those services are limited to specific scenarios, and this loop is representative of the kinds of blocking operations that apps are often tied up with.
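To make Figure 1 concrete, here is a self-contained sketch in which File and Network are replaced by hypothetical in-memory stand-ins. The data, position and sent members are assumptions for illustration, not part of the original classes; the copy loop itself is the one from Figure 1, wrapped in a function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for the File class (illustration only).
struct File
{
    std::vector<std::uint8_t> data;  // contents of the "file"
    std::size_t position = 0;        // current file position

    // Copies up to 'size' bytes into 'buffer'; returns 0 at end of file.
    unsigned Read(void * buffer, unsigned size)
    {
        std::size_t const remaining = data.size() - position;
        unsigned const actual =
            static_cast<unsigned>(std::min<std::size_t>(size, remaining));
        std::copy_n(data.begin() + position, actual,
                    static_cast<std::uint8_t *>(buffer));
        position += actual;
        return actual;
    }
};

// Hypothetical in-memory stand-in for the Network class (illustration only).
struct Network
{
    std::vector<std::uint8_t> sent;  // everything "written to the wire"

    void Write(void const * buffer, unsigned size)
    {
        auto const bytes = static_cast<std::uint8_t const *>(buffer);
        sent.insert(sent.end(), bytes, bytes + size);
    }
};

// The synchronous loop from Figure 1.
void Copy(File & file, Network & network)
{
    std::uint8_t buffer[4096];
    while (unsigned const actual = file.Read(buffer, sizeof(buffer)))
    {
        network.Write(buffer, actual);
    }
}
```

With a 10,000-byte file, the loop writes two full 4,096-byte chunks and one 1,808-byte chunk, and then stops when Read returns zero.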

The C++ Standard Library offers futures and promises in an attempt to support asynchronous operations, but they’ve been much maligned due to their naïve design. I discussed those problems back in 2012. Even overlooking those issues, rewriting the file-to-network copy example in Figure 1 is non-trivial. The most direct translation of the synchronous (and simple) while loop requires a carefully handcrafted iteration algorithm that can walk a chain of futures:

template <typename F>
future<void> do_while(F body)
{
  shared_ptr<promise<void>> done = make_shared<promise<void>>();
  iteration(body, done);
  return done->get_future();
}

The algorithm really comes to life in the iteration function:

template <typename F>
void iteration(F body, shared_ptr<promise<void>> const & done)
{
  body().then([=](future<bool> const & previous)
  {
    if (previous.get()) { iteration(body, done); }
    else { done->set_value(); }
  });
}
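Because std::future has no then method, the following sketch approximates do_while and iteration by substituting a callback (the hypothetical Next type) for the continuation and a hypothetical single-threaded run loop for the thread pool. It keeps the shape of the code above, including the shared promise captured by value in the continuation.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <utility>

// Hypothetical single-threaded run loop standing in for a thread pool.
// Asynchronous operations post their completions here rather than invoking
// them inline, so each iteration starts from the loop instead of nesting.
std::deque<std::function<void()>> run_queue;

void run_loop()
{
    while (!run_queue.empty())
    {
        std::function<void()> work = std::move(run_queue.front());
        run_queue.pop_front();
        work();
    }
}

// Callback-based stand-in for body().then(...): the body starts an operation
// and eventually calls 'next' with true to keep going or false to stop.
using Next = std::function<void(bool)>;
using Body = std::function<void(Next)>;

void iteration(Body body, std::shared_ptr<std::promise<void>> const & done)
{
    // The continuation copies body and the shared promise, because it runs
    // after this call to iteration has already returned.
    body([=](bool more)
    {
        if (more) { iteration(body, done); }
        else { done->set_value(); }
    });
}

std::future<void> do_while(Body body)
{
    auto done = std::make_shared<std::promise<void>>();
    iteration(body, done);
    return done->get_future();
}
```

A body that simulates an asynchronous operation by posting its completion to the run loop might look like this: `do_while([&](Next next) { run_queue.push_back([&count, next] { ++count; next(count < 5); }); })`. Each call to iteration is made from run_loop rather than from inside the previous iteration, so the stack does not grow with the number of iterations.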

The lambda must capture the shared promise by value, because this really is iterative rather than recursive: the continuation typically runs long after the call to iteration that registered it has returned. But this is problematic, as copying the shared_ptr means a pair of interlocked operations (a reference-count increment and decrement) for each iteration. Moreover, futures don’t yet have a “then” method to chain continuations, though you could simulate this today with the Concurrency Runtime task class. Still, assuming such futuristic algorithms and continuations exist, I could rewrite the synchronous copy operation from Figure 1 in an asynchronous manner. I would first have to add async overloads to the File and Network classes. Perhaps something like this:

struct File
{
  unsigned Read(void * buffer, unsigned const size);
  future<unsigned> ReadAsync(void * buffer, unsigned const size);
};
struct Network
{
  void Write(void const * buffer, unsigned const size);
  future<unsigned> WriteAsync(void const * buffer, unsigned const size);
};

The WriteAsync method’s future must echo the number of bytes copied, as this is all that any continuation might have in order to decide whether to terminate the iteration. Another option might be for the File class to provide an EndOfFile method. In any case, given these new primitives, the copy operation can be expressed in a manner that’s understandable if you’ve imbibed sufficient amounts of caffeine. Figure 2 illustrates this approach.

Figure 2 Copy Operation with Futures

File file = // Open file
Network network = // Open connection
uint8_t buffer[4096];
future<void> operation = do_while([&]
{
  return file.ReadAsync(buffer, sizeof(buffer))
    .then([&](future<unsigned> const & read)
    {
      return network.WriteAsync(buffer, read.get());
    })
    .then([&](future<unsigned> const & write)
    {
      return write.get() == sizeof(buffer);
    });
});
operation.get();

The do_while algorithm facilitates the chaining of continuations as long as the “body” of the loop returns true. So ReadAsync is called, whose result is used by WriteAsync, whose result is tested as the loop condition. This isn’t rocket science, but I have no desire to write code like that. It’s contrived and quickly becomes too complex to reason about. Enter resumable functions.

Adding the /await compiler option enables the compiler’s support for resumable functions, an implementation of coroutines for C++. They’re called resumable functions rather than simply coroutines because they’re meant to behave as much like traditional C++ functions as possible. Indeed, unlike what I discussed back in 2012, a consumer of some function shouldn’t have to know whether it is, in fact, implemented as a coroutine at all.

As of this writing, the /await compiler option also necessitates the /Zi option rather than the default /ZI option in order to disable the debugger’s edit-and-continue feature. You must also disable SDL checks with the /sdl- option and avoid the /RTC options, as the compiler’s runtime checks aren’t compatible with coroutines. All of these limitations are temporary and due to the experimental nature of the implementation, and I expect them to be lifted in coming updates to the compiler. But it’s all worth it, as you can see in Figure 3. This is plainly and unquestionably far simpler to write and easier to comprehend than what was required for the copy operation implemented with futures. In fact, it looks very much like the original synchronous example in Figure 1. There’s also no need in this case for the WriteAsync future to return a specific value.

Figure 3 Copy Operation within Resumable Function

future<void> Copy()
{
  File file = // Open file
  Network network = // Open connection
  uint8_t buffer[4096];
  while (unsigned copied = await file.ReadAsync(buffer, sizeof(buffer)))
  {
    await network.WriteAsync(buffer, copied);
  }
}

The await keyword used in Figure 3, as well as the other new keywords provided by the /await compiler option, can appear only within a resumable function, hence the surrounding Copy function that returns a future. I’m using the same ReadAsync and WriteAsync methods from the previous futures example, but it’s important to realize that the compiler doesn’t know anything about futures. Indeed, they need not be futures at all. So how does it work? Well, it won’t work unless certain adapter functions are written to provide the compiler with the necessary bindings. This is analogous to the way the compiler figures out how to wire up a range-based for statement by looking for suitable begin and end functions. In the case of an await expression, rather than looking for begin and end, the compiler looks for suitable functions called await_ready, await_suspend and await_resume. Like begin and end, these new functions may be either member functions or free functions. The ability to write non-member functions is tremendously helpful as you can then write adapters for existing types that provide the necessary semantics, as is the case with the futuristic futures I’ve explored thus far. Figure 4 provides a set of adapters that would satisfy the compiler’s interpretation of the resumable function in Figure 3.

Figure 4 Await Adapters for a Hypothetical Future

namespace std
{
  template <typename T>
  bool await_ready(future<T> const & t)
  {
    return t.is_done();
  }
  template <typename T, typename F>
  void await_suspend(future<T> const & t, F resume)
  {
    t.then([=](future<T> const &)
    {
      resume();
    });
  }
  template <typename T>
  T await_resume(future<T> const & t)
  {
    return t.get();
  }
}
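The begin/end analogy can be seen in C++ as it exists today: when a class has no begin or end members, a range-based for statement finds non-member begin and end functions by argument-dependent lookup, so adapters can be added to a type you don’t own. The legacy::IntArray type below is a hypothetical example of such a type.

```cpp
#include <cassert>
#include <cstddef>

namespace legacy
{
    // An existing type we can't modify: it has no begin or end members.
    struct IntArray
    {
        int * data;
        std::size_t size;
    };

    // Non-member adapters, found by argument-dependent lookup.
    inline int * begin(IntArray const & a) { return a.data; }
    inline int * end(IntArray const & a) { return a.data + a.size; }
}

int sum(legacy::IntArray const & a)
{
    int total = 0;
    for (int value : a)  // the compiler calls legacy::begin(a) and legacy::end(a)
    {
        total += value;
    }
    return total;
}
```

Figure 4 applies the same idea to await: the adapters are free functions written against an existing type rather than members of it.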

Again, keep in mind that the C++ Standard Library’s future class template doesn’t yet provide a “then” method to add a continuation, but that’s all it would take to make this example work with today’s compiler. The await keyword within a resumable function effectively sets up a potential suspension point where execution may leave the function if the operation is not yet complete. If await_ready returns true, then execution isn’t suspended and await_resume is called immediately to obtain the result. If, on the other hand, await_ready returns false, await_suspend is called, allowing the operation to register a compiler-provided resume function to be called on eventual completion. As soon as that resume function is called, the coroutine resumes at the previous suspension point and execution continues on to the next await expression or the termination of the function.
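The sequence just described can be modeled without compiler support. The following sketch assumes a hypothetical single-threaded Future<T> with is_done, then, get and set_value members (std::future provides no such then), defines adapters with the same shape as Figure 4, and writes out by hand the control flow that an await expression implies. The model_await function and its rest callback, which stands in for the remainder of the resumable function, are illustrative inventions rather than what the compiler literally generates.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical single-threaded future with a "then" method. Not thread-safe.
template <typename T>
class Future
{
    struct State
    {
        bool done = false;
        T value{};
        std::vector<std::function<void()>> continuations;
    };
    std::shared_ptr<State> m_state = std::make_shared<State>();

public:
    bool is_done() const { return m_state->done; }
    T get() const { return m_state->value; }  // meaningful only once done

    // Runs f(*this) on completion, or immediately if already complete.
    template <typename F>
    void then(F f) const
    {
        Future self = *this;
        std::function<void()> run = [self, f] { f(self); };
        if (m_state->done) { run(); }
        else { m_state->continuations.push_back(std::move(run)); }
    }

    // Producer side: store the value, then run any registered continuations.
    void set_value(T value)
    {
        m_state->value = std::move(value);
        m_state->done = true;
        std::vector<std::function<void()>> pending;
        pending.swap(m_state->continuations);
        for (auto & c : pending) { c(); }
    }
};

// Adapters with the same shape as Figure 4.
template <typename T>
bool await_ready(Future<T> const & t) { return t.is_done(); }

template <typename T, typename F>
void await_suspend(Future<T> const & t, F resume)
{
    t.then([=](Future<T> const &) { resume(); });
}

template <typename T>
T await_resume(Future<T> const & t) { return t.get(); }

// Hand-written model of:  auto result = await t; rest(result);
template <typename T, typename Rest>
void model_await(Future<T> const & t, Rest rest)
{
    if (await_ready(t))
    {
        rest(await_resume(t));  // already complete: no suspension
        return;
    }
    // Not complete: register the resume callback and return to the caller.
    // Execution continues inside whichever call later completes t.
    await_suspend(t, [t, rest] { rest(await_resume(t)); });
}
```

With a future that is already complete, rest runs before model_await returns; with a pending future, rest runs only when set_value is eventually called.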

Keep in mind that resumption occurs on whatever thread called the compiler’s resume function. That means it’s entirely possible that a resumable function can begin life on one thread and then later resume and continue execution on another thread. This is actually desirable from a performance perspective, as the alternative would mean dispatching the resumption to another thread, which is often costly and unnecessary. On the other hand, there might be cases where that would be desirable and even required should any subsequent code have thread affinity, as is the case with most graphics code. Unfortunately, the await keyword doesn’t yet have a way to let the author of an await expression provide such a hint to the compiler. This isn’t without precedent. The Concurrency Runtime does have such an option, but, interestingly, the C++ language itself provides a pattern you might follow:

int * p = new int(1);
// Or
int * p = new (nothrow) int(1);

In the same way, the await expression needs a mechanism to provide a hint to the await_suspend function to affect the thread context on which resumption occurs:

await network.WriteAsync(buffer, copied);
// Or
await (same_thread) network.WriteAsync(buffer, copied);

By default, resumption occurs in the most efficient manner possible for the operation. The same_thread constant of some hypothetical std::same_thread_t type would disambiguate between overloads of the await_suspend function. The await_suspend in Figure 4 would be the default and most efficient option, because it would presumably resume on a worker thread and complete without a further context switch. The same_thread overload illustrated in Figure 5 could be requested when the consumer requires thread affinity.

Figure 5 Hypothetical await_suspend Overload

template <typename T, typename F>
void await_suspend(future<T> const & t, F resume, same_thread_t const &)
{
  ComPtr<IContextCallback> context;
  check(CoGetObjectContext(__uuidof(context),
    reinterpret_cast<void **>(set(context))));
  t.then([=](future<T> const &)
  {
    ComCallData data = {};
    data.pUserDefined = resume.to_address();
    check(context->ContextCallback([](ComCallData * data)
    {
      F::from_address(data->pUserDefined)();
      return S_OK;
    },
    &data,
    IID_ICallbackWithNoReentrancyToApplicationSTA,
    5,
    nullptr));
  });
}

This overload retrieves the IContextCallback interface for the calling thread (or apartment). The continuation then eventually calls the compiler’s resume function from this same context. If that happens to be the app’s STA, the app could happily continue interacting with other services with thread affinity. The ComPtr class template and check helper function are part of the Modern library, which you can download from github.com/kennykerr/modern, but you can also use whatever you might have at your disposal.
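Figure 5 relies on COM, but the overload-selection idea itself can be sketched in portable C++ with a tag type in the spirit of std::nothrow_t. Everything here is hypothetical: same_thread_t, Context (a stand-in for an apartment’s message queue), current_context (a stand-in for CoGetObjectContext) and Operation (a stand-in for a pending asynchronous operation). The default overload resumes directly on the completing thread, whereas the tagged overload captures the caller’s context at suspension and posts the resumption back to it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical tag type in the spirit of std::nothrow_t.
struct same_thread_t {};
constexpr same_thread_t same_thread{};

// Hypothetical stand-in for an STA's message queue.
struct Context
{
    std::deque<std::function<void()>> queue;

    void post(std::function<void()> f) { queue.push_back(std::move(f)); }

    // Runs queued work, as the owning thread's message loop would.
    void pump()
    {
        while (!queue.empty())
        {
            std::function<void()> f = std::move(queue.front());
            queue.pop_front();
            f();
        }
    }
};

// Hypothetical analogue of CoGetObjectContext: the calling thread's context.
thread_local Context * current_context = nullptr;

// Hypothetical pending operation; complete() is what a worker thread would call.
struct Operation
{
    std::function<void()> on_complete;
    void complete() { if (on_complete) { on_complete(); } }
};

// Default overload: resume directly on whichever thread completes the operation.
template <typename F>
void await_suspend(Operation & op, F resume)
{
    op.on_complete = resume;
}

// Tagged overload: capture the caller's context now and resume there later.
template <typename F>
void await_suspend(Operation & op, F resume, same_thread_t const &)
{
    Context * caller = current_context;
    assert(caller != nullptr);
    op.on_complete = [caller, resume] { caller->post(resume); };
}
```

When both operations complete, the default overload’s resume function has already run, while the same_thread overload’s resume function waits in the caller’s queue until that context pumps its messages. This model runs on a single thread for simplicity; a real implementation would need synchronization between the completing thread and the caller’s queue.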

I’ve covered a lot of ground, some of which continues to be somewhat theoretical, but the Visual C++ compiler already provides all of the heavy lifting to make this possible. It’s an exciting time for C++ developers interested in concurrency and I hope you’ll join me again next month as I dive deeper into resumable functions with Visual C++.

Kenny Kerr is a computer programmer based in Canada, as well as an author for Pluralsight and a Microsoft MVP. He blogs at kennykerr.ca and you can follow him on Twitter @kennykerr.

Thanks to the following Microsoft technical expert for reviewing this article: Gor Nishanov
