DRAFT - Stop Sucking at Async and Threading

Even the most experienced software engineers have trouble using async/await properly, and can't do multithreading safely without sacrificing major performance, but who can blame them? Information on these topics is incomplete and convoluted. Most resources assume you already somewhat understand async, then explain only one little part of the big picture, and the lessons learned often don't translate well to other scenarios or languages! These techniques are hard because they require a fundamental shift in how we think about computing.

Allow me to take you on a not-so-gentle journey through the hellscape of fearful concurrency, the academia of lock-free thread safety, and ascension beyond skill issue.

Table Of Contents

1 Important Concepts

As I mentioned before, mastering these techniques requires changing how we think about computing. In this chapter, we will walk through a brief (but necessary) history of how CPUs and software have developed, as well as some concepts that have helped me personally. Building a solid foundation is the most important part of learning something new, so do not skip over this chapter or assume you already know things.

1.1 Hello, World!

Let's start at the very beginning. Imagine a simple hello world program. There is an entry point (either a main function or just the start of your file), then the code that prints "Hello, World!", and then your program calls some kind of exit function to exit the program. The last exit step is usually done for us these days, but if you ever program in assembly or use _start in C, you'll learn that you have to do this to properly exit the program.
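
For instance, here's a minimal sketch of that shape in Python (the explicit sys.exit is normally done for us, but it's shown here to make the exit step visible):

import sys

def main():
    print("Hello, World!")  # the actual work

main()        # execution starts at the entry point
sys.exit(0)   # the exit step, usually implicit these days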

That's pretty much it. Execution starts at the program's entry point, then works its way down the file line by line until it runs out of code or is told to exit. This is how normal synchronous code works. It's easy, predictable, and behaves exactly the same way every time. But to understand why software normally works this way, we need to take a look at the hardware.

1.2 Basic Theory of Computation

Let's say we want to design a computer that, you know, computes things. How do we do this? Well, math computes things, and math has this thing called a Turing machine. To oversimplify, a Turing machine has two important parts to it: some memory, and some operations stored in that memory that can also modify the memory. These operations can do anything to the memory, like initialize and modify data, change an operation and then execute that operation again, etc. There are no limits. Because of this, a Turing machine can be thought of as an engine to execute generalized operations.

This is basically what we want a computer to do. Because "computer" is a bit of an unclear term, let's instead call it an execution engine. I like this name because, as programmers, it's helpful to abstract away the details of the CPU and think of it as an engine that executes operations.

Now, how do we get something a bit more practical? Something we can physically build? In 1945, a man named John von Neumann described the von Neumann architecture, which is still what we use today. The von Neumann architecture describes a central processing unit (CPU) that receives inputs and produces outputs, and also has access to memory that it reads and modifies while performing operations.

Also, these "operations" are called instructions.

To reiterate, think of a CPU as an execution engine. We'll get into multi-core CPUs later, so for now assume that 1 CPU = 1 execution engine.

1.3 The Blocking Problem

Think back to a time when computers weren't in every home, and universities had massive computers called mainframes that had to be fed punch cards. Programmers would spend days physically punching holes into cards, carry the finished deck to the mainframe, feed it into the system, wait for the computer to execute the program, and then finally the mainframe would produce some kind of output.

Let's say you just spent 3 days reviewing and punching a brand new program into punch cards, and you're excited to finally go run it on a real CPU and see it work perfectly on the first try. After a few minutes navigating through the maze of a building, you finally make it to the mainframe just to find that I'm already using the mainframe to execute my program. "Hi, my name is Timmy, and I'm new to programming. I made a program to look at every possible chess game and tell me the best possible chess opening, but I'll let you use it when I'm done."

1 mainframe = 1 CPU = 1 execution engine.

If you think about this scenario, the execution engine can only really execute one instruction at a time, so you can't run your program at the same time that I'm running my program. Also, if I stopped my program now so that you could run yours, my program would lose track of all of the chess positions it's calculated so far and I'd have to start over from the beginning. But let's pretend for a moment that I could just magically pause my program and let you run yours, and then I'd start running my program again after you were done right from where I left off.

My program started at memory address 0x01 and started filling up RAM with a massive array of every possible chess position. When you go to run your program, your program also starts at memory address 0x01 and starts using the RAM just like mine did. Once your program is done and I magically swap my program back in, the memory that your program used at address 0x01 is now different from when my program stopped!

This is called a race condition, and race conditions are why we can really only execute one synchronous program at a time. This limitation is said to be blocking the mainframe, because when one program is being executed on a mainframe, it is blocking all other programs from being executed on that mainframe.

This is the Blocking Problem.

1.4 CPU Multitasking & The Birth of Async

Many software developers around that time started using a technique called cooperative multitasking. The technique involves writing a program in a way such that the program is designed to be stopped and resumed at certain points. It's up to the programmer to avoid the Blocking Problem by designing their program to be run in smaller isolated steps, where the output from the previous step can be fed into the beginning of the next step. This approach means that the state of any shared system resource (like memory) can be reset to what the program expects when the program resumes. That means the program can continue on, just like nothing happened.

Cooperative multitasking means that the program currently blocking the mainframe will stop at some predetermined point, which then temporarily unblocks the mainframe even though the larger "program" may not yet be finished. This allows another programmer to block the mainframe for a little bit with the next step of their program. This approach works, but it's annoying because it makes writing programs harder, and all it takes is a few people being "less cooperative" to ruin things for everyone.
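
To make this concrete, here's a minimal sketch of cooperative multitasking using Python generators. Each yield is a predetermined stopping point where a program voluntarily gives up the machine; the program names are made up for illustration:

def chess_program():
    for step in range(3):
        print(f"chess: finished step {step}")
        yield  # predetermined stopping point: voluntarily give up the machine

def payroll_program():
    for step in range(3):
        print(f"payroll: finished step {step}")
        yield

# A trivial "mainframe operator" that alternates between the two programs.
programs = [chess_program(), payroll_program()]
while programs:
    for program in programs[:]:
        try:
            next(program)  # run the program until its next stopping point
        except StopIteration:
            programs.remove(program)  # this program is completely finished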

In 1964, operating systems were becoming more prevalent, and an operating system called Multics was one of the first notable operating systems to implement preemptive multitasking. Preemptive multitasking involves periodically interrupting a program in the middle of execution and switching to executing another program. How does the operating system avoid the Blocking Problem? It uses isolated sections of memory for each program, and backs up the state of the shared parts of the system (like CPU registers) into that isolated program memory before switching to the other program. Eventually, the operating system will see that your program is still waiting to finish executing and it will restore the backed up system state and then resume your program as if nothing ever happened. This requires quite a bit of bookkeeping, but it works. In fact, it works so well that operating systems still do this even today. It's called context switching.

Thanks to preemptive multitasking, the Blocking Problem is dead... for now.

1.5 Multi-Core Machines

In 2001, IBM created a CPU called the Power4. This CPU was the first multi-core CPU, but what does that mean? Well, think back to the von Neumann architecture. Rather than an entire CPU being the execution engine, a CPU core is now the execution engine, and a CPU can have multiple CPU cores. These CPU cores operate completely independently from each other, which means that a CPU now has a number of execution engines equal to the number of CPU cores it has!

1.6 The Execution Pass

We've been talking a lot about execution from the perspective of the hardware up until now. We're software developers, not hardware... hardware developers? Is that what they're called?

Anyway, I'm going to explain a mental model that has been very helpful to me to understand both async and threading. Keep in mind that this is a mental model, and not necessarily related to what a CPU actually does. Let's begin by looking at some example pseudocode that's definitely not C:

void main(){
    int i = 0;
 
    i++;
}

Imagine something called the execution pass. Visualize the execution pass as a slip of paper that says a line of code has permission to execute. Once that line of code is finished executing, it hands the execution pass to the next line of code, then the next line, then the next line, until there are no more lines of code left to execute.

Back to the example code, I'm going to be pedantic for a moment and point out that code isn't always executing. What do I mean by this? Well, an execution engine can only execute instructions. Code has to be translated to instructions, then those instructions have to be loaded into memory somewhere, then the execution engine can finally start executing those instructions, thus beginning execution of your program. When those instructions aren't actively being executed by an execution engine, your code is just data sitting on a drive. We have to think about where the execution pass comes from so that a program can actually start executing. This is a question we'll answer later.

Regardless of how your program gets the execution pass, your program begins execution at its entry point. This is typically either going to be the start of your file or a function called main. In the example code, line 1 starts a main function and line 5 ends it, but for languages that don't have a main function, you can just imagine that lines 1 and 5 don't exist.

In the example program, the entry point is the main function, so the execution pass gets given to line 1.

Since declaring a function doesn't really do anything, the execution pass gets passed from line 1 to line 2.

Line 2 is a simple variable assignment, assigning the value of 0 to the integer variable i. After this assignment is done, line 2 of the code is done executing and the execution pass gets handed to the next line, line 3.

Line 3 has no code, so the execution pass gets handed to line 4.

Line 4 increments variable i so it now holds the value of 1. Once the variable is done being incremented, line 4 is done executing and the execution pass gets handed to line 5.

Line 5 is the end of the main function, which means there is no code left to execute.

At this point, the exit function is called automatically and gives the execution pass back to whatever it was that started executing your program.

1.7 What Even Is Async?

Now that we understand the idea of the execution pass, we can understand what the point of asynchronous code is, and how it actually works. Forget about the language keywords for now.

Do you remember the Blocking Problem? The problem is that, without some kind of multitasking system, one program blocks the entire execution engine and prevents other programs from executing. Async is a solution to the blocking problem, except within a single program.

One line of your code might get the execution pass and then wait on something for a long time (a network request for example), which blocks the execution pass from being given to the rest of your code, even if the rest of your code doesn't depend on the thing that is being waited on. There are even scenarios where one line of code might end up holding the execution pass forever, which blocks the rest of your program from getting the execution pass forever. This is called a deadlock.
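
Here's a minimal single-threaded sketch of that forever case in Python, borrowing a lock (a tool we'll see more of later). The second acquire waits for a release that can never happen, so the line below it never gets the execution pass:

import threading

lock = threading.Lock()  # a non-reentrant lock
lock.acquire()           # this line takes the lock...
lock.acquire()           # ...and this line waits forever for it: deadlock
print("this line never gets the execution pass")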

1.7.1 What Causes Code To Block?

Now you're probably wondering, "What parts of my code might hold onto the execution pass for a long time and block the rest of my program?" The answer is surprisingly simple. Any time a line of code gives the execution pass to something other than the next line of code, that line of code has to wait to get the execution pass back before it can hand the execution pass to the next line of code. Here's the really important detail though: your program can give the execution pass to more than just other programs running on your computer.

This may sound weird, but think about it: when you send a network request, your program sits and waits until the remote server responds to the network request. This effectively means that your one line of code is giving the execution pass to the remote server, and this makes sense because the remote server can't start processing your request until you actually send the request.

Another scenario: think about a calculator program that's waiting for the user to input numbers and operations on those numbers. The calculator program sits and waits on user input, which effectively gives the execution pass to the user, because the calculator can't calculate an answer to an equation that hasn't been input yet.
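
That calculator scenario is a one-liner in Python; input() hands the execution pass to the user, and nothing below it can execute until the user responds:

expression = input("Enter an expression: ")  # blocks: the user holds the execution pass now
print("You entered:", expression)            # only runs once the user hits enter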

This is where async comes in.

Notice that I said the line of code effectively gives the execution pass to these external systems. Remember, the execution pass means things can be executed on an execution engine, and a remote server and a user don't rely on your local execution engine to be executed. Async allows that line of code to get the execution pass back immediately and hand it to the next line of code without having to wait on the external system. Sometimes, your program will have nothing else to do until it gets the desired data from the external system, and these are scenarios where blocking is actually perfectly fine. However, there are many times where a program can do other work while it waits for data from an external system.

1.7.2 Synchronous vs Asynchronous Example

To build on this idea, let's pretend for a moment that we're going to make a simple chatroom. To keep it simple, let's say there are only two features: allow the user to send messages, and display messages from other users.

With an example synchronous chatroom, a simple implementation might be to check for messages from other users, and then after receiving one or more messages from other users, the execution pass can continue moving through the program, and now the user is able to send a message. However, since this is a synchronous program, the program will now block until the user sends a message, which means that the user can't receive any new messages until they send a message.

It's not hard to tell that this design sucks.

Now compare this to an example asynchronous implementation. The program would check to see if any messages have been received. However, since that line of code gets back the execution pass immediately, any new messages can be displayed immediately, including 0 new messages! Now after displaying any new messages, the program can check if the user is trying to send a new message. Again, since the execution pass gets given back immediately, it doesn't matter if the user is trying to send a message or not, the message is sent immediately if there is one, and the execution pass goes back to checking for new messages. The program never sits and waits for any one thing to happen.

This provides an interactive, real-time experience to the user, just like we'd expect from a chatroom.
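
Here's a minimal sketch of that asynchronous loop in Python. The check_for_new_messages and check_for_outgoing_message functions are made-up stand-ins for real non-blocking I/O; the important part is that each check returns immediately instead of sitting and waiting:

import random

def check_for_new_messages():
    # Stand-in for a non-blocking network check; returns immediately.
    return ["hello!"] if random.random() < 0.3 else []

def check_for_outgoing_message():
    # Stand-in for a non-blocking check of the user's input box.
    return "hi there" if random.random() < 0.3 else None

for _ in range(10):  # a real chatroom would loop until the user quits
    for message in check_for_new_messages():
        print("received:", message)  # 0 or more messages, displayed immediately
    outgoing = check_for_outgoing_message()
    if outgoing is not None:
        print("sent:", outgoing)     # sent immediately if there is one
# The loop never sits and waits for any one thing to happen.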

1.8 A Brief History of The Async Keyword

Now let's get into the async you probably already know about, the async keyword. Async all began with F#. F# is a functional programming language created in 2005 by Microsoft Research. In 2010, F# 2.0 was released and "invented async", but not async as we know it today. I'm far from an expert in F#, but in Section 6.3.10 of The F# 2.0 Language Specification lies the first usage of the await keyword as well as a ! symbol, which appears to denote asynchronous execution of "computation expressions". This section also talks about monads. Nerds.

Jumping forward to 2012, Microsoft released C# 5.0. Looking at the Language Specification for C# 5.0, we can see that section 15.15 talks about "Async Functions", and section 12.8.8 talks about "Await Expressions". Now this sounds more like modern async.

1.9 Async & Await

Quoting the C# language specification from above, "The await operator is used to suspend evaluation of the enclosing async function until the asynchronous operation represented by the operand has completed." In other words, await foo() can only exist inside of a function that has been declared with the async keyword.

When the awaited foo() encounters a blocking operation that would give the execution pass to an external system, the execution pass is instead handed back up through your program, through every async and await until the execution pass gets back up to main, which we will assume has an async keyword for now. Since main is marked as async, the execution pass gets handed back up to whatever runs your program, and your entire program is now stopped, blocking on foo().

Once foo() stops blocking and is ready to get the execution pass back, the execution pass gets given back to main and then works its way back down through every async and await until it reaches the foo() that blocked. However, since foo() is no longer blocking, the execution pass gets handed to the next line of code just like normal.

This is a big idea, so let's look at an example.

1.9.1 Async & Await Example

C# looks way too verbose to use in an example, so I'm using pseudocode that will hopefully be more familiar to everyone.

async i_cant_wake_up(){
    await sleep_seconds(99999999999999999);
}
 
async wake_me_up_inside(){
    await i_cant_wake_up();
 
    do_important_stuff();
}
 
async main(){
    await wake_me_up_inside();
}

Yes, that Evanescence song was about asynchronous programming this whole time.

Let's follow the code and see what's happening. The entry point is the main function, so the execution pass gets handed to main.

main immediately gives the execution pass to the function wake_me_up_inside.

wake_me_up_inside immediately gives the execution pass to the function i_cant_wake_up.

i_cant_wake_up immediately gives the execution pass to the function sleep_seconds.

sleep_seconds blocks the entire program for 3.2 billion years (99999999999999999 seconds).

As soon as any blocking is awaited, the execution pass gets handed back up the call stack through all of the async functions, so:

sleep_seconds hands the execution pass back to i_cant_wake_up.

i_cant_wake_up is marked as async, so it hands the execution pass back to wake_me_up_inside.

wake_me_up_inside is marked as async, so it hands the execution pass back to main.

main is the entry point and is marked as async, so the execution pass gets handed back up to whatever is executing your program.

Now, this is a problem. Nobody has time to wait 3.2 billion years for their important stuff to get done, so this is why we made everything async. Uh, how do we actually do the useful stuff without waiting 3.2 billion years first? Isn't that the whole point of async? So far, this is just blocking with extra steps.

Well, there's one other very important part to async:

1.10 Futures

Back to the C# language specification, the section on async explains that "The return-type of an async method shall be either void or a task type." Some more very relevant quotes are "The task is initially in an incomplete state" and "If the function body terminates ... any result value is recorded in the return task, which is put into a succeeded state".

If you're familiar with JavaScript, this might sound a lot like promises, and that's because JavaScript promises are basically the same thing as futures. It's complicated though, so I'll explain later. For now, pretend these are the same thing.

Anyway, an easy way to understand futures is to think of a future as a variable that holds all of the stopped async and await stuff below it. When the execution pass starts going up the pile of async and await, the execution pass stops on the line declaring the future, and then the execution pass starts moving through the program again as normal.

1.10.1 Async, Await, & Futures Example Solution

Let's modify the prior example code to use futures:

async i_cant_wake_up(){
    await sleep_seconds(99999999999999999);
}
 
wake_me_up_inside(){
    future f = i_cant_wake_up();
 
    do_important_stuff();
 
    while(!f.done){
        sleep(0);
    }
}
 
main(){
    wake_me_up_inside();
}

Notice that main and wake_me_up_inside are no longer marked as async, and i_cant_wake_up is no longer being awaited by wake_me_up_inside.

Let's go through this new example code step by step to see what is happening.

The entry point is the main function, so the execution pass gets handed to main.

main immediately gives the execution pass to the function wake_me_up_inside.

wake_me_up_inside immediately gives the execution pass to the function i_cant_wake_up.

i_cant_wake_up immediately gives the execution pass to the function sleep_seconds.

sleep_seconds blocks the entire program for 3.2 billion years (99999999999999999 seconds).

As soon as any blocking is awaited, the execution pass gets handed back up the call stack through all of the async functions, so:

sleep_seconds hands the execution pass back to i_cant_wake_up.

i_cant_wake_up is marked as async, so it hands the execution pass back to...

Wait, this part is different. i_cant_wake_up is no longer being awaited, so i_cant_wake_up returns a future, which is assigned to the f variable on line 6. Line 6 is done executing after this variable assignment, so the execution pass gets passed down to line 7.

Line 7 is a blank line, so the execution pass gets handed down to line 8.

Line 8 calls the function do_important_stuff. For the sake of this example, we'll assume this important stuff gets done correctly and quickly. Line 8 is done executing, so the execution pass gets handed down to line 9.

Line 9 is a blank line, so the execution pass gets handed to line 10.

Lines 10 through 12 contain a while loop: as long as f is not done yet, the program calls sleep with an argument of 0. Sleeping for 0 seconds is a way to do cooperative multitasking in modern programming languages.

This means that we will still end up waiting 3.2 billion years for the program to end... But the important stuff got done! Remember, async doesn't avoid waiting, it simply allows the program to multitask while it's waiting.

Now, many of you might think that this looping behavior on lines 10 through 12 is very strange because popular async languages like Python, JS, and Go don't do this. In fact, many popular async languages will deadlock if you do what we did on lines 10 through 12, but this is where runtimes come in. Runtimes require a whole chapter of their own.

Oh, would you look at that, a chapter on runtimes is up next. How fortunate.

2 Async Runtimes

I normally like to start by defining what we're going to cover, but saying "a runtime is a library or program that runs a program", while accurate, is very vague. There are many different programming languages with many different design choices regarding their runtimes, if the language even has a runtime, and these differences cause async to behave differently in pretty much every language.

For languages that are either interpreted (JavaScript, Python, etc) or otherwise run on some kind of VM (C#, Java), the language runtime will typically have features built in for handling async. Some of these languages (such as Python) even have multiple ways of doing async, which allows for more fine-grained control over how async works in your program.

For languages that are compiled and executed natively, you will either need to implement your own async runtime, or you'll have to find and use an existing 3rd-party library. According to Robert Virding, one of the inventors of the Erlang programming language, "Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Some languages like Go or Erlang are special because they follow academic paradigms like CSP or The Actor Model respectively. The way these languages work is fundamentally different because their runtimes are based on academic principles rather than deriving functionality from lower-level features.

Other notable languages include:

  • C++, which has async functionality without the usual async/await keywords and without a dedicated async runtime.
  • Rust, which has a wide variety of 3rd party async runtimes that each have different behaviors.
  • C/Assembly, which have no async functionality, but they have certain non-blocking functionality.
  • Bash. Yes, the Linux Bourne-Again Shell. You can make async bash code if you're clever and understand the fundamentals of async.

As you can see, this is an absolute mess. The only sensible place to start is with the original async runtime: The operating system.

2.1 The Operating System Runtime

Operating systems are extremely complex, so I will oversimplify details that aren't relevant to us right now. I will still try my best to remain accurate, but the following subchapters are for understanding async, not OS dev, so keep that in mind.

Anyway, the most important part of the operating system is a special program called the kernel. The kernel is the brain of the operating system, and is at the core of everything that happens on a machine, including multitasking. But how does an operating system do multitasking exactly?

2.1.1 Processes & Threads

When you execute a program, the program is copied from the disk into RAM, and that copy in RAM gets executed. That copy of the program in RAM is called a process. In programming terms, think of the program sitting on the disk as a class{} definition, and think of the process as an instance of new class().

From the perspective of the CPU, a process is just a giant list of single instructions that need to be executed one at a time until the process asks the kernel to exit. This seems pretty straightforward, right? Well, there are plenty of situations where you will want a single process to spawn multiple processes that work together. For example, think about your web browser of choice. Use a task manager or process viewer to look at how many processes your web browser has. It's an astronomical amount.

But what about threads? Threads are more-or-less a lighter-weight version of a process, although there are some very important differences between processes and threads. One of the more notable differences is that threads have almost no memory isolation from the process that spawned them (the parent process) or from any other threads of that process. This means that if you declare a global variable in the parent process or in any of the threads, the parent process and all of its threads can access and potentially modify that global variable.
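
A quick Python sketch of that shared-memory behavior; both the main thread and the spawned thread see and modify the same global variable:

import threading

counter = 0  # a global variable, visible to every thread in this process

def child():
    global counter
    counter += 100  # the child thread modifies the parent's global

thread = threading.Thread(target=child)
thread.start()
thread.join()
print(counter)  # prints 100: the child's change is visible back in the parent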

Operating system threads are often called OS threads or native threads to differentiate them from green threads, which we will cover soon. Also note that, although programs may have multiple processes and threads (called multithreading), that is a separate topic from async. Multithreading is about doing one thing per thread across multiple threads at the same time, while async is about doing multiple things in a single thread. Put another way, each thread in a multithreaded program still executes its code synchronously, in one specific order, while async executes code asynchronously, in one of many different potential orders. Many people are confused about this, so take the time to understand the difference.

2.1.2 Multitasking, Task Scheduling, & Context Switching

Those of you who have been reading from the beginning should remember the section on multitasking: "Preemptive multitasking involves periodically interrupting a program in the middle of execution and switching to executing another program." Good job, you. If you haven't been reading from the beginning, this part might not make sense.

The operating system will periodically get interrupted by a hardware timer, which is what triggers the preemption. These hardware timer events are called interrupts, and interrupts are handled by Interrupt Service Routines (ISRs). When the hardware timer interrupt is triggered, the execution pass used by the interrupted CPU core gets violently yanked away from whatever process or thread was using it, and then the execution pass is given to the ISR in the kernel that is responsible for handling these hardware timer interrupts.

The first part of the ISR will back up the current state of the CPU in the memory associated with the process or thread that just stopped executing. Once this backup is finished, some kind of task scheduling algorithm (Wikipedia, OSDev Wiki) will be executed to decide which process or thread to give the execution pass to. It could be the same process or thread, or it could be a different one.

Once the algorithm chooses a process or thread to execute, the CPU state will be restored from the new process or thread's memory, and then the execution pass will be given to that process or thread right where it left off, just like nothing ever happened. This entire process is called a context switch. Because the CPU state needs to be backed up and restored every time, and because the task scheduling algorithm needs to be run every time, context switches can be quite slow and can add a noticeable amount of overhead if you do things wrong. Excessive context switching can also contribute to issues like slowloris vulnerabilities or The C10k Problem.

2.1.3 The Execution Loop

This isn't a feature of an operating system, but rather an absolute truth of how hardware works. Say for the sake of argument that you have a CPU with a single CPU core that is clocked at 3 GHz; that means you have a single execution engine capable of executing 3 billion instructions per second. However, 3 billion instructions get executed every second by that CPU core, even if there is nothing to execute! This means the CPU is basically stuck in a while(!hasWorkToDo()); loop. So how does the CPU avoid being under 100% load all of the time while still being responsive when something happens?

Most CPU instruction set architectures (ISAs) will have a halt instruction built in; on x86, it's called HLT. This instruction is basically a sleep(0); for that CPU core. But how does the core know when to wake back up when something happens? Well, the hardware tells the core when something happened by sending it an interrupt. This can be expressed with the following pseudocode:

async do_task_scheduling(){
    let active_threads = get_active_threads();
 
    if(len(active_threads) < 1){
        do_assembly_instruction("HLT");
    }
 
    pick_and_execute_thread(active_threads);
}

When the HLT assembly instruction is executed, the kernel gives the execution pass back to the CPU core, and then the CPU core takes a little nap, which prevents that CPU core from handing out its execution pass until a hardware interrupt wakes it up. When this hardware interrupt happens, rather than giving the execution pass back to line 5 of the example code, the execution pass gets given to the start of the ISR responsible for handling that hardware interrupt.

2.2 Runtime Details

You might be wondering "If an operating system already has all of this stuff built in, then why do async runtimes even exist?" This is a great question. An operating system is, well, a system for operating the machine. An operating system has many responsibilities including managing processes, dealing with hardware, dealing with ACPI/UEFI, etc.

On the other hand, a runtime is responsible only for helping a single program execute. Runtimes are much more narrow in scope, and often have different design goals.

Although async runtimes are extremely varied, they all have some things in common, and by understanding these similarities, we can learn how to correctly use async in any language without having to blindly memorize endless keywords and patterns.

2.2.1 Green Threads

A green thread is the runtime equivalent of an OS process or thread. A green thread, by definition, exists inside of the runtime, and only inside of the runtime. Let's think of Python as an example. Let's say that you wrote a Hello World in Python and you want to execute it. You'll need to run it with a command such as python hello.py. Notice that you are using the python executable to run Python code.

When running the above command, the operating system will find the python executable that is installed on your system, and then the operating system will create a process and begin executing it. Next, the python executable will do some internal setup before loading hello.py. The Python code contained within hello.py is loaded as a green thread into the memory of the python process somewhere, and then the python process begins executing the Python code in that green thread until the code in hello.py either ends or calls sys.exit().

If we take a step back and look at the big picture, we can see that both OS threads and green threads are just a chunk of memory with a big list of instructions that we want to execute. However, there's another concept we've already looked at that seems to fit this definition: a future. If you think about it, a future is a green thread. The only difference is that the future is managed by your program and not some kind of runtime. Think of a future as a different type of green thread. There are actually many other types of green threads, including Go's goroutines, JS's promises, fibers and coroutines (C++ warning), etc.

2.2.2 Runtime Task Scheduling

Just like with the operating system, a runtime has to be able to switch between green threads. Unfortunately, this can be really tricky because runtimes can implement whatever task scheduling algorithm they want, which means we can't make any assumptions about how runtimes will schedule tasks. However, we do know some absolute truths that runtimes can't escape. As we established before, all runtimes use either preemptive or cooperative multitasking, and we know that they operate on some kind of thread data.

There is one massive difference between task scheduling in the OS compared to task scheduling in a runtime though: context switching. When an operating system context switches, it needs to backup and restore CPU state, but a runtime doesn't need to do this! This means that, assuming the runtime is programmed well, context switching inside of a runtime has almost 0 cost. Another important detail is to think about the execution pass. In an operating system, the execution pass gets violently yanked away from the process that the execution engine is currently executing. However, from the perspective of the operating system, the runtime is just a program, so when the operating system context switches and resumes a runtime, the execution pass gets given back to the runtime right where it left off. This means that the runtime isn't even aware that the OS had a context switch, so the runtime doesn't have to do any extra work if the hardware timer interrupts thousands of times while the runtime is helping a program execute.

Now some of you may be wondering, wouldn't a runtime always be slower than just using the OS? This is a good question, but the answer is no. A runtime is oftentimes much faster than relying on the operating system when async is involved. A hardware timer will interrupt a CPU core X times per second. This number of interrupts per second is configurable by the kernel, so the machine will experience X context switches per second no matter what you do.

However, we can make the problem worse. For example, a common and easy way to do async without a runtime is to spawn a thread for every async task, sometimes even using sleep(0) to do cooperative multitasking. However, the operating system has no idea why any of the threads are sleeping, or if any of your threads are waiting on each other, so the task scheduler has to just guess at which thread to resume. If the task scheduler guesses wrong, it takes another context switch to try again.
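
For illustration, that thread-per-task approach looks something like this in Python; every sleep(0) invites the OS to context switch, and the OS scheduler has no idea that one thread is just waiting on the other:

import threading
import time

def wait_for_result(shared):
    while "result" not in shared:
        time.sleep(0)  # cooperative hint: let the OS pick another thread

def produce_result(shared):
    time.sleep(0.1)    # pretend to do slow work
    shared["result"] = 42

shared = {}
threads = [
    threading.Thread(target=wait_for_result, args=(shared,)),
    threading.Thread(target=produce_result, args=(shared,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["result"])  # the waiting thread burned many context switches to get here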

This can add a shit ton of additional context switches, sometimes even orders of magnitude greater than the X that happen by default. However, if a runtime is being used, not only are these additional context switches significantly cheaper, but the task scheduling algorithm will likely have additional insight into how the green threads are blocking each other, which means it can make better decisions. Also, if the task scheduler in a runtime guesses wrong, the cost to context switch and try again is minimal compared to context switching in the OS.

2.2.3 Event Loop

Most runtimes will have an event loop that, much like the execution loop, is always trying to do something. Some runtimes will handle the event loop for you entirely so that you never think about it, some allow you to manage the event loop directly, some will spawn additional threads to run the event loop on outside of the main program, while others will only multitask on a single thread. Some runtimes even leave it up to you to implement your own event loop. There are many possibilities, and many runtimes behave differently, so understanding how the execution pass gets given to the event loop is one of the most important parts of async. Despite its importance, I sadly don't see it get talked about much.

Now what is the event loop and how does it apply to green threads and task scheduling? As always, it depends on the runtime, but generally speaking the process will be:

  1. Create an event loop (depends on runtime).
  2. Add green threads to the event loop.
  3. Give the execution pass to the event loop.
  4. The execution pass goes to the green threads assigned to the event loop.
  5. Get the execution pass back from the event loop.
  6. Maybe get a result from the green threads in the event loop.
  7. If no result, do some other work and then go back to step 3.
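
To ground those steps, here's a toy event loop sketch in Python, with generators standing in for green threads. Every name here is made up for illustration, and real runtimes are far more sophisticated:

def green_thread(name, steps):
    for step in range(steps):
        yield  # hand the execution pass back to the event loop
    return f"{name} finished"

class ToyEventLoop:
    def __init__(self):
        self.threads = []  # step 2: green threads get added here
        self.results = []

    def add(self, thread):
        self.threads.append(thread)

    def run_once(self):
        # steps 3-5: give each green thread the execution pass, then take it back
        for thread in self.threads[:]:
            try:
                next(thread)
            except StopIteration as finished:
                self.results.append(finished.value)  # step 6: collect a result
                self.threads.remove(thread)

loop = ToyEventLoop()               # step 1: create an event loop
loop.add(green_thread("a", 2))      # step 2: add green threads
loop.add(green_thread("b", 3))
while loop.threads:
    loop.run_once()                 # steps 3-5
    print("doing other work")       # step 7: no result yet, do other work and loop
print(loop.results)                 # step 6: results from the green threads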

Remember the difference between multithreading and async: multithreading still executes things in order, while async can execute things in any order. In the above steps, we can see that we "do some other work" as part of step 7, and that other work will happen whether or not we've gotten back the result from the green thread yet. Remember, async is about executing code in any order, not at the same time as other code.

2.3 Async Examples

That was a lot of abstract stuff. Let's take a look at some examples to help solidify our understanding.

We will look at a variety of languages and answer a series of questions for those languages to determine how async works in those languages.

These sections will be messy and fucked up until I can get down solid and consistent messaging and definitions, which will take some time. This will probably be the most time-consuming part of the book.

TO-DO: Cover PHP (fibers), C++ (all of the above), C, and Go.

2.3.1 Python

Since Python is quite a popular language, let's start with it. Since this section will rely on Python's implementation details, we will look specifically at version 3.12. When this version inevitably becomes out of date, just go through this same process on the current version of Python. Anyway, when trying to understand async in a new language, we need to answer some questions:

  • What type of green threads does the language use?
  • Where is the event loop?
  • What kind of multitasking is being used?
  • When do green threads get executed?

The first step to answering these questions will usually be to look for documentation on how that language handles async. In the case of Python, we find asyncio pretty quickly. The documentation says this is "a library", however it's listed as a part of the Python standard library, so asyncio is considered core Python functionality.

If we look at the features of asyncio, we can see that Runners, Coroutines and Tasks, and Subprocesses are listed under High-level APIs, and Event Loop and Futures are listed under Low-level APIs. This is actually quite helpful. Before we even start looking at example code and trying to figure out the details, we can infer a lot of details:

  • It looks like coroutines, tasks, and subprocesses are potentially different kinds of green threads.
  • It looks like futures are used to manage green threads in the application.
  • There is some kind of interface for managing the event loop.
  • A runner presumably "runs" something, presumably green threads.

Just by looking at the various APIs, we can already infer answers to all of our questions:

  • What type of green threads does the language use? Coroutines, tasks, and subprocesses
  • Where is the event loop? There is an interface for managing it directly
  • What kind of multitasking is being used? Since we control the event loop, it's up to us
  • When do green threads get executed? When they are run on the event loop

Now for the next step, we need to actually look at the docs and verify these assumptions.

2.3.1.1 Verifying Python Assumptions

The first assumption is that coroutines, tasks, and subprocesses are different kinds of green threads. Taking a closer look at the documentation, it appears that a coroutine is an async function, a task is a future specifically for coroutines, and subprocesses appear to actually be new OS processes with a Process object that acts as a future specifically for subprocesses. Although the names are different and the behavior seems a bit convoluted, we can see that Python has different kinds of green threads (or even OS threads) with different kinds of futures.
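
A quick sketch putting names to all three (the subprocess line assumes a POSIX echo command exists):

import asyncio

async def main():
    coroutine = asyncio.sleep(0.1)           # a coroutine object; nothing is running yet
    task = asyncio.create_task(coroutine)    # a task: a future wrapping the coroutine
    process = await asyncio.create_subprocess_exec("echo", "hi")  # a real OS process
    await task                               # wait on the task like any future
    await process.wait()                     # Process acts as a future for the subprocess

asyncio.run(main())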

The second assumption is that green threads are executed when the execution pass is given to the event loop. After taking a closer look at the event loop interface, it appears that we can actually have multiple event loops! It also appears that you're able to assign tasks to event loops, and execute the event loops in various ways. This also validates our third and fourth assumptions.

And just like that, we should now understand async in Python well enough to write some correct async code.

2.3.1.2 Python Async Test Code

Let's start with a trivially easy async program, just to verify that it behaves the way we expect:

import asyncio
 
async def main():
    print("It works!")
 
asyncio.run(main())

Just as expected, this prints It works!. Now, what if we try to run 2 different async functions, and have the first one wait a little?

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
asyncio.run(main1())
asyncio.run(main2())

This waits about a second, then prints the first message, and then prints the second message immediately after. This is exactly what we expect. But how do we get Python to run the second async function while the first one is waiting?

import asyncio
 
async def main1():
    sleep_task = asyncio.create_task(asyncio.sleep(1))
    print("It works1!")
    await sleep_task
 
async def main2():
    print("It works2!")
 
asyncio.run(main1())
asyncio.run(main2())

This works a little bit better. The first message is printed immediately, but the program still pauses for about a second before printing the second message. After doing a bit more reading, we find the solution:

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
async def main():
    await asyncio.gather(
        main1(),
        main2()
    )
 
asyncio.run(main())

Finally. This prints the second message immediately, then the first message after about a second. However, this is done using Python's runner interface. What if we want to use tasks (futures)?

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
loop = asyncio.new_event_loop()
 
task1 = loop.create_task(main1())
task2 = loop.create_task(main2())
 
loop.run_until_complete(task1)
loop.run_until_complete(task2)
 
while not task1.done():
    pass
 
while not task2.done():
    pass

Finally, again. This code prints the second message, then after about a second the first message is printed, then the program exits because both tasks are done. Notice how we're able to manually run and wait for these async functions to finish without even using the await keyword.

In every language we look at, our goal will be to figure out how to run asynchronous tasks concurrently without awaiting anything. Doing this will require understanding the details of how async works in that language, and will prove mastery over async in that language.

2.3.2 JavaScript

Next, let's look at another very widely-used language: JavaScript. JavaScript doesn't really have "versions" anymore, so I will just link to the current documentation and hope it never changes. Web dev.

Anyway, one positive thing I can say about JS is that it was designed to be asynchronous from the very beginning, and is actually one of the best examples of an async language. JS has async functions. Reading the documentation, it appears JS has an object representing an async function, the await keyword, and Promises. Now, let me make a very important distinction here. Promises are another important concept in the world of async, and although a promise is closely related to a future, a future and a promise are not the same thing.

In a normal program, a future basically holds a green thread in a variable, but what if we changed our perspective to look at the future from inside of the green thread? This is a promise. A green thread can interact with a promise to pass data from the green thread to the rest of the program, and to signal that the green thread has finished executing. JS uses the terms "outer promise" and "inner promise" to make this distinction, but understand that the "outer promise" is a future.
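
As a rough sketch of those two perspectives (in Python here, since its asyncio.Future happens to expose both sides in one object): awaiting the future is the outer perspective, while set_result is the inner, promise-like perspective:

import asyncio

async def green_thread(promise):
    # The "inner promise" perspective: the green thread passes data out
    # and signals that it has finished executing.
    await asyncio.sleep(0.1)
    promise.set_result("data from inside the green thread")

async def main():
    future = asyncio.get_running_loop().create_future()
    asyncio.ensure_future(green_thread(future))
    # The "outer promise" (future) perspective: the rest of the program
    # waits on the result from the outside.
    print(await future)

asyncio.run(main())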

With that out of the way, let's revisit the questions we asked when looking at Python:

  • What type of green threads does the language use? Async function objects and promises
  • Where is the event loop? Presumably in the browser/runtime somewhere
  • What kind of multitasking is being used? Because a promise has to signal when it's done, multitasking has to be cooperative
  • When do green threads get executed? Unsure, maybe it's automatically handled by the event loop?

This isn't a very good start, we're assuming almost everything about how the event loop works. Fortunately, there is documentation on how the event loop works in JS. Reading through this, "HTML event loops split jobs into two categories: tasks and microtasks. Microtasks have higher priority and the microtask queue is drained first before the task queue is pulled." Ah, there's a hidden detail here! Fortunately, there is also documentation on microtasks. However, the last detail we need is buried in the documentation for the await keyword. "... because both await and queueMicrotask() schedule microtasks, the order of execution is always based on the order of scheduling."

Now let's revisit the questions one more time:

  • What type of green threads does the language use? Async function objects, promises, tasks, and microtasks
  • Where is the event loop? In the "agent" (browser or runtime)
  • What kind of multitasking is being used? Cooperative
  • When do green threads get executed? When microtasks are executed

2.3.2.1 JavaScript Async Test Code

Let's write some test code to test our assumptions. I'm going to write these tests for browser JS, but they should be trivial to change for using Node or other non-browser runtimes.

let f = async function(){
    console.log("It Works");
}
 
await f();

This prints the message just as we expect. Now, because JS was designed to be inherently asynchronous, there actually is no sleep function in JS. Instead, we will have to build our own. JS has a setTimeout function, so let's start with that:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
setTimeout(f1,1000);
 
await f2();

The above code immediately prints the second message, then after about a second prints the first message. That's not really a sleep function. Instead, JS tries very hard to never block execution, so when setTimeout is called, the function f1 is added as a task to the task queue, and then the code keeps executing without stopping and waiting for f1! Let's try another approach:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
await new Promise(
    function(resolve){
        setTimeout(
            async function(){
                await f1();
                resolve(null);
            },
            1000
        );
    }
);
await f2();

This pauses for about 1000ms, then prints the first message, then prints the second message immediately after. Get ready to engage your brain and test your knowledge of async so far. See if you can use our understanding of JS so far to figure out what's going on before reading the following explanation:

  1. Execution goes down to line 9, where the promise is awaited.
  2. When something is awaited, the awaited function gets put onto the microtask queue and the current green thread stops executing until the promise is resolved.
  3. The microtask queue has a microtask, so the event loop executes the microtask right away.
  4. The promise function is executed, which adds a function as a new task to be executed 1000 ms from now. Note that this function does not call the provided resolve callback, so the promise does not resolve yet.
  5. After roughly 1000 ms have passed, the function is added to the task queue and gets executed. f1 gets called and prints the first message, then the resolve callback gets called and the promise resolves.
  6. Since the promise has resolved, the suspended green thread is added back to the task queue, where the event loop eventually begins executing it again.
  7. Execution moves to line 20, where f2 is awaited. f2 gets added as a microtask to the microtask queue, and the current green thread suspends execution again.
  8. The microtask queue has a microtask, so the event loop executes the microtask right away.
  9. f2 executes and prints the message.
  10. The microtask queue is now empty, so the green thread is resumed. However, there is no more code, so the green thread exits.

That was quite the long explanation. JS has the most flexible async design of any language I know. If you're interested in getting a better grasp on async, I recommend learning JS and getting comfortable with how the language handles async.

For our last test, we will try executing and resolving async functions without using the await keyword. If we just call the async functions directly without the await keyword, the functions will still actually execute, but it seems that the functions return promises and the runtime will implicitly await the promises, so let's try a different approach:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
queueMicrotask(f1);
queueMicrotask(f2);
 
console.log("Done");

The above code will print Done, followed by the first message, then the second message. In fact, we don't even need to mark f1 and f2 as async functions; this will still work just the same. Async in JS is truly a strange system.

Postface here. This is currently a public draft, and is not yet finished!