DRAFT - Stop Sucking at Async and Threading

Even the most experienced software engineers have trouble using async/await properly, and can't do multithreading safely without sacrificing major performance, but who can blame them? Information on these topics is incomplete and convoluted. Most resources assume you already somewhat understand async, then explain only one little part of the big picture, and the lessons learned often don't translate well to other scenarios or languages! These techniques are hard because they require a fundamental shift in how we think about computing.

Allow me to take you on a not-so-gentle journey through the hellscape of fearful concurrency, the academia of lock-free thread safety, and ascension beyond skill issue.

Table Of Contents

1 Important Concepts

As I mentioned before, mastering these techniques requires changing how we think about computing. In this chapter, we will walk through a brief (but necessary) history of how CPUs and software have developed, as well as some concepts that have helped me personally. Building a solid foundation is the most important part of learning something new, so do not skip over this chapter or assume you already know things.

1.1 Hello, World!

Let's start at the very beginning. Imagine a simple hello world program. There is an entry point (either a main function or just the start of your file), then the code that prints "Hello, World!", and then your program calls some kind of exit function to exit the program. The last exit step is usually done for us these days, but if you ever program in assembly or use _start in C, you'll learn that you have to do this to properly exit the program.
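
For instance, here's a minimal sketch of that shape in Python (the explicit sys.exit is normally done for us, but it's shown here to make the exit step visible):

import sys

def main():
    print("Hello, World!")  # the actual work

main()        # execution starts at the entry point
sys.exit(0)   # the exit step, usually implicit these days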

That's pretty much it. Execution starts at the program's entry point, then works its way down the file line by line until it runs out of code or is told to exit. This is how normal synchronous code works. It's easy, predictable, and behaves exactly the same way every time. But to understand why software normally works this way, we need to take a look at the hardware.

1.2 Basic Theory of Computation

Let's say we want to design a computer that, you know, computes things. How do we do this? Well, math computes things, and math has this thing called a Turing machine. To oversimplify, a Turing machine has two important parts to it: some memory, and some operations stored in that memory that can also modify the memory. These operations can do anything to the memory, like initialize and modify data, change an operation and then execute that operation again, etc. There are no limits. Because of this, a Turing machine can be thought of as an engine to execute generalized operations.

This is basically what we want a computer to do. Because "computer" is a bit of an unclear term, let's instead call it an execution engine. I like this name because, as programmers, it's helpful to abstract away the details of the CPU and think of it as an engine that executes operations.

Now, how do we get something a bit more practical? Something we can physically build? In 1945, a man named John von Neumann described the von Neumann architecture, which is still what we use today. The von Neumann architecture describes a central processing unit (CPU) that receives inputs and produces outputs, and also has access to memory that it reads and modifies while performing operations.

Also, these "operations" are called instructions.

To reiterate, think of a CPU as an execution engine. We'll get into multi-core CPUs later, so for now assume that 1 CPU = 1 execution engine.

1.3 The Blocking Problem

Think back to a time when computers weren't in every home, and universities had massive computers called mainframes that had to be fed punch cards. Programmers would spend days physically punching holes into cards, carry the finished deck to the mainframe, feed it into the system, wait for the computer to execute the program, and then finally the mainframe would produce some kind of output.

Let's say you just spent 3 days reviewing and punching a brand new program into punch cards, and you're excited to finally go run it on a real CPU and see it work perfectly on the first try. After a few minutes navigating through the maze of a building, you finally make it to the mainframe just to find that I'm already using the mainframe to execute my program. "Hi, my name is Timmy, and I'm new to programming. I made a program to look at every possible chess game and tell me the best possible chess opening, but I'll let you use it when I'm done."

1 mainframe = 1 CPU = 1 execution engine.

If you think about this scenario, the execution engine can only really execute one instruction at a time, so you can't run your program at the same time that I'm running my program. Also, if I stopped my program now so that you could run yours, my program would lose track of all of the chess positions it's calculated so far and I'd have to start over from the beginning. But let's pretend for a moment that I could just magically pause my program and let you run yours, and then I'd start running my program again after you were done right from where I left off.

My program started at memory address 0x01 and started filling up RAM with a massive array of every possible chess position. When you go to run your program, your program also starts at memory address 0x01 and starts using the RAM just like mine did. Once your program is done and I magically swap my program back in, the memory that your program used at address 0x01 is now different from when my program stopped!

This is called a race condition, and race conditions are why we can really only execute one synchronous program at a time. This limitation is said to be blocking the mainframe, because when one program is being executed on a mainframe, it is blocking all other programs from being executed on that mainframe.

This is the Blocking Problem.

1.4 CPU Multitasking & The Birth of Async

Many software developers around that time started using a technique called cooperative multitasking. The technique involves writing a program in a way such that the program is designed to be stopped and resumed at certain points. It's up to the programmer to avoid the Blocking Problem by designing their program to be run in smaller isolated steps, where the output from the previous step can be fed into the beginning of the next step. This approach means that the state of any shared system resource (like memory) can be reset to what the program expects when the program resumes. That means the program can continue on, just like nothing happened.

Cooperative multitasking means that the program currently blocking the mainframe will stop at some predetermined point, which then temporarily unblocks the mainframe even though the larger "program" may not yet be finished. This allows another programmer to block the mainframe for a little bit with the next step of their program. This approach works, but it's annoying because it makes writing programs harder, and all it takes is a few people being "less cooperative" to ruin things for everyone.
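
To make this concrete, here's a minimal sketch of cooperative multitasking using Python generators. Each yield is a predetermined stopping point where a program voluntarily gives up the machine; the program names are made up for illustration:

def chess_program():
    for step in range(3):
        print(f"chess: finished step {step}")
        yield  # predetermined stopping point: voluntarily give up the machine

def payroll_program():
    for step in range(3):
        print(f"payroll: finished step {step}")
        yield

# A trivial "mainframe operator" that alternates between the two programs.
programs = [chess_program(), payroll_program()]
while programs:
    for program in programs[:]:
        try:
            next(program)  # run the program until its next stopping point
        except StopIteration:
            programs.remove(program)  # this program is completely finished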

In 1964, operating systems were becoming more prevalent, and an operating system called Multics was one of the first notable operating systems to implement preemptive multitasking. Preemptive multitasking involves periodically interrupting a program in the middle of execution and switching to executing another program. How does the operating system avoid the Blocking Problem? It uses isolated sections of memory for each program, and backs up the state of the shared parts of the system (like CPU registers) into that isolated program memory before switching to the other program. Eventually, the operating system will see that your program is still waiting to finish executing and it will restore the backed up system state and then resume your program as if nothing ever happened. This requires quite a bit of bookkeeping, but it works. In fact, it works so well that operating systems still do this even today. It's called context switching.

Thanks to preemptive multitasking, the Blocking Problem is dead... for now.

1.5 Multi-Core Machines

In 2001, IBM created a CPU called the Power4. This CPU was the first multi-core CPU, but what does that mean? Well, think back to the von Neumann architecture. Rather than an entire CPU being the execution engine, a CPU core is now the execution engine, and a CPU can have multiple CPU cores. These CPU cores operate completely independently from each other, which means that a CPU now has a number of execution engines equal to the number of CPU cores it has!

1.6 The Execution Pass

We've been talking a lot about execution from the perspective of the hardware up until now. We're software developers, not hardware... hardware developers? Is that what they're called?

Anyway, I'm going to explain a mental model that has been very helpful to me to understand both async and threading. Keep in mind that this is a mental model, and not necessarily related to what a CPU actually does. Let's begin by looking at some example pseudocode that's definitely not C:

void main(){
    int i = 0;
 
    i++;
}

Imagine something called the execution pass. Visualize the execution pass as a slip of paper that says a line of code has permission to execute. Once that line of code is finished executing, it hands the execution pass to the next line of code, then the next line, then the next line, until there are no more lines of code left to execute.

Back to the example code, I'm going to be pedantic for a moment and point out that code isn't always executing. What do I mean by this? Well, an execution engine can only execute instructions. Code has to be translated to instructions, then those instructions have to be loaded into memory somewhere, then the execution engine can finally start executing those instructions, thus beginning execution of your program. When those instructions aren't actively being executed by an execution engine, your code is just data sitting on a drive. We have to think about where the execution pass comes from so that a program can actually start executing. This is a question we'll answer later.

Regardless of how your program gets the execution pass, your program begins execution at its entry point. This is typically either going to be the start of your file or a function called main. In the example code, line 1 starts a main function and line 5 ends it, but for languages that don't have a main function, you can just imagine that lines 1 and 5 don't exist.

In the example program, the entry point is the main function, so the execution pass gets given to line 1.

Since declaring a function doesn't really do anything, the execution pass gets passed from line 1 to line 2.

Line 2 is a simple variable assignment, assigning the value of 0 to the integer variable i. After this assignment is done, line 2 of the code is done executing and the execution pass gets handed to the next line, line 3.

Line 3 has no code, so the execution pass gets handed to line 4.

Line 4 increments variable i so it now holds the value of 1. Once the variable is done being incremented, line 4 is done executing and the execution pass gets handed to line 5.

Line 5 is the end of the main function, which means there is no code left to execute.

At this point, the exit function is called automatically and gives the execution pass back to whatever it was that started executing your program.

1.7 What Even Is Async?

Now that we understand the idea of the execution pass, we can understand what the point of asynchronous code is, and how it actually works. Forget about the language keywords for now.

Do you remember the Blocking Problem? The problem is that, without some kind of multitasking system, one program blocks the entire execution engine and prevents other programs from executing. Async is a solution to the blocking problem, except within a single program.

One line of your code might get the execution pass and then wait on something for a long time (a network request for example), which blocks the execution pass from being given to the rest of your code, even if the rest of your code doesn't depend on the thing that is being waited on. There are even scenarios where one line of code might end up holding the execution pass forever, which blocks the rest of your program from getting the execution pass forever. This is called a deadlock.
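
Here's a minimal single-threaded sketch of that forever case in Python, borrowing a lock (a tool we'll see more of later). The second acquire waits for a release that can never happen, so the line below it never gets the execution pass:

import threading

lock = threading.Lock()  # a non-reentrant lock
lock.acquire()           # this line takes the lock...
lock.acquire()           # ...and this line waits forever for it: deadlock
print("this line never gets the execution pass")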

1.7.1 What Causes Code To Block?

Now you're probably wondering, "What parts of my code might hold onto the execution pass for a long time and block the rest of my program?" The answer is surprisingly simple. Any time a line of code gives the execution pass to something other than the next line of code, that line of code has to wait to get the execution pass back before it can hand the execution pass to the next line of code. Here's the really important detail though: your program can give the execution pass to more than just other programs running on your computer.

This may sound weird, but think about it: when you send a network request, your program sits and waits until the remote server responds to the network request. This effectively means that your one line of code is giving the execution pass to the remote server, and this makes sense because the remote server can't start processing your request until you actually send the request.

Another scenario: think about a calculator program that's waiting for the user to input numbers and operations on those numbers. The calculator program sits and waits on user input, which effectively gives the execution pass to the user, because the calculator can't calculate an answer to an equation that hasn't been input yet.
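
That calculator scenario is a one-liner in Python; input() hands the execution pass to the user, and nothing below it can execute until the user responds:

expression = input("Enter an expression: ")  # blocks: the user holds the execution pass now
print("You entered:", expression)            # only runs once the user hits enter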

This is where async comes in.

Notice that I said the line of code effectively gives the execution pass to these external systems. Remember, the execution pass means things can be executed on an execution engine, and a remote server and a user don't rely on your local execution engine to be executed. Async allows that line of code to get the execution pass back immediately and hand it to the next line of code without having to wait on the external system. Sometimes, your program will have nothing else to do until it gets the desired data from the external system, and these are scenarios where blocking is actually perfectly fine. However, there are many times where a program can do other work while it waits for data from an external system.

1.7.2 Synchronous vs Asynchronous Example

To build on this idea, let's pretend for a moment that we're going to make a simple chatroom. To keep it simple, let's say there are only two features: allow the user to send messages, and display messages from other users.

With an example synchronous chatroom, a simple implementation might be to check for messages from other users, and then after receiving one or more messages from other users, the execution pass can continue moving through the program, and now the user is able to send a message. However, since this is a synchronous program, the program will now block until the user sends a message, which means that the user can't receive any new messages until they send a message.

It's not hard to tell that this design sucks.

Now compare this to an example asynchronous implementation. The program would check to see if any messages have been received. However, since that line of code gets back the execution pass immediately, any new messages can be displayed immediately, including 0 new messages! Now after displaying any new messages, the program can check if the user is trying to send a new message. Again, since the execution pass gets given back immediately, it doesn't matter if the user is trying to send a message or not, the message is sent immediately if there is one, and the execution pass goes back to checking for new messages. The program never sits and waits for any one thing to happen.

This provides an interactive, real-time experience to the user, just like we'd expect from a chatroom.
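
Here's a minimal sketch of that asynchronous loop in Python. The check_for_new_messages and check_for_outgoing_message functions are made-up stand-ins for real non-blocking I/O; the important part is that each check returns immediately instead of sitting and waiting:

import random

def check_for_new_messages():
    # Stand-in for a non-blocking network check; returns immediately.
    return ["hello!"] if random.random() < 0.3 else []

def check_for_outgoing_message():
    # Stand-in for a non-blocking check of the user's input box.
    return "hi there" if random.random() < 0.3 else None

for _ in range(10):  # a real chatroom would loop until the user quits
    for message in check_for_new_messages():
        print("received:", message)  # 0 or more messages, displayed immediately
    outgoing = check_for_outgoing_message()
    if outgoing is not None:
        print("sent:", outgoing)     # sent immediately if there is one
# The loop never sits and waits for any one thing to happen.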

1.8 A Brief History of The Async Keyword

Now let's get into the async you probably already know about, the async keyword. Async all began with F#. F# is a functional programming language created in 2005 by Microsoft Research. In 2010, F# 2.0 was released and "invented async", but not async as we know it today. I'm far from an expert in F#, but in Section 6.3.10 of The F# 2.0 Language Specification lies the first usage of the await keyword as well as a ! symbol, which appears to denote asynchronous execution of "computation expressions". This section also talks about monads. Nerds.

Jumping forward to 2012, Microsoft released C# 5.0. Looking at the Language Specification for C# 5.0, we can see that section 15.15 talks about "Async Functions", and section 12.8.8 talks about "Await Expressions". Now this sounds more like modern async.

1.9 Async & Await

Quoting the C# language specification from above, "The await operator is used to suspend evaluation of the enclosing async function until the asynchronous operation represented by the operand has completed." In other words, await foo() can only exist inside of a function that has been declared with the async keyword.

When the awaited foo() encounters a blocking operation that would give the execution pass to an external system, the execution pass is instead handed back up through your program, through every async and await until the execution pass gets back up to main, which we will assume has an async keyword for now. Since main is marked as async, the execution pass gets handed back up to whatever runs your program, and your entire program is now stopped, blocking on foo().

Once foo() stops blocking and is ready to get the execution pass back, the execution pass gets given back to main and then works its way back down through every async and await until it reaches the foo() that blocked. However, since foo() is no longer blocking, the execution pass gets handed to the next line of code just like normal.

This is a big idea, so let's look at an example.

1.9.1 Async & Await Example

C# looks way too verbose to use in an example, so I'm using pseudocode that will hopefully be more familiar to everyone.

async i_cant_wake_up(){
    await sleep_seconds(99999999999999999);
}
 
async wake_me_up_inside(){
    await i_cant_wake_up();
 
    do_important_stuff();
}
 
async main(){
    await wake_me_up_inside();
}

Yes, that Evanescence song was about asynchronous programming this whole time.

Let's follow the code and see what's happening. The entry point is the main function, so the execution pass gets handed to main.

main immediately gives the execution pass to the function wake_me_up_inside.

wake_me_up_inside immediately gives the execution pass to the function i_cant_wake_up.

i_cant_wake_up immediately gives the execution pass to the function sleep_seconds.

sleep_seconds blocks the entire program for 3.2 billion years (99999999999999999 seconds).

As soon as any blocking is awaited, the execution pass gets handed back up the call stack through all of the async functions, so:

sleep_seconds hands the execution pass back to i_cant_wake_up.

i_cant_wake_up is marked as async, so it hands the execution pass back to wake_me_up_inside.

wake_me_up_inside is marked as async, so it hands the execution pass back to main.

main is the entry point and is marked as async, so the execution pass gets handed back up to whatever is executing your program.

Now, this is a problem. Nobody has time to wait 3.2 billion years for their important stuff to get done, so this is why we made everything async. Uh, how do we actually do the useful stuff without waiting 3.2 billion years first? Isn't that the whole point of async? So far, this is just blocking with extra steps.

Well, there's one other very important part to async:

1.10 Futures

Back to the C# language specification, the section on async explains that "The return-type of an async method shall be either void or a task type." Some more very relevant quotes are "The task is initially in an incomplete state" and "If the function body terminates ... any result value is recorded in the return task, which is put into a succeeded state".

If you're familiar with JavaScript, this might sound a lot like promises, and that's because JavaScript promises are basically the same thing as futures. It's complicated though, so I'll explain later. For now, pretend these are the same thing.

Anyway, an easy way to understand futures is to think of a future as a variable that holds all of the stopped async and await stuff below it. When the execution pass starts going up the pile of async and await, the execution pass stops on the line declaring the future, and then the execution pass starts moving through the program again as normal.

1.10.1 Async, Await, & Futures Example Solution

Let's modify the prior example code to use futures:

async i_cant_wake_up(){
    await sleep_seconds(99999999999999999);
}
 
wake_me_up_inside(){
    future f = i_cant_wake_up();
 
    do_important_stuff();
 
    while(!f.done){
        sleep(0);
    }
}
 
main(){
    wake_me_up_inside();
}

Notice that main and wake_me_up_inside are no longer marked as async, and i_cant_wake_up is no longer being awaited by wake_me_up_inside.

Let's go through this new example code step by step to see what is happening.

The entry point is the main function, so the execution pass gets handed to main.

main immediately gives the execution pass to the function wake_me_up_inside.

wake_me_up_inside immediately gives the execution pass to the function i_cant_wake_up.

i_cant_wake_up immediately gives the execution pass to the function sleep_seconds.

sleep_seconds blocks the entire program for 3.2 billion years (99999999999999999 seconds).

As soon as any blocking is awaited, the execution pass gets handed back up the call stack through all of the async functions, so:

sleep_seconds hands the execution pass back to i_cant_wake_up.

i_cant_wake_up is marked as async, so it hands the execution pass back to...

Wait, this part is different. i_cant_wake_up is no longer being awaited, so i_cant_wake_up returns a future, which is assigned to the f variable on line 6. Line 6 is done executing after this variable assignment, so the execution pass gets passed down to line 7.

Line 7 is a blank line, so the execution pass gets handed down to line 8.

Line 8 calls the function do_important_stuff. For the sake of this example, we'll assume this important stuff gets done correctly and quickly. Line 8 is done executing, so the execution pass gets handed down to line 9.

Line 9 is a blank line, so the execution pass gets handed to line 10.

Lines 10 through 12 contain a while loop: as long as f is not done yet, the program calls sleep with an argument of 0. Sleeping for 0 seconds is a way to do cooperative multitasking in modern programming languages.

This means that we will still end up waiting 3.2 billion years for the program to end... But the important stuff got done! Remember, async doesn't avoid waiting, it simply allows the program to multitask while it's waiting.

Now, many of you might think that this looping behavior on lines 10 through 12 is very strange because popular async languages like Python, JS, and Go don't do this. In fact, many popular async languages will deadlock if you do what we did on lines 10 through 12, but this is where runtimes come in. Runtimes require a whole chapter of their own.

Oh, would you look at that, a chapter on runtimes is up next. How fortunate.

2 Async Runtimes

I normally like to start by defining what we're going to cover, but saying "a runtime is a library or program that runs a program", while accurate, is very vague. There are many different programming languages with many different design choices regarding their runtimes, if the language even has a runtime, and these differences cause async to behave differently in pretty much every language.

For languages that are either interpreted (JavaScript, Python, etc) or otherwise run on some kind of VM (C#, Java), the language runtime will typically have features built in for handling async. Some of these languages (such as Python) even have multiple ways of doing async, which allows for more fine-grained control over how async works in your program.

For languages that are compiled and executed natively, you will either need to implement your own async runtime, or you'll have to find and use an existing 3rd-party library. According to Robert Virding, one of the inventors of the Erlang programming language, "Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Some languages like Go or Erlang are special because they follow academic paradigms like CSP or The Actor Model respectively. The way these languages work is fundamentally different because their runtimes are based on academic principles rather than deriving functionality from lower-level features.

Other notable languages include:

  • C++, which has async functionality without the usual async/await keywords and without a dedicated async runtime.
  • Rust, which has a wide variety of 3rd party async runtimes that each have different behaviors.
  • C/Assembly, which have no async functionality, but they have certain non-blocking functionality.
  • Bash. Yes, the Linux Bourne-Again Shell. You can make async bash code if you're clever and understand the fundamentals of async.

As you can see, this is an absolute mess. The only sensible place to start is with the original async runtime: The operating system.

2.1 The Operating System Runtime

Operating systems are extremely complex, so I will oversimplify details that aren't relevant to us right now. I will still try my best to remain accurate, but the following subchapters are for understanding async, not OS dev, so keep that in mind.

Anyway, the most important part of the operating system is a special program called the kernel. The kernel is the brain of the operating system, and is at the core of everything that happens on a machine, including multitasking. But how does an operating system do multitasking exactly?

2.1.1 Processes & Threads

When you execute a program, the program is copied from the disk into RAM, and that copy in RAM gets executed. That copy of the program in RAM is called a process. In programming terms, think of the program sitting on the disk as a class{} definition, and think of the process as an instance of new class().

From the perspective of the CPU, a process is just a giant list of single instructions that need to be executed one at a time until the process asks the kernel to exit. This seems pretty straightforward, right? Well, there are plenty of situations where you will want a single process to spawn multiple processes that work together. For example, think about your web browser of choice. Use a task manager or process viewer to look at how many processes your web browser has. It's an astronomical amount.

But what about threads? Threads are more-or-less a lighter-weight version of a process, although there are some very important differences between processes and threads. One of the more notable differences is that threads have almost no memory isolation from the process that spawned them (the parent process) or from any other threads of that process. This means that if you declare a global variable in the parent process or in any of the threads, the parent process and all of its threads can access and potentially modify that global variable.
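
A quick Python sketch of that shared-memory behavior; both the main thread and the spawned thread see and modify the same global variable:

import threading

counter = 0  # a global variable, visible to every thread in this process

def child():
    global counter
    counter += 100  # the child thread modifies the parent's global

thread = threading.Thread(target=child)
thread.start()
thread.join()
print(counter)  # prints 100: the child's change is visible back in the parent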

Operating system threads are often called OS threads or native threads to differentiate them from green threads, which we will cover soon. Also note that, although programs may have multiple processes and threads (called multithreading), that is a separate topic from async. Multithreading is about doing one thing per thread across multiple threads at the same time, while async is about doing multiple things in a single thread. Put another way, each thread in a multithreaded program still executes its code synchronously, in one specific order, while async executes code asynchronously, in one of many different potential orders. Many people are confused about this, so take the time to understand the difference.

2.1.2 Multitasking, Task Scheduling, & Context Switching

Those of you who have been reading from the beginning should remember the section on multitasking: "Preemptive multitasking involves periodically interrupting a program in the middle of execution and switching to executing another program." Good job, you. If you haven't been reading from the beginning, this part might not make sense.

The operating system will periodically get interrupted by a hardware timer, which is what triggers the preemption. These hardware timer events are called interrupts, and interrupts are handled by Interrupt Service Routines (ISRs). When the hardware timer interrupt is triggered, the execution pass used by the interrupted CPU core gets violently yanked away from whatever process or thread was using it, and then the execution pass is given to the ISR in the kernel that is responsible for handling these hardware timer interrupts.

The first part of the ISR will back up the current state of the CPU in the memory associated with the process or thread that just stopped executing. Once this backup is finished, some kind of task scheduling algorithm (Wikipedia, OSDev Wiki) will be executed to decide which process or thread to give the execution pass to. It could be the same process or thread, or it could be a different one.

Once the algorithm chooses a process or thread to execute, the CPU state will be restored from the new process or thread's memory, and then the execution pass will be given to that process or thread right where it left off, just like nothing ever happened. This entire process is called a context switch. Because the CPU state needs to be backed up and restored every time, and because the task scheduling algorithm needs to be run every time, context switches can be quite slow and can add a noticeable amount of overhead if you do things wrong. Excessive context switching can also contribute to issues like slowloris vulnerabilities or The C10k Problem.

2.1.3 The Execution Loop

This isn't a feature of an operating system, but rather an absolute truth of how hardware works. Say for the sake of argument that you have a CPU with a single CPU core that is clocked at 3 GHz; that means you have a single execution engine capable of executing 3 billion instructions per second. However, 3 billion instructions get executed every second by that CPU core, even if there is nothing to execute! This means the CPU is basically stuck in a while(!hasWorkToDo()); loop. So how does the CPU avoid being under 100% load all of the time while still being responsive when something happens?

Most CPU instruction set architectures (ISAs) will have a halt instruction built in; on x86, it's called HLT. This instruction is basically a sleep(0); for that CPU core. But how does the core know when to wake back up when something happens? Well, the hardware tells the core when something happened by sending it an interrupt. This can be expressed with the following pseudocode:

async do_task_scheduling(){
    let active_threads = get_active_threads();
 
    if(len(active_threads) < 1){
        do_assembly_instruction("HLT");
    }
 
    pick_and_execute_thread(active_threads);
}

When the HLT assembly instruction is executed, the kernel gives the execution pass back to the CPU core, and then the CPU core takes a little nap, which prevents that CPU core from handing out its execution pass until a hardware interrupt wakes it up. When this hardware interrupt happens, rather than giving the execution pass back to line 5 of the example code, the execution pass gets given to the start of the ISR responsible for handling that hardware interrupt.

2.2 Runtime Details

You might be wondering "If an operating system already has all of this stuff built in, then why do async runtimes even exist?" This is a great question. An operating system is, well, a system for operating the machine. An operating system has many responsibilities including managing processes, dealing with hardware, dealing with ACPI/UEFI, etc.

On the other hand, a runtime is responsible only for helping a single program execute. Runtimes are much more narrow in scope, and often have different design goals.

Although async runtimes are extremely varied, they all have some things in common, and by understanding these similarities, we can learn how to correctly use async in any language without having to blindly memorize endless keywords and patterns.

2.2.1 Green Threads

A green thread is the runtime equivalent of an OS process or thread. A green thread, by definition, exists inside of the runtime, and only inside of the runtime. Let's think of Python as an example. Let's say that you wrote a Hello World in Python and you want to execute it. You'll need to run it with a command such as python hello.py. Notice that you are using the python executable to run Python code.

When running the above command, the operating system will find the python executable that is installed on your system, and then the operating system will create a process and begin executing it. Next, the python executable will do some internal setup before loading hello.py. The Python code contained within hello.py is loaded as a green thread into the memory of the python process somewhere, and then the python process begins executing the Python code in that green thread until the code in hello.py either ends or calls sys.exit().

If we take a step back and look at the big picture, we can see that both OS threads and green threads are just a chunk of memory with a big list of instructions that we want to execute. However, there's another concept we've already looked at that seems to fit this definition: a future. If you think about it, a future is a green thread. The only difference is that the future is managed by your program and not some kind of runtime. Think of a future as a different type of green thread. There are actually many other types of green threads, including Go's goroutines, JS's promises, fibers and coroutines (C++ warning), etc.

2.2.2 Runtime Task Scheduling

Just like with the operating system, a runtime has to be able to switch between green threads. Unfortunately, this can be really tricky because runtimes can implement whatever task scheduling algorithm they want, which means we can't make any assumptions about how runtimes will schedule tasks. However, we do know some absolute truths that runtimes can't escape. As we established before, all runtimes use either preemptive or cooperative multitasking, and we know that they operate on some kind of thread data.

There is one massive difference between task scheduling in the OS compared to task scheduling in a runtime though: context switching. When an operating system context switches, it needs to backup and restore CPU state, but a runtime doesn't need to do this! This means that, assuming the runtime is programmed well, context switching inside of a runtime has almost 0 cost. Another important detail is to think about the execution pass. In an operating system, the execution pass gets violently yanked away from the process that the execution engine is currently executing. However, from the perspective of the operating system, the runtime is just a program, so when the operating system context switches and resumes a runtime, the execution pass gets given back to the runtime right where it left off. This means that the runtime isn't even aware that the OS had a context switch, so the runtime doesn't have to do any extra work if the hardware timer interrupts thousands of times while the runtime is helping a program execute.

Now some of you may be wondering, wouldn't a runtime always be slower than just using the OS? This is a good question, but the answer is no. A runtime is oftentimes much faster than relying on the operating system when async is involved. A hardware timer will interrupt a CPU core X times per second. This number of interrupts per second is configurable by the kernel, so the machine will experience X context switches per second no matter what you do.

However, we can make the problem worse. For example, a common and easy way to do async without a runtime is to spawn a thread for every async task, sometimes even using sleep(0) to do cooperative multitasking. However, the operating system has no idea why any of the threads are sleeping, or if any of your threads are waiting on each other, so the task scheduler has to just guess at which thread to resume. If the task scheduler guesses wrong, it takes another context switch to try again.
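
For illustration, that thread-per-task approach looks something like this in Python; every sleep(0) invites the OS to context switch, and the OS scheduler has no idea that one thread is just waiting on the other:

import threading
import time

def wait_for_result(shared):
    while "result" not in shared:
        time.sleep(0)  # cooperative hint: let the OS pick another thread

def produce_result(shared):
    time.sleep(0.1)    # pretend to do slow work
    shared["result"] = 42

shared = {}
threads = [
    threading.Thread(target=wait_for_result, args=(shared,)),
    threading.Thread(target=produce_result, args=(shared,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["result"])  # the waiting thread burned many context switches to get here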

This can add a shit ton of additional context switches, sometimes even orders of magnitude greater than the X that happen by default. However, if a runtime is being used, not only are these additional context switches significantly cheaper, but the task scheduling algorithm will likely have additional insight into how the green threads are blocking each other, which means it can make better decisions. Also, if the task scheduler in a runtime guesses wrong, the cost to context switch and try again is minimal compared to context switching in the OS.

2.2.3 Event Loop

Most runtimes will have an event loop that, much like the execution loop, is always trying to do something. Some runtimes will handle the event loop for you entirely so that you never think about it, some allow you to manage the event loop directly, some will spawn additional threads to run the event loop on outside of the main program, while others will only multitask on a single thread. Some runtimes even leave it up to you to implement your own event loop. There are many possibilities, and many runtimes behave differently, so understanding how the execution pass gets given to the event loop is one of the most important parts of async. Despite its importance, I sadly don't see it get talked about much.

Now what is the event loop and how does it apply to green threads and task scheduling? As always, it depends on the runtime, but generally speaking the process will be:

  1. Create an event loop (depends on runtime).
  2. Add green threads to the event loop.
  3. Give the execution pass to the event loop.
  4. The execution pass goes to the green threads assigned to the event loop.
  5. Get the execution pass back from the event loop.
  6. Maybe get a result from the green threads in the event loop.
  7. If no result, do some other work and then go back to step 3.
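
To ground those steps, here's a toy event loop sketch in Python, with generators standing in for green threads. Every name here is made up for illustration, and real runtimes are far more sophisticated:

def green_thread(name, steps):
    for step in range(steps):
        yield  # hand the execution pass back to the event loop
    return f"{name} finished"

class ToyEventLoop:
    def __init__(self):
        self.threads = []  # step 2: green threads get added here
        self.results = []

    def add(self, thread):
        self.threads.append(thread)

    def run_once(self):
        # steps 3-5: give each green thread the execution pass, then take it back
        for thread in self.threads[:]:
            try:
                next(thread)
            except StopIteration as finished:
                self.results.append(finished.value)  # step 6: collect a result
                self.threads.remove(thread)

loop = ToyEventLoop()               # step 1: create an event loop
loop.add(green_thread("a", 2))      # step 2: add green threads
loop.add(green_thread("b", 3))
while loop.threads:
    loop.run_once()                 # steps 3-5
    print("doing other work")       # step 7: no result yet, do other work and loop
print(loop.results)                 # step 6: results from the green threads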

Remember the difference between multithreading and async: multithreading still executes things in order, while async can execute things in any order. In the above steps, we can see that we "do some other work" as part of step 7, and that other work will happen whether or not we've gotten back the result from the green thread yet. Remember, async is about executing code in any order, not at the same time as other code.

2.3 Async Examples

That was a lot of abstract stuff. Let's take a look at some examples to help solidify our understanding.

We will look at a variety of languages and answer a series of questions for those languages to determine how async works in those languages.

These sections will be messy and fucked up until I can get down solid and consistent messaging and definitions, which will take some time. This will probably be the most time-consuming part of the book.

TO-DO: Cover PHP (fibers), C++ (all of the above), C, and Go.

2.3.1 Python

Since Python is quite a popular language, let's start with it. Since this section will rely on Python's implementation details, we will look specifically at version 3.12. When this version inevitably becomes out of date, just go through this same process on the current version of Python. Anyway, when trying to understand async in a new language, we need to answer some questions:

  • What type of green threads does the language use?
  • Where is the event loop?
  • What kind of multitasking is being used?
  • When do green threads get executed?

The first step to answering these questions will usually be to look for documentation on how that language handles async. In the case of Python, we find asyncio pretty quickly. The documentation says this is "a library", however it's listed as a part of the Python standard library, so asyncio is considered core Python functionality.

If we look at the features of asyncio, we can see that Runners, Coroutines and Tasks, and Subprocesses are listed under High-level APIs, and Event Loop and Futures are listed under Low-level APIs. This is actually quite helpful. Before we even start looking at example code and trying to figure out the details, we can infer a lot of details:

  • It looks like coroutines, tasks, and subprocesses are potentially different kinds of green threads.
  • It looks like futures are used to manage green threads in the application.
  • There is some kind of interface for managing the event loop.
  • A runner presumably "runs" something, presumably green threads.

Just by looking at the various APIs, we can already infer answers to all of our questions:

  • What type of green threads does the language use? Coroutines, tasks, and subprocesses
  • Where is the event loop? There is an interface for managing it directly
  • What kind of multitasking is being used? Since we control the event loop, it's up to us
  • When do green threads get executed? When they are run on the event loop

Now for the next step, we need to actually look at the docs and verify these assumptions.

2.3.1.1 Verifying Python Assumptions

The first assumption is that coroutines, tasks, and subprocesses are different kinds of green threads. Taking a closer look at the documentation, it appears that a coroutine is an async function, a task is a future specifically for coroutines, and subprocesses appear to actually be new OS processes with a Process object that acts as a future specifically for subprocesses. Although the names are different and the behavior seems a bit convoluted, we can see that Python has different kinds of green threads (or even OS threads) with different kinds of futures.
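
A quick sketch putting names to all three (the subprocess line assumes a POSIX echo command exists):

import asyncio

async def main():
    coroutine = asyncio.sleep(0.1)           # a coroutine object; nothing is running yet
    task = asyncio.create_task(coroutine)    # a task: a future wrapping the coroutine
    process = await asyncio.create_subprocess_exec("echo", "hi")  # a real OS process
    await task                               # wait on the task like any future
    await process.wait()                     # Process acts as a future for the subprocess

asyncio.run(main())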

The second assumption is that green threads are executed when the execution pass is given to the event loop. After taking a closer look at the event loop interface, it appears that we can actually have multiple event loops! It also appears that you're able to assign tasks to event loops, and execute the event loops in various ways. This also validates our third and fourth assumptions.

And just like that, we should now understand async in Python well enough to write some correct async code.

2.3.1.2 Python Async Test Code

Let's start with a trivially easy async program, just to verify that it behaves the way we expect:

import asyncio
 
async def main():
    print("It works!")
 
asyncio.run(main())

Just as expected, this prints It works!. Now, what if we try to run 2 different async functions, and have the first one wait a little?

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
asyncio.run(main1())
asyncio.run(main2())

This waits about a second, then prints the first message, and then prints the second message immediately after. This is exactly what we expect. But how do we get Python to run the second async function while the first one is waiting?

import asyncio
 
async def main1():
    sleep_task = asyncio.create_task(asyncio.sleep(1))
    print("It works1!")
    await sleep_task
 
async def main2():
    print("It works2!")
 
asyncio.run(main1())
asyncio.run(main2())

This works a little bit better. The first message is printed immediately, but the program still pauses for about a second before printing the second message. After doing a bit more reading, we find the solution:

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
async def main():
    await asyncio.gather(
        main1(),
        main2()
    )
 
asyncio.run(main())

Finally. This prints the second message immediately, then the first message after about a second. However, this is done using Python's runner interface. What if we want to use tasks (futures)?

import asyncio
 
async def main1():
    await asyncio.sleep(1)
    print("It works1!")
 
async def main2():
    print("It works2!")
 
loop = asyncio.new_event_loop()
 
task1 = loop.create_task(main1())
task2 = loop.create_task(main2())
 
loop.run_until_complete(task1)
loop.run_until_complete(task2)
 
while not task1.done():
    pass
 
while not task2.done():
    pass

Finally, again. This code prints the second message, then after about a second the first message is printed, then the program exits because both tasks are done. Notice how we're able to manually run and wait for these async functions to finish without even using the await keyword.

In every language we look at, our goal will be to figure out how to run asynchronous tasks concurrently without awaiting anything. Doing this will require understanding the details of how async works in that language, and will prove mastery over async in that language.

2.3.2 JavaScript

Next, let's look at another very widely-used language: JavaScript. JavaScript doesn't really have "versions" anymore, so I will just link to the current documentation and hope it never changes. Web dev.

Anyway, one positive thing I can say about JS is that it was designed to be asynchronous from the very beginning, and is actually one of the best examples of an async language. JS has async functions. Reading the documentation, it appears JS has an object representing an async function, the await keyword, and Promises. Now, let me make a very important distinction here. Promises are another important concept in the world of async, and although a promise is closely related to a future, a future and a promise are not the same thing.

In a normal program, a future basically holds a green thread in a variable, but what if we changed our perspective to look at the future from inside of the green thread? This is a promise. A green thread can interact with a promise to pass data from the green thread to the rest of the program, and to signal that the green thread has finished executing. JS uses the terms "outer promise" and "inner promise" to make this distinction, but understand that the "outer promise" is a future.
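
As a rough sketch of those two perspectives (in Python here, since its asyncio.Future happens to expose both sides in one object): awaiting the future is the outer perspective, while set_result is the inner, promise-like perspective:

import asyncio

async def green_thread(promise):
    # The "inner promise" perspective: the green thread passes data out
    # and signals that it has finished executing.
    await asyncio.sleep(0.1)
    promise.set_result("data from inside the green thread")

async def main():
    future = asyncio.get_running_loop().create_future()
    asyncio.ensure_future(green_thread(future))
    # The "outer promise" (future) perspective: the rest of the program
    # waits on the result from the outside.
    print(await future)

asyncio.run(main())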

With that out of the way, let's revisit the questions we asked when looking at Python:

  • What type of green threads does the language use? Async function objects and promises
  • Where is the event loop? Presumably in the browser/runtime somewhere
  • What kind of multitasking is being used? Because a promise has to signal when it's done, multitasking has to be cooperative
  • When do green threads get executed? Unsure, maybe it's automatically handled by the event loop?

This isn't a very good start, we're assuming almost everything about how the event loop works. Fortunately, there is documentation on how the event loop works in JS. Reading through this, "HTML event loops split jobs into two categories: tasks and microtasks. Microtasks have higher priority and the microtask queue is drained first before the task queue is pulled." Ah, there's a hidden detail here! Fortunately, there is also documentation on microtasks. However, the last detail we need is buried in the documentation for the await keyword. "... because both await and queueMicrotask() schedule microtasks, the order of execution is always based on the order of scheduling."

Now let's revisit the questions one more time:

  • What type of green threads does the language use? Async function objects, promises, tasks, and microtasks
  • Where is the event loop? In the "agent" (browser or runtime)
  • What kind of multitasking is being used? Cooperative
  • When do green threads get executed? When microtasks are executed

2.3.2.1 JavaScript Async Test Code

Let's write some test code to test our assumptions. I'm going to write these tests for browser JS, but they should be trivial to change for using Node or other non-browser runtimes.

let f = async function(){
    console.log("It Works");
}
 
await f();

This prints the message just as we expect. Now, because JS was designed to be inherently asynchronous, there actually is no sleep function in JS. Instead, we will have to build our own. JS has a setTimeout function, so let's start with that:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
setTimeout(f1,1000);
 
await f2();

The above code immediately prints the second message, then after about a second prints the first message. That's not really a sleep function. Instead, JS tries very hard to never block execution, so when setTimeout is called, the function f1 is added as a task to the task queue, and then the code keeps executing without stopping and waiting for f1! Let's try another approach:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
await new Promise(
    function(resolve){
        setTimeout(
            async function(){
                await f1();
                resolve(null);
            },
            1000
        );
    }
);
await f2();

This pauses for about 1000ms, then prints the first message, then prints the second message immediately after. Get ready to engage your brain and test your knowledge of async so far. See if you can use our understanding of JS so far to figure out what's going on before reading the following explanation:

  1. Execution goes down to line 9, where the promise is awaited.
  2. When something is awaited, the awaited function gets put onto the microtask queue and the current green thread stops executing until the promise is resolved.
  3. The microtask queue has a microtask, so the event loop executes the microtask right away.
  4. The promise function is executed, which adds a function as a new task to be executed 1000 ms from now. Note that this function does not call the provided resolve callback, so the promise does not resolve yet.
  5. After roughly 1000 ms have passed, the function is added to the task queue and gets executed. f1 gets called and prints the first message, then the resolve callback gets called and the promise resolves.
  6. Since the promise has resolved, the suspended green thread is added back to the task queue, where the event loop eventually begins executing it again.
  7. Execution moves to line 20, where f2 is awaited. f2 gets added as a microtask to the microtask queue, and the current green thread suspends execution again.
  8. The microtask queue has a microtask, so the event loop executes the microtask right away.
  9. f2 executes and prints the message.
  10. The microtask queue is now empty, so the green thread is resumed. However, there is no more code, so the green thread exits.

That was quite the long explanation. JS has the most flexible async design of any language I know. If you're interested in getting a better grasp on async, I recommend learning JS and getting comfortable with how the language handles async.

For our last test, we will try executing and resolving async functions without using the await keyword. If we just call the async functions directly without the await keyword, the functions will still actually execute, but it seems that the functions return promises and the runtime will implicitly await the promises, so let's try a different approach:

let f1 = async function(){
    console.log("It Works1");
}
 
let f2 = async function(){
    console.log("It Works2");
}
 
queueMicrotask(f1);
queueMicrotask(f2);
 
console.log("Done");

The above code will print Done, followed by the first message, then the second message. In fact, we don't even need to mark f1 and f2 as async functions; this will still work just the same. Async in JS is truly a strange system.

Postface here. This is currently a public draft, and is not yet finished!