A Book On x86_64

These are old notes that may not be accurate! This required a massive amount of research that I didn't have time for, but I will return to this some day.

0) Important Stuff, Please Read

0.1) Disclaimer

DISCLAIMER: This book is not gospel; the AMD64 Architecture Programmer's Manual is gospel. The Intel® 64 and IA-32 Architecture Software Developer's Manual is also gospel, although it differs slightly from the AMD manual (this will be elaborated on later).

This book is simply my summarized understanding of x86_64, and I am sharing this in an attempt to help others understand the x86_64 architecture. Although I would like for this book to be as accurate as possible, there will likely be mistakes, errors, and omissions made in this book.

DO NOT do something stupid and try to blame it on me. Instead, do something stupid and blame it on the gospel.

0.2) Why Learn x86_64?

For the most part, x86_64 is the CPU architecture that runs the modern world of computing. Every single programming language has to get translated down to machine code to run on x86_64 CPUs. Understanding the x86_64 architecture is critical for understanding how to write performant code that runs on these machines.

x86_64 is also quite a complex architecture, which includes many features that are not really used or understood by the vast majority of developers, or even systems developers. These would make excellent targets for security research.

Another great reason is that it will give you a lot of insight into how your computer works at its core. Much of what the bootloader and kernel do depends on the CPU architecture, so understanding even a basic overview of the architecture can provide great insight into many non-obvious things that can harm performance.

Lastly, I want to mention that knowledge and understanding helps ensure freedom and transparency. The more people who actually understand how the x86_64 architecture works, the more eyes and minds there are working together in this space to help hold each other accountable and imagine great new innovations. This is perhaps the greatest reason to learn the x86_64 architecture.

0.3) References

0.3.1) Naming

Unfortunately, before we can even talk about x86_64, we first have to define what x86_64 is. There have been many x86-related architectures and names over the years. To provide clarity, I have generated the following flowchart. Note that the blue text is hyperlinked in the SVG file.

Graphviz File

Some notes:

The term x86 is incredibly ambiguous these days. It is most often used to refer to IA-32, but it has also been used to refer to other things, such as the entire x86 CPU lineage from the original Intel 8086, made in 1978, through today. To avoid confusion, I simply left this term out of the above flowchart.
AMD64 is the canonical version of x86_64. Intel® 64 came afterwards and is a little bit different than AMD64, but the term x86_64 is used to describe either since they are mostly identical.
A Taiwanese company named VIA Technologies created a line of CPUs called the VIA Nano. It is difficult to find detailed information on these CPUs, but they seem to have a certain amount of compatibility with x86 and x86_64. Based on what little information I have found about these CPUs, they do appear to stray from the AMD64 and Intel® 64 specifications quite a lot, and therefore the contents of this book may not be accurate for these CPUs.

0.3.2) x86_64 Instruction Encoding Flowchart

The above flowchart is a screenshot from Page 2 of Volume 3 of the AMD64 manual.

0.4) Resources

0.4.1) AMD64 Manual

First and foremost is the AMD64 manual. This manual is very long, and consists of 5 volumes:

Application Programming - Introduces many concepts you will probably want to know for any serious assembly programming.
Systems Programming - Explains how to manage the CPU and do other bootloader/kernel/OS level stuff. There is a lot of black magic here.
General-Purpose and System Instructions - Explains instruction encoding and provides a reference for the base x86_64 instructions and opcodes.
128-Bit and 256-Bit Media Instructions - SIMD. Also AES instructions in Appendix A for some reason.
64-Bit Media and x87 Floating-Point Instructions - Probably worse SIMD and floating points

This is the manual, and should be treated as gospel.

0.4.2) Intel® 64 Manual

Almost as important, the Intel® 64 manual. Anecdotally, this manual is harder to understand than the AMD64 manual.

This manual is divided into 4 volumes:

Basic Architecture - Provides a "basic" overview of the x86_64 architecture. Basic is an extremely subjective term here.
Instruction Reference - Explains instruction encoding and provides a reference for the instructions. Also has a chapter on proprietary instructions.
System Programming Guide - Explains how to manage the CPU and do other bootloader/kernel/OS level stuff. There is also a lot of black magic here.
Model Specific Registers - Important for the voodoo in volume 3.

This is also the manual, and should also be treated as gospel.

0.4.3) Felix Cloutier

Felix Cloutier's website is a very popular and convenient online resource for x86_64 instructions.

Please also read and understand the disclaimer on his website, which states in part "It may be enough to replace the official documentation on your weekend reverse engineering project, but for anything where money is at stake, go get the official and freely available documentation."

This is not the manual, and should not be treated as gospel.

0.4.4) Agner Fog

Agner Fog is a computer scientist who is an associate professor of computer science at the Technical University of Denmark, according to his Wikipedia page.

His website contains (among other things) an entire section dedicated to software optimization, including writing in assembly. He also provides documents explaining many complex details about CPU micro-architectures, as well as a list of his own instruction timing benchmarks for Intel, AMD, and VIA CPUs.

This is not the manual, and should not be treated as gospel.

0.4.5) uops.info

uops.info is an online resource documenting various timing-related metrics on CPU instructions that can be used for optimizations. Note that their database focuses mainly on Intel CPUs, although measurements for some AMD Zen microarchitectures have also been added.

This is not the manual, and should not be treated as gospel.

1) x86_64 Environment

1.1) Starting Point

x86_64 CPUs have many different operating modes, and a number of complex "runtime configurations". For this summary to make any sense, we need to start from the beginning.

Many things mentioned in this section have a significant amount of detail that is not really necessary to understand x86_64, and will therefore be left out for the sake of clarity. If you want all of the nuance and details, I will provide various "details" links in this section that give that more detailed information.

When a system is first powered on (details), the CPU starts in what is called "real mode", which means it is effectively operating as the original Intel 8086 CPU created all the way back in 1978. What happens next depends on whether the machine is booted in UEFI mode (details) or "Legacy Boot" BIOS Mode (details). In a legacy BIOS boot, the CPU is still in real mode when it begins executing a special program called a bootloader (details), which performs various tasks to help set up and configure the system (details). In a UEFI boot on x86_64, the firmware has already switched the CPU into long mode before the bootloader runs. In both cases, the last task of a bootloader is to load and execute the kernel (details).

At some point during the above process (in a legacy boot, typically in the bootloader), the CPU will be upgraded to what is called "protected mode". Protected mode is a 32-bit operating mode defined by IA-32, and is commonly referred to as x86. After entering protected mode, it is typical for the bootloader to jump execution to the kernel. Once the kernel begins executing, it will typically not wait long before doing some required setup (details) and upgrading again to the final CPU mode called "long mode".

Long mode is the 64-bit mode that is commonly referred to as x86_64. We will go into detail about long mode later in this chapter.

1.2) x86_64 CPU Modes

1.3) Long Mode

2) Parts of an Instruction

x86_64 instructions may have many different parts to them. The order in which the instruction parts are listed is not at all related to the order in which they appear in the instruction encoding; instead I list the parts in an order that is likely to be easiest to understand. x86_64 is a complex instruction set, so understanding the basics before the hard parts is absolutely vital.

2.1) Opcode

First and foremost is the instruction opcode. The opcode denotes the actual operation that is being requested of the CPU. Instructions will only have a single opcode each, and opcodes will generally be 1 byte in size, however an opcode can be up to 3 bytes long. Some simple examples are:

0x90 is the opcode for NOP, which performs no operation.
0xF3 0x90 is the opcode for PAUSE, which optimizes spin loops.
0x0F 0x05 is the opcode for SYSCALL, which performs a syscall.

Note that some instructions, such as MOV, have many different opcodes depending on what type of move you are wanting. This will be elaborated on more later.
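
As a rough illustration of the idea that an opcode is just a byte pattern the CPU recognizes, the following Python sketch looks up the three example encodings listed above. The table EXAMPLE_OPCODES and the function name_opcode are hypothetical names made up for this example; a real decoder also has to deal with prefixes, operands, and hundreds of other opcodes.

```python
# Hypothetical lookup table containing only the three encodings listed above.
EXAMPLE_OPCODES = {
    bytes([0x90]): "NOP",
    bytes([0xF3, 0x90]): "PAUSE",
    bytes([0x0F, 0x05]): "SYSCALL",
}


def name_opcode(encoding: bytes) -> str:
    """Return the mnemonic for one of the example encodings, or "unknown"."""
    return EXAMPLE_OPCODES.get(bytes(encoding), "unknown")


if __name__ == "__main__":
    for raw in (b"\x90", b"\xf3\x90", b"\x0f\x05", b"\xcc"):
        print(raw.hex(" "), "->", name_opcode(raw))
```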

2.2) Prefixes

The next most important part of an instruction is its prefixes. Prefixes can modify many aspects of an opcode, including the types of operands it takes, the sizes of those operands, whether additional bytes must be encoded into the instruction, etc. The so-called "legacy prefixes" are pretty simple, but the additional prefixes added in x86_64 (REX, VEX, XOP) are quite complex.

All legacy prefixes are 1 byte, as is the REX prefix. The VEX prefix may be either 2 or 3 bytes long, and the XOP prefix is always 3 bytes long. A single instruction can have up to 5 prefixes: 4 "legacy prefixes" and one additional REX, VEX, or XOP prefix. These prefixes are explained in further detail in the following subchapters.
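
To make the layout a little more concrete, here is a minimal Python sketch that peels the legacy prefixes and an optional REX prefix off the front of an instruction. The prefix byte values are the ones listed in the following subchapters. The function name split_prefixes is made up for this example, and the sketch assumes 64-bit mode, where a byte in the range 0x40 to 0x4F directly before the opcode is a REX prefix. It does not handle VEX or XOP and does not enforce the limit of 4 legacy prefixes.

```python
# Legacy prefix bytes, as listed in the subchapters that follow.
LEGACY_PREFIXES = {
    0x66: "operand-size override",
    0x67: "address-size override",
    0x2E: "CS segment override",
    0x3E: "DS segment override",
    0x26: "ES segment override",
    0x64: "FS segment override",
    0x65: "GS segment override",
    0x36: "SS segment override",
    0xF0: "LOCK",
    0xF3: "REP/REPE",
    0xF2: "REPNE",
}


def split_prefixes(instruction: bytes):
    """Split an instruction into (legacy prefix names, REX byte or None, rest).

    Assumption: 64-bit mode, no VEX/XOP prefixes.
    """
    position = 0
    legacy = []
    while position < len(instruction) and instruction[position] in LEGACY_PREFIXES:
        legacy.append(LEGACY_PREFIXES[instruction[position]])
        position += 1
    rex = None
    if position < len(instruction) and 0x40 <= instruction[position] <= 0x4F:
        rex = instruction[position]
        position += 1
    return legacy, rex, instruction[position:]


if __name__ == "__main__":
    print(split_prefixes(bytes([0x66, 0xB8, 0x00, 0x00])))
```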

One more important note: code examples in this section may contain ----. This is simply imaginary spacing that I add to make bytes line up between different instructions, so the differences stand out more. For example:

0x01 0x02 0x03 0x04 0x06 Imaginary Instruction 1

0x02 0x03 0x05 Imaginary Instruction 2

It is difficult to look at these 2 lines and tell which parts are the same and which parts are different.

0x01 0x02 0x03 0x04 ---- 0x06 Imaginary Instruction 1

---- 0x02 0x03 ---- 0x05 ---- Imaginary Instruction 2

It is much easier to see the similarities and differences between the two.

2.2.1) Legacy - Operand-Size Override

This legacy prefix is denoted with the byte 0x66. In 64-bit mode, this prefix is used to tell an opcode that it is being given 16-bit operands instead of the normal 32-bit operands. Unfortunately, the topic of operand sizing is complicated on x86_64, and there are different mechanisms for specifying different operand sizes. That means that this prefix is only used for this very specific case.

Here is an example of this prefix in action using a MOV opcode 0xB8:

---- 0xB8 0x00 0x00 0x00 0x00 moves the four 0x00 bytes into the eax register.

0x66 0xB8 0x00 0x00 ---- ---- moves the two 0x00 bytes into the ax register.
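
The effect of the prefix on the two encodings above can be sketched in Python. This is only a toy decoder for opcode 0xB8 with and without 0x66; the function name decode_mov_b8 is made up for this example, and REX and the other MOV opcodes are deliberately ignored.

```python
# Immediate width in bytes -> destination register for opcode 0xB8.
REGISTER_NAME = {4: "eax", 2: "ax"}


def decode_mov_b8(instruction: bytes):
    """Decode the MOV 0xB8 forms shown above into (register, immediate)."""
    has_operand_size_override = instruction[0] == 0x66
    body = instruction[1:] if has_operand_size_override else instruction
    if body[0] != 0xB8:
        raise ValueError("this sketch only understands opcode 0xB8")
    # The prefix shrinks the immediate from 4 bytes to 2 bytes.
    width = 2 if has_operand_size_override else 4
    immediate_bytes = body[1:1 + width]
    if len(immediate_bytes) != width:
        raise ValueError("truncated immediate")
    # Immediates are stored little-endian.
    return REGISTER_NAME[width], int.from_bytes(immediate_bytes, "little")


if __name__ == "__main__":
    print(decode_mov_b8(bytes([0xB8, 0x00, 0x00, 0x00, 0x00])))
    print(decode_mov_b8(bytes([0x66, 0xB8, 0x00, 0x00])))
```

Note that the prefix changes the total instruction length as well: the 32-bit form is 5 bytes, while the 16-bit form is 4 bytes including the prefix.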

2.2.2) Legacy - Address-Size Override

This legacy prefix is denoted with the byte 0x67. In 64-bit mode, this prefix is used to denote that a register holding a memory address is 32 bits wide rather than 64 bits. For example:

---- 0x8A 0x00 is a MOV instruction saying to dereference the memory address stored in rax and store the result in al.

0x67 0x8A 0x00 is a MOV instruction saying to dereference the memory address stored in eax and store the result in al.

I'm not sure how this would be useful in x86_64, and from what little I can find about this prefix, its use is discouraged.
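
As a hedged sketch of the two encodings above, the following Python function turns them into assembly text. It relies on the ModRM byte, which is only covered later: the assumption used here is that bits 7-6 are the mod field, bits 5-3 select the destination register, and bits 2-0 select the register holding the address. The function name describe_mov_8a is made up for this example, and only the plain [register] addressing form without a REX prefix is handled.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]  # valid without REX
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def describe_mov_8a(instruction: bytes) -> str:
    """Describe MOV 0x8A with a simple [register] memory operand."""
    has_address_size_override = instruction[0] == 0x67
    body = instruction[1:] if has_address_size_override else instruction
    if len(body) != 2 or body[0] != 0x8A:
        raise ValueError("this sketch only understands 0x8A plus one ModRM byte")
    modrm = body[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0 or rm in (4, 5):
        raise ValueError("only plain [register] addressing is handled")
    # The prefix switches the address register from 64 bits to 32 bits.
    address_registers = REG32 if has_address_size_override else REG64
    return f"mov {REG8[reg]}, [{address_registers[rm]}]"


if __name__ == "__main__":
    print(describe_mov_8a(bytes([0x8A, 0x00])))
    print(describe_mov_8a(bytes([0x67, 0x8A, 0x00])))
```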

2.2.3) Legacy - Segment-Override

There are a total of 6 different segments, each of which has a different prefix byte:

0x2E is the prefix byte for the CS segment
0x3E is the prefix byte for the DS segment
0x26 is the prefix byte for the ES segment
0x64 is the prefix byte for the FS segment
0x65 is the prefix byte for the GS segment
0x36 is the prefix byte for the SS segment

There is actually a lot of misunderstanding about segments and 64-bit mode. According to the Linux kernel documentation, the FS segment is generally used for Thread Local Storage, and the GS segment is free for the application to use as it pleases.

The other 4 segments are effectively unused in 64-bit mode: the CPU treats the CS, DS, ES, and SS segment bases as zero, and override prefixes for these segments have no effect. Only the FS and GS bases still take part in address calculation. The CPU feature called "Upper Address Ignore", documented in the AMD64 Architecture Programmer's Manual Volume 2 Chapter 5 Section 10, changes how the upper bits of an address are treated; it does not make the CS, DS, ES, or SS segments usable again.

Here is an example regarding the FS segment:

For the sake of the example, say that there is a variable named var that exists at memory address 0 relative to the FS segment.

---- 0x8B 0x04 0x25 0x00 0x00 0x00 0x00 is the instruction that loads the 32-bit value stored at absolute memory address 0 into eax.

0x64 0x8B 0x04 0x25 0x00 0x00 0x00 0x00 is the instruction that loads the 32-bit value stored at address 0 relative to the FS segment base, which is the value of var, into eax.
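
The only thing the prefix changes in this example is which address the CPU reads from. A minimal Python sketch of that address calculation follows. The function name effective_address and the segment base value used in the usage lines are invented for illustration; on a real system the FS base is set up by the operating system.

```python
SEGMENT_PREFIXES = {0x64: "fs", 0x65: "gs"}


def effective_address(instruction: bytes, segment_bases: dict) -> int:
    """Return the address read by the MOV 0x8B 0x04 0x25 disp32 form above."""
    segment = SEGMENT_PREFIXES.get(instruction[0])
    body = instruction[1:] if segment else instruction
    if len(body) != 7 or body[:3] != bytes([0x8B, 0x04, 0x25]):
        raise ValueError("this sketch only understands 0x8B 0x04 0x25 disp32")
    # The 4-byte displacement is little-endian and sign-extended to 64 bits.
    displacement = int.from_bytes(body[3:7], "little", signed=True)
    # Without an FS/GS override the segment base is treated as zero.
    base = segment_bases[segment] if segment else 0
    return (base + displacement) % 2**64


if __name__ == "__main__":
    bases = {"fs": 0x7000, "gs": 0x0}  # hypothetical base values
    plain = bytes([0x8B, 0x04, 0x25, 0x10, 0x00, 0x00, 0x00])
    print(hex(effective_address(plain, bases)))
    print(hex(effective_address(bytes([0x64]) + plain, bases)))
```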

2.2.4) Legacy - Lock

This legacy prefix is denoted with the byte 0xF0. In 64-bit mode, this prefix is used to atomically change values in memory, which is important for lock-free algorithms. Note that because this prefix only affects updates to memory, it can only be used with certain read-modify-write instructions (such as INC, ADD, or XCHG) whose destination operand is in memory; using it anywhere else raises an invalid-opcode exception.

Here is an example of the lock prefix:

For the purposes of this example, assume that RAX holds a memory address to a variable that is going to be incremented.

---- 0xFF 0x00 is the instruction that uses the INC opcode on the value pointed to by the memory address held in RAX.

0xF0 0xFF 0x00 is the instruction that atomically uses the INC opcode on the value pointed to by the memory address held in RAX.
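
Why atomicity matters can be shown with a small deterministic model in Python. This is not real concurrency: it simply spells out one unlucky ordering of the load and store steps that two cores might perform when both run INC on the same address without the prefix, and compares it with the locked case, where each increment is indivisible. The function names are made up for this example.

```python
def two_unlocked_increments(memory: list) -> None:
    """Two cores run INC on memory[0] with their steps interleaved."""
    core_a = memory[0]      # core A loads the old value
    core_b = memory[0]      # core B loads the same old value
    memory[0] = core_a + 1  # core A stores its result
    memory[0] = core_b + 1  # core B overwrites it with the same result


def two_locked_increments(memory: list) -> None:
    """With LOCK, each load-add-store runs as one indivisible step."""
    for _core in ("A", "B"):
        memory[0] = memory[0] + 1


if __name__ == "__main__":
    unlocked, locked = [0], [0]
    two_unlocked_increments(unlocked)
    two_locked_increments(locked)
    print("unlocked:", unlocked[0], "locked:", locked[0])
```

In the interleaved case one of the two increments is lost, which is exactly the kind of bug the LOCK prefix prevents.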

2.2.5) Legacy - Repeat

This legacy prefix has 3 variants:

0xF3 is the prefix byte for REP, which repeats a string operation until RCX is 0.
0xF3 is also the prefix byte for REPE/REPZ, which repeats a string operation until RCX is 0 or the zero flag is 0.
0xF2 is the prefix byte for REPNE/REPNZ, which repeats a string operation until RCX is 0 or the zero flag is 1.

The REP prefix is used for string operations that involve some kind of memory copying, while the other prefixes are used for memory comparison instructions. An example:

---- 0xA4 is the instruction that uses the MOVSB opcode to copy a single byte from the memory address stored in RSI to the memory address stored in RDI, then increments RSI and RDI.

0xF3 0xA4 is the instruction that uses the MOVSB opcode to copy a single byte from the memory address stored in RSI to the memory address stored in RDI, then increments RSI and RDI. Because of the REP prefix, RCX is then decremented. This process is repeated until RCX becomes 0.
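
The loop that REP MOVSB performs can be modeled in a few lines of Python. This is a sketch under the assumption that the direction flag is clear (so RSI and RDI count upward), with memory represented as a bytearray and the registers as plain integers; the function name rep_movsb is made up for this example.

```python
def rep_movsb(memory: bytearray, rsi: int, rdi: int, rcx: int):
    """Model REP MOVSB with the direction flag clear.

    Returns the final (rsi, rdi, rcx) values.
    """
    while rcx != 0:
        memory[rdi] = memory[rsi]  # MOVSB: copy one byte
        rsi += 1                   # MOVSB: advance the source pointer
        rdi += 1                   # MOVSB: advance the destination pointer
        rcx -= 1                   # REP: count down
    return rsi, rdi, rcx


if __name__ == "__main__":
    memory = bytearray(b"hello.....")
    print(rep_movsb(memory, rsi=0, rdi=5, rcx=5), memory)
```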

Another example with REPNE:

---- 0xA6 is the instruction that uses the CMPSB opcode to compare a single byte from each of the memory addresses stored in RSI and RDI, then increments RSI and RDI, then sets the zero flag to 1 if the bytes are equal or 0 if not.

0xF2 0xA6 is the instruction that uses the CMPSB opcode to compare a single byte from each of the memory addresses stored in RSI and RDI, then increments RSI and RDI, then sets the zero flag to 1 if the bytes are equal or 0 if not. Because of the REPNE prefix, RCX is then decremented. This operation repeats until RCX is 0 or the zero flag is 1.
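
The early-exit behavior of REPNE CMPSB can be modeled the same way. The Python sketch below again assumes the direction flag is clear and uses a made-up function name, repne_cmpsb. The zero_flag parameter stands in for whatever value the flag held before the instruction, since the flag is left untouched when RCX starts at 0.

```python
def repne_cmpsb(memory: bytes, rsi: int, rdi: int, rcx: int, zero_flag: int = 0):
    """Model REPNE CMPSB with the direction flag clear.

    Returns the final (rsi, rdi, rcx, zero_flag) values.
    """
    while rcx != 0:
        # CMPSB: compare one byte from each address and set the zero flag.
        zero_flag = 1 if memory[rsi] == memory[rdi] else 0
        rsi += 1
        rdi += 1
        rcx -= 1  # REPNE: count down
        if zero_flag == 1:
            break  # REPNE: stop as soon as the bytes are equal
    return rsi, rdi, rcx, zero_flag


if __name__ == "__main__":
    memory = b"abcXdef" + b"xyzXuvw"
    print(repne_cmpsb(memory, rsi=0, rdi=7, rcx=7))
```

In the usage lines, the first three byte pairs differ and the fourth pair matches, so the loop stops after 4 iterations with 3 left in RCX.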

2.2.6) REX

Based on many factors, I suspect that the REX prefix is where most people throw their hands up and go "x86_64 is too hard, I give up", and this is for good reason! The REX prefix actually does many different things packaged together in a single byte, and requires understanding a handful of things that have not yet been explained.

Because of the complexity and further prerequisite knowledge required to understand the REX prefix, we will actually go into it in much greater detail later. For now, just know that this prefix exists, and generally enables opcodes to work with 64-bit registers and values.

2.2.7) VEX and XOP

These prefixes are also quite complex, so they will also be documented later. Generally speaking, they are used for SIMD instructions, such as AVX and the VEX-encoded forms of SSE instructions.

2.3) ModRM

The ModRM byte is optional and depends on the opcode.

2.4) SIB

The SIB byte stands for "Scale, Index, Base". If you've done assembly programming in AT&T syntax, this will probably sound familiar to you.
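
As a small sketch of what "Scale, Index, Base" means at the byte level, the following Python function splits a SIB byte into its three fields. The bit layout used here (scale in bits 7-6, index in bits 5-3, base in bits 2-0) is taken from the AMD64 and Intel manuals rather than from the text above, and the function name split_sib is made up for this example.

```python
def split_sib(sib: int):
    """Split a SIB byte into (scale factor, index field, base field)."""
    scale_factor = 1 << (sib >> 6)  # 2-bit field encodes a factor of 1, 2, 4, or 8
    index = (sib >> 3) & 0b111      # register number of the index
    base = sib & 0b111              # register number of the base
    return scale_factor, index, base


if __name__ == "__main__":
    # 0x25 is the SIB byte from the FS segment example earlier in this chapter.
    print(split_sib(0x25))
```

For the byte 0x25 from the FS segment example, the index field is 4, which without a REX prefix means "no index register", and the base field is 5, which in that encoding means "no base register, a 32-bit displacement follows". That is how the instruction ends up addressing a bare displacement.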