Recommended reading:
W. Stallings: Operating Systems. Internals and Design Principles. Edinburgh: Pearson, 2018.
P. Yosifovich – A. Ionescu – M. E. Russinovich – D. A. Solomon: Windows Internals. 7th ed. Part 1. System architecture, processes, threads, memory management, and more. Pearson Education, 2017.
Z. Micskei: The Windows operating system. Budapest: Budapest University of Technology and Economics, 2014.
Overview of Windows Components (2024-03-26)
Wikipedia, selected entries. (2024-03-04)
Operating system definitions
What is an operating system?
Most computer users have had some experience with an operating system, but it is difficult to pin down precisely what an operating system is. Part of the problem is that operating systems perform two basically unrelated functions:
– extending the machine, that is, providing application programmers (as well as application programs) a clean abstract set of resources instead of the messy hardware ones, and
– managing hardware resources.
Depending on who is doing the talking, you hear mostly about one function or the other. Let us now look at both.
(1) The Operating System as an Extended Machine
The architecture (instruction set, memory organization, I/O, and bus structure) of most computers at the machine-language level is primitive and awkward to program, especially for input/output.
In this view, the function of the operating system is to present the user with the equivalent of an extended machine or virtual machine that is easier to program than the underlying hardware.
(2) The Operating System as a Resource Manager
The concept of the operating system as primarily providing its users with a convenient interface is a top-down view. An alternative, bottom-up, view holds that the operating system is there to manage all the pieces of a complex system (i.e. the hardware and the software). Modern computers consist of processors, memories, timers, disks, mice, network interfaces, printers, and a wide variety of other devices. In the alternative view, the job of the operating system is to provide for an orderly and controlled allocation of the processors, memories, and I/O devices among the various programs competing for them.
(cf. A. S. Tanenbaum: Modern Operating Systems. Prentice Hall, 2007; A. S. Tanenbaum – H. Bos: Modern Operating Systems. Pearson, 2023.)
An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs. (Wikipedia)
An operating system is software that manages a computer's hardware. It also provides a basis for application programs and acts as an intermediary between the computer user and the computer hardware. (A. Silberschatz – P. B. Galvin – G. Gagne: Operating System Concepts. Wiley, 2018.)
provides a basis for application programs = provides common services for the application programs and manages them (i.e. starting, running, interrupting, stopping etc. the programs, allocating the necessary hardware resources for them, providing a communication mechanism between them etc.)
An OS is a program that controls the execution of application programs, and acts as an interface between applications and the computer hardware.
It can be thought of as having three objectives:
• Convenience: An OS makes a computer more convenient to use. It acts as a user/computer interface.
• Efficiency: An OS allows the computer system resources to be used in an efficient manner. It acts as a resource manager.
• Ability to evolve: An OS should be constructed in such a way as to permit the effective development, testing, and introduction of new system functions without interfering with service. A major OS will evolve over time for a number of reasons (e.g. new types of hardware, new services, faults ("bugs") and fixes etc.).
(cf. W. Stallings: Operating Systems. Pearson, 2018.)
An operating system (OS) exploits the hardware resources of one or more processors to provide a set of services to system users. The OS also manages secondary memory and I/O (input/output) devices on behalf of its users. (W. Stallings: Operating Systems. Pearson, 2018.)
An operating system is the software that controls the working of the hardware resources and all the other software.
The operating system controls how all software applications work on the computer. Among other things, it is responsible for
– allowing file management (e.g. saving, copying, renaming and deleting files);
– implementing multitasking (allowing more than one program to run at the same time);
– creating a multi-user environment (e.g. allowing more than one user on a computer network to access the same file at the same time);
– providing security (e.g. in a multi-user environment, allowing only those with the correct password to use the computer).
Imagine an operating system as a building block to which all other blocks (i.e. software applications) have to be added.
The operating system also sets the rules for controlling hardware resources such as
– peripherals (controlling peripheral devices such as monitors, keyboards, printers etc.);
– memory (controlling the amount of memory used);
– CPU (controlling the time allocated to a task during which it is executed by the central processing unit);
– disk space (controlling the amount of disk space used).
(cf. T. Roderick – G. Rushbrook: ICT for GCSE. Oxford Univ. Press, 2002.)
Some examples of well-known operating systems
- [personal computers: desktops, laptops etc.]
- Microsoft Windows
- ...
- Windows XP
- Windows 7
- Windows 8
- Windows 10
- Windows 11
- ...
- Apple macOS
- Linux
- [mobiles: smartphones, tablets etc.]
- Google Android
- Apple iOS
Basic functions of the operating system
Based on the above definitions, let us summarize the basic functions of the operating system (OS).
The operating system is system software that
- manages hardware and software resources
- checks and tests the available hardware resources
- controls the execution of application programs (i.e. starting, running, interrupting, resuming, stopping etc. them)
Note that this is the main function of the OS.
- implements multitasking; performs process management
- allocates the hardware resources (e.g. processors, memories, I/O devices etc.) to the various processes competing for them
- manages secondary memory and I/O devices to provide virtual memory (so that processes can use more memory than is physically available in main memory)
- shares the CPU time between the running processes so that the processor(s) can be used in an efficient manner
- detects and handles asynchronous events (e.g. detecting an interrupt request from a hardware device and interrupting the current process)
- synchronizes the execution of concurrent processes (e.g. using wait/signal semaphores to temporarily suspend and resume the execution of a process)
- protects the dedicated memory areas allocated to the processes in order to avoid access violation (i.e. illegal access)
- provides communication mechanisms between processes (so that they can exchange data with each other)
- shares data between the running processes and the peripheral devices (e.g. providing buffers)
- acts as an interface between application programs and the hardware (cf. the OS produces an extended or virtual machine built on the computer hardware)
- provides common services for application programs
- acts as an intermediary between the users and the computer hardware
- creates a multi-user environment
- performs access control
- verifies proper authentication (i.e. verifying the identity of a user or client, e.g. with the use of username and password)
- checks authorization to use the system resources (i.e. checking permissions to perform a specific action)
- provides the users with a convenient interface
- graphical user interface (GUI)
- command-line interface (CMD; "console")
- offers appropriate means for users to perform basic tasks
- system information
- basic settings
- file management (e.g. saving, copying, renaming and deleting files)
- ...
- ensures and fulfills safety and security requirements of operation
- provides a built-in antivirus and security solution (e.g. application security, malware protection, firewall, web security etc.)
- saves critical data automatically
- prevents damage to the system components
- has the ability to evolve and develop; allows regular updates
- permits the introduction of new system functions (e.g. because of new types of hardware, new services etc.)
- permits detecting, diagnosing and fixing faults ("bugs")
Note that the operating system has several software components including the so-called kernel (which is the most important part of the OS).
Brief overview of computer system hardware (cf. Stallings 2018: 30-32)
In general, a computer consists of a processor, a main memory, and several input-output (I/O) components.
[Figure: Single system bus architecture]
- The processor or central processing unit (CPU) controls the operation of the computer and performs its data processing functions. To achieve these purposes, the CPU has several cooperating components (e.g. some special-purpose and general-purpose registers, the control unit, the arithmetic and logic unit etc.).
Its main purpose is to execute the machine-level instructions of programs.
- Each processor has a dedicated instruction set which contains all the machine-level instructions that a given processor can execute.
- The processor contains some registers for its operation. For example,
- the program counter (PC) or instruction pointer (IP) specifies (holds, "points to") the address of the next instruction of the currently running program to be executed;
- the instruction register (IR) holds the instruction of the currently running program to be decoded and executed.
- The processor has a special unit called the execution unit (EU) containing, among others, an arithmetic-logic unit (ALU), a floating-point unit (FPU), some general-purpose registers (e.g. the accumulator, AC or AX) and other, special-purpose registers (e.g. the stack pointer, SP). The EU is responsible for executing arithmetic and logic operations. For example,
- some registers of the i8086 processor are illustrated here
- the available registers for the assembly programs in the Intel x86/x64 architecture are summarized here
- Finally, the control unit (CU) of the processor manages (or directs, controls) the overall operation of the processor. It is responsible for
- The main memory stores data and programs, or more precisely, the instructions which make up the currently running programs. The main memory is typically volatile, that is, its contents are lost when the computer is shut down. (In contrast, the contents of non-volatile memories are retained, that is, permanently stored, even when the computer is shut down.)
- A memory module consists of a set of memory cells or memory locations, defined by sequentially numbered physical addresses. Each address refers to a location or a group of locations that contain a sequence of bits that can be interpreted either as a machine-level instruction or as data of a certain type.
- The processor and the main memory form the central unit of the computer.
- The main function of the input-output (I/O) modules is to move data, for the most part, between the central unit of the computer and the I/O devices, which send, receive, store, display, print etc. data. The I/O devices are of a great variety, e.g. a monitor, a keyboard, secondary memory devices (e.g. disks), network equipment etc.
- Data exchanged between the central unit of the computer and an I/O device usually pass through an (internal or external) buffer memory, which temporarily stores the data until they can be transferred and processed. A buffer is normally used to accommodate the difference in the rates at which the communicating devices can handle data during the transfer.
- The system bus provides for communication among the processors, the main memory and the I/O modules by transferring data, (physical) addresses and control signals.
- In a personal computer, the system bus is part of the motherboard, which also contains the processor, the main memory and other components (e.g. an interrupt controller, a BIOS or UEFI chip, network adapters, expansion slots and cards, USB ports etc.).
The figure above illustrates the logic of the operation of the system bus. The CPU contains some (internal) registers to support data exchange among the CPU, the main memory and the I/O module. These registers and their function are as follows:
- the memory address register (MAR) contains the (physical) address of a specific location in memory for the next read or write operation
- the memory buffer register (MBR) or the memory data register (MDR) contains the data to be written into memory, or receives the data to be read from memory for the next read or write operation
- the I/O address register (I/O AR) specifies the address of a particular I/O device for the next read or write operation
- the I/O buffer register (I/O BR) contains the data to be written into the I/O device, or receives the data to be read from the I/O device, during the next read or write operation
Execution of instructions (cf. Stallings 2018: 32-35)
A program to be executed by a processor consists of a set of machine-level instructions stored in the memory. In its simplest form, the processing of instructions consists of two basic steps:
– first, the processor reads (or fetches) the instructions from the memory one at a time, and
– second, the processor executes each instruction.
The execution of a program is a repeating process (a cycle or loop) of these two steps: the instruction fetch and the instruction execution. (Note that instruction execution may involve several operations and depends on the nature of the instruction.)
The figure below illustrates the instruction cycle:
At the beginning of each instruction cycle, the processor fetches an instruction from memory. In this respect, the program counter (PC) register is of utmost importance: the PC holds the address of the next instruction to be fetched. After the instruction has been fetched, the processor increments the value of the PC so that it will hold the address of the next instruction in the sequence of instructions (i.e. in the program which is currently being executed).
The fetched instruction is loaded into the instruction register (IR). An instruction is normally made up of a combination of an operation code and the specification of the operands, which contain or refer to the data upon which the operation is to be performed. The operation code of the instruction contains bits that specify the action the processor is to take. The processor (or more specifically, the control unit of the processor) interprets the instruction and performs the required action. In general, these actions fall into four categories:
- Processor-memory: Data may be transferred from processor to memory, or from memory to processor.
- Processor-I/O: Data may be transferred to or from a peripheral device by initiating a transfer between the processor and an I/O module.
- Data processing: The processor may perform some arithmetic or logic operation on data.
- Control: An instruction may specify that the sequence of execution should be altered.
The execution of an instruction may involve a certain combination of these actions.
An example of the operation of the fetch-execute cycle (cf. Stallings 2018: 33-35)
Let the memory of a hypothetical machine be organized in 16-bit (i.e. one-word) memory cells. Each instruction consists of a 4-bit operation code (opcode) and a 12-bit operand. Note that if the operand contains an address, a maximum of $2^{12} = 4096$ memory cells can be addressed directly.
We shall use four hexadecimal digits to represent the 16-bit (one-word) content of the registers, memory addresses and the content of memory cells. (Note that for the 12-bit long addresses three hexadecimal digits would be enough.) Similarly, we shall use one hexadecimal digit to represent the opcode of each instruction.
In the example we want to add two whole numbers represented by two's complement code. We will use one general-purpose register (AC) and three instructions as follows:
- opcode=1: move data into AC from an address specified by the operand
- opcode=2: move data from AC into a memory cell located in the address specified by the operand
- opcode=5: add the content of the memory cell located in the address specified by the operand to the content of AC; after the addition, the result will be stored in AC (i.e. the sum overwrites the previous content of AC)
We assume that the first instruction to be executed is located at memory address 300, followed sequentially by the further instructions of the program (at addresses 301, 302 etc.). Furthermore, we assume that the data that the program manipulates are stored in the memory locations 940 and 941. (All addresses in this example are hexadecimal.)
Memory content
Address   Content
0300      1940      (instructions)
0301      5941
0302      2941
0940      0003      (data)
0941      0002
In the example we analyze the operation of the fetch-execute cycle. Since the initial value of the program counter (PC) is set to location 300, in the first cycle the processor will fetch the instruction at the memory location 300. On the succeeding instruction cycles, the CPU will fetch instructions from locations 301, 302, and so on. (Note, however, that the sequential execution of instructions can be altered at any time by a certain control instruction.)
In each cycle the fetched instruction is always loaded into the instruction register (IR). The operation code (opcode) of the instruction will specify the necessary action that the processor is to take. After separating the opcode and the operand, the processor interprets the opcode of the instruction and performs the required action.
1st cycle
Storage unit   Value   Comment
Fetch stage
PC             0300    fetch the instruction from M(300)
M(300)         1940    load the content of M(300) into IR
IR             1940    interpret the instruction:
                       - opcode=1: move memory data into AC
                       - operand=940: the data is located at M(940)
PC             0301    increment the value of PC by 1
Execute stage: AC←M(940) or MOV AC,M(0940)
M(940)         0003    load the content of M(940) into AC
AC             0003    store the content of M(940) in AC
2nd cycle
Storage unit   Value   Comment
Fetch stage
PC             0301    fetch the instruction from M(301)
M(301)         5941    load the content of M(301) into IR
IR             5941    interpret the instruction:
                       - opcode=5: add memory data to AC
                       - operand=941: the data to be added is located at M(941)
PC             0302    increment the value of PC by 1
Execute stage: AC←AC+M(941) or ADD AC,M(0941)
AC             0003    add the content of M(941) to AC
M(941)         0002
AC             0005    store the result of the addition in AC
3rd cycle
Storage unit   Value   Comment
Fetch stage
PC             0302    fetch the instruction from M(302)
M(302)         2941    load the content of M(302) into IR
IR             2941    interpret the instruction:
                       - opcode=2: move the content of AC into a memory cell
                       - operand=941: the memory cell is located at M(941)
PC             0303    increment the value of PC by 1
Execute stage: M(941)←AC or MOV M(0941),AC
M(941)         0002    move the content of AC into M(941)
AC             0005
M(941)         0005    store the content of AC in M(941)
In this example three instruction cycles were needed, each consisting of a fetch stage and an execute stage, to add the contents of the memory location 940 to the contents of the memory location 941.
Implementation of the above example in Windows
I. Setting the environment
To implement the simple example of adding two integers, we first need a compiler. Download the x64 build of the GCC compiler for Windows from SourceForge:
gcc-win64 download (2025-03-06)
All you need is to
– create a subdirectory or folder named 'temp' in the root directory of the local disk 'c:',
– create a new subdirectory named 'gcc' within the 'temp' folder, and
– select and copy (i.e. extract) the whole content of the downloaded compressed file (e.g. gcc-14.2.0-no-debug.7z) into the c:\temp\gcc folder.
In some cases, simply using the built-in Windows File Explorer to unzip and copy the necessary files may not work. Therefore it is strongly recommended to download and install either Total Commander on your machine (and use it instead of File Explorer) or the 7-Zip application. In the latter case you can still use the built-in file manager of the Windows operating system.
It is easy to test whether the installed 'gcc' compiler works:
- open a new 'cmd' window
- set the default path by issuing the following command:
- set PATH=c:\temp\gcc;c:\temp\gcc\bin;%PATH%
- enter the gcc --help command to see whether the compiler works.
Note that from now on we shall always use the c:\temp folder as a default folder to create, compile, edit, modify etc. our files. In order to use the GCC compiler in a 'cmd' window conveniently, let us create a simple setpath.bat file with the 'notepad' text editor. It is to contain only three lines:
@echo off
set PATH=c:\temp\gcc;c:\temp\gcc\bin;%PATH%
PATH
With these steps we have created a new command called 'setpath'. It adds the location of the gcc.exe compiler (and of the necessary libraries) to the default path in the current 'cmd' window environment, and then displays the resulting path. Now, if we want to use GCC in a 'cmd' window, we should first run the 'setpath' command once (and only once).
II.1. Create and compile C files
Table of contents:
- Printing "Hello World!" (hello.c)
- Adding 3+2 (simple.c)
- Creating a batch file to display ERRORLEVEL (err.bat)
- Adding 3+2 with a function (simplef.c)
- Assembly version of 'simplef.c' with quadword-length operands (simplex.s)
- Adding and printing 3+2 (example.c)
- Assembly version of 'example.c' with quadword-length operands (examplex.s)
Printing "Hello World!" (hello.c)
Open a new 'cmd' window in the c:\temp\gcc directory and set the default path by running the 'setpath' command (only once). Using the notepad hello.c command, create a new file named 'hello.c' with the following content:
#include <stdio.h>

int main()
{
    printf("Hello world!\n");
    return 0;
}
Compile, link and run the C program, for example with the following commands:
gcc hello.c -o hello.exe
hello
Adding 3+2 (simple.c)
Now let us create another simple C program which implements the earlier example of adding two integers. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simple.c command, and create a new file named 'simple.c' with the following content:
#include <stdio.h>

int main()
{
    int a = 3;
    int b = 2;
    b = a + b;
    return 0;
}
Compile, link and run the C program, for example with the following commands:
gcc simple.c -o simple.exe
simple
Note that in the 'cmd' window, we can display the returned value of the 'simple.exe' program using the echo %ERRORLEVEL% command.
For the sake of simplicity, let us create a batch file named 'err.bat' using the notepad err.bat command. It should contain the following two lines:
@echo off
echo %ERRORLEVEL%
With that, we have created a new command called 'err' which, when entered, displays the current value of the ERRORLEVEL environment variable in the 'cmd' window.
For later considerations, it will be instructive to know that the GCC compiler can easily generate the assembly code of the 'simple.c' program (as well as of any other C program). For that purpose, we should enter the gcc simple.c -S -o simple.s command in the 'cmd' window.
The generated assembly program is as follows:
    .file "simple.c"
    .text
    .def __main; .scl 2; .type 32; .endef
    .globl main
    .def main; .scl 2; .type 32; .endef
    .seh_proc main
main:
    pushq %rbp
    .seh_pushreg %rbp
    movq %rsp, %rbp
    .seh_setframe %rbp, 0
    subq $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    call __main
    movl $3, -4(%rbp)
    movl $2, -8(%rbp)
    movl -4(%rbp), %eax
    addl %eax, -8(%rbp)
    movl $0, %eax
    addq $48, %rsp
    popq %rbp
    ret
    .seh_endproc
    .ident "GCC: (GNU) 13.2.0"
The explanation of some important parts of the assembly code:
- the first three instructions of the 'main' section (pushq %rbp, movq %rsp, %rbp and subq $48, %rsp) create a stack frame of 48 bytes, which could hold 12 double-word (i.e. 12*4 byte) values; note, however, that under the Windows x64 calling convention the 32 bytes at the top of the stack (i.e. at the lowest addresses) are reserved as "shadow space" for the functions called by 'main' (here __main), so only the remaining part is used for the local variables
- the
movl $3, -4(%rbp)
instruction assigns ("moves") '3' as a 4-byte double-word ("long") value to the local variable at the address [%rbp-4] (which corresponds to the integer variable 'a' in the simple.c program)
- the
movl $2, -8(%rbp)
instruction assigns ("moves") '2' as a 4-byte double-word ("long") value to the local variable at the address [%rbp-8] (which corresponds to the integer variable 'b' in the simple.c program)
- the
movl -4(%rbp), %eax
instruction copies ("moves") the content of the double-word local variable at the address [%rbp-4] into %eax, i.e. the lower half of the quadword (8-byte) accumulator register %rax (and fills the upper, "most significant" half of %rax with zeros)
- the
addl %eax, -8(%rbp)
instruction
- first adds the content of the lower half of the accumulator register (%eax) to the double-word value of the local variable at the address [%rbp-8],
- then stores the sum in the local variable at the address [%rbp-8] (overwriting its previous content)
- the last two instructions of the 'main' section before the 'ret' instruction destroy the stack frame of 48 bytes and restore the previous value of the base pointer (%rbp) register
After such considerations, we can easily create the 'simplex.s' assembly program which contains quadword length operands, and returns the sum of the addition (as an ERRORLEVEL value):
    .globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    subq $48, %rsp
    movq $3, -8(%rbp)
    movq $2, -16(%rbp)
    movq -8(%rbp), %rax
    addq %rax, -16(%rbp)     # the sum of a and b
    movq -16(%rbp), %rax     # return (ERRORLEVEL) value
    addq $48, %rsp
    popq %rbp
    ret
Compile, link and run the assembly program, for example with the following commands:
gcc simplex.s -o simplex.exe
simplex
Adding 3+2 with a function (simplef.c)
The aim of the 'simple.c' program can also be implemented using a function named 'sum' which adds two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simplef.c command, and create a new file named 'simplef.c' with the following content:
#include <stdio.h>

int sum(int x, int y)
{
    int temp;
    temp = x + y;
    return temp;
}

int main()
{
    int a = 3;
    int b = 2;
    int c;
    c = sum(a, b);
    return c;
}
Set the default path by running the 'setpath' command (remember, only once). Compile, link and run the C program, for example with the following commands:
gcc simplef.c -o simplef.exe
simplef
In the 'cmd' window, we can again display the returned value of the 'simplef.exe' program using the echo %ERRORLEVEL% command (or by running the 'err' batch file).
Adding and printing 3+2 (example.c)
It was not easy to check the 'simple.c' or 'simplef.c' programs because they produced no visible output. So let us open a new 'cmd' window again in the c:\temp\gcc directory. Using the notepad example.c command, create a new file named 'example.c' with the following content:
#include <stdio.h>

int main()
{
    int a = 3;
    int b = 2;
    int c = a + b;
    printf("%d + %d = %d\n", a, b, c);
    return 0;
}
Set the default path by running the 'setpath' command. Compile, link and run the C program, for example with the following commands:
gcc example.c -o example.exe
example
As before, we can easily generate the assembly code of the 'example.c' program by entering the gcc example.c -S -o example.s command in the 'cmd' window.
The generated assembly program is as follows:
    .file "example.c"
    .text
    .def printf; .scl 3; .type 32; .endef
    .seh_proc printf
printf:
    pushq %rbp
    .seh_pushreg %rbp
    pushq %rbx
    .seh_pushreg %rbx
    subq $56, %rsp
    .seh_stackalloc 56
    leaq 48(%rsp), %rbp
    .seh_setframe %rbp, 48
    .seh_endprologue
    movq %rcx, 32(%rbp)      # 1st argument (%rcx) stored
    movq %rdx, 40(%rbp)      # 2nd argument (%rdx) stored
    movq %r8, 48(%rbp)       # 3rd argument (%r8) stored
    movq %r9, 56(%rbp)       # 4th argument (%r9) stored
    leaq 40(%rbp), %rax
    movq %rax, -16(%rbp)
    movq -16(%rbp), %rbx
    movl $1, %ecx
    movq __imp___acrt_iob_func(%rip), %rax
    call *%rax
    movq %rax, %rcx
    movq 32(%rbp), %rax
    movq %rbx, %r8
    movq %rax, %rdx
    call __mingw_vfprintf
    movl %eax, -4(%rbp)
    movl -4(%rbp), %eax
    addq $56, %rsp
    popq %rbx
    popq %rbp
    ret
    .seh_endproc
    .def __main; .scl 2; .type 32; .endef
    .section .rdata,"dr"
.LC0:
    .ascii "%d + %d = %d\12\0"
    .text
    .globl main
    .def main; .scl 2; .type 32; .endef
    .seh_proc main
main:
    pushq %rbp
    .seh_pushreg %rbp
    movq %rsp, %rbp
    .seh_setframe %rbp, 0
    subq $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    call __main
    movl $3, -4(%rbp)
    movl $2, -8(%rbp)
    movl -4(%rbp), %edx
    movl -8(%rbp), %eax
    addl %edx, %eax
    movl %eax, -12(%rbp)
    movl -12(%rbp), %ecx
    movl -8(%rbp), %edx
    movl -4(%rbp), %eax
    movl %ecx, %r9d
    movl %edx, %r8d
    movl %eax, %edx
    leaq .LC0(%rip), %rax
    movq %rax, %rcx
    call printf
    movl $0, %eax
    addq $48, %rsp
    popq %rbp
    ret
    .seh_endproc
    .ident "GCC: (GNU) 13.2.0"
    .def __mingw_vfprintf; .scl 2; .type 32; .endef
Based on the compiled program, we can easily create the 'examplex.s' assembly program, which contains only quadword-length operands. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad examplex.s command, and create a new file named 'examplex.s' with the following content:
    .data
.msg:
    .ascii "%d + %d = %d\12\0"
    .text
    .globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    subq $48, %rsp
    movq $3, -8(%rbp)        # local variable a
    movq $2, -16(%rbp)       # local variable b
    movq -8(%rbp), %rdx
    movq -16(%rbp), %rax
    addq %rdx, %rax
    movq %rax, -24(%rbp)     # local variable c
    movq -24(%rbp), %rcx
    movq -16(%rbp), %rdx
    movq -8(%rbp), %rax
    movq %rcx, %r9           # 4th argument (var c)
    movq %rdx, %r8           # 3rd argument (var b)
    movq %rax, %rdx          # 2nd argument (var a)
    leaq .msg(%rip), %rax    # RIP-relative address of the format string
    movq %rax, %rcx          # 1st argument (pattern .msg)
    call printf
    movq $0, %rax
    addq $48, %rsp
    popq %rbp
    ret
Compile, link and run the assembly program, for example with the following commands:
gcc examplex.s -o examplex.exe
examplex
II.2. Create and compile assembly files
Printing "Hello World!" (asmh.s)
Now let us create a simple program in Intel x86/x64 assembly language which displays the well-known 'Hello world!' message. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad asmh.s command, and create a new file named 'asmh.s' with the following content:
    .globl main
// definitions of constants and variables
    .data
hello:
    .ascii "Hello world!\12\0"
// program instructions (code)
    .text
main:
    pushq %rbp
    movq %rsp, %rbp
    subq $32, %rsp
    leaq hello(%rip), %rax   /* setting the parameter for the function 'printf' */
    movq %rax, %rcx          # address of 'hello'
    call printf              /* displays 'Hello world!' */
    movl $0, %eax            # set ERRORLEVEL value
    addq $32, %rsp
    popq %rbp
    ret
Compile, link and run the assembly program, for example with the following commands:
gcc asmh.s -o asmh.exe
asmh
Note that the size of the stack frame is 32 bytes, even though there are no local variables in the program. Under the Windows x64 calling convention, the four quadwords allocated at the top of the stack frame (the so-called shadow space) can be used by the 'printf' function to store its (possible) register parameters.
Setting the ERRORLEVEL (abc.s)
After we have successfully created and compiled the 'asmh.s' program, let us create another simple program in Intel x86/x64 assembly language which does nothing except return the value 10 as an ERRORLEVEL value.
Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abc.s command, and create a new file named 'abc.s' with the following content:
    .globl main
main:
    enter $0, $0
    movq $10, %rax
    leave
    ret
Note that at the beginning of the 'main' function a stack frame is created. Among other things, the frame can contain the values of the local variables (if there are any). The base pointer register (%rbp) is used as a reference point for those local variables (i.e. the local variables can be addressed relative to the value of the base pointer).
The 'enter $0, $0' assembly instruction corresponds to the instructions
    pushq %rbp
    movq %rsp, %rbp
It creates the stack frame of the function.
Note that, e.g., the 'enter $24, $0' instruction would, after performing the two instructions shown above, allocate 24 bytes of memory on the stack by subtracting 24 from the current value of the stack pointer. Since 6*4 = 24, this would be enough for six double-word (i.e. 4-byte = 32-bit) local variables (or for three quadword local variables).
The 'leave' assembly instruction corresponds to the instructions

    movq  %rbp, %rsp
    popq  %rbp

It frees (or destroys) the stack frame of the function.
The stack is a dedicated region of memory that programs use to store data temporarily, according to their current needs. The push and pop instructions are the most important instructions for adding data to the top of the stack and for removing (and retrieving) data from it. In order to implement and use a stack
– the programs use a dedicated register called stack pointer (%rsp) that always points to the top of the stack by containing the address of the last data item that has been pushed;
– when a data item is pushed into the stack first the stack pointer is decreased by the size of the operand (e.g. by subtracting 8 from the actual value of the stack pointer for a quadword), and then the content of the operand is stored at that address;
– when a data item is popped from the stack first the data from the top of the stack is retrieved from the memory address (and stored in the operand of the 'pop' instruction), and then the stack pointer is increased by the size of the operand (e.g. by adding 8 to the actual value of the stack pointer for a quadword).
The diagram below illustrates the push and pop operations:
![]()
Using the push / pop instructions instead of the enter / leave instructions, we can create another version of the program 'abc.s' as follows:

    .globl main
    main:
        pushq %rbp
        movq  %rsp, %rbp
        movq  $10, %rax
        movq  %rbp, %rsp
        popq  %rbp
        ret

Compile, link and run the assembly program:
Here, like in the case of the 'simple.exe' program or the 'simplex.exe' program, we can display the returned value of the 'abc.exe' program using the echo %ERRORLEVEL% command in the 'cmd' window (or we can enter the 'err' command if the 'err.bat' file exists).
Adding 3+2 (abcs.s)
Now let us create an equivalent of the 'simple.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) as 64-bit (quadword) integers, stores the sum in a third variable, and returns the sum as an ERRORLEVEL value. Before returning, the program prints a message reminding us to check the current value of the ERRORLEVEL environment variable.
Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcs.s command, and create a new file named 'abcs.s' with the following content:

    .globl main
    .data
    hello: .ascii "\12See the ERRORLEVEL value!\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $56, %rsp          /* stack frame created */
        movq  $3, -8(%rbp)
        movq  $2, -16(%rbp)
        movq  -8(%rbp), %rax
        addq  -16(%rbp), %rax
        movq  %rax, -24(%rbp)
        leaq  hello, %rcx
        call  printf
        movq  -24(%rbp), %rax    # set ERRORLEVEL value
        /* stack frame to be destroyed */
        addq  $56, %rsp
        popq  %rbp
        ret

Compile, link and run the assembly program as follows:
Here, like in the case of the 'simple.exe' and 'abc.exe' programs, in the 'cmd' window we can display the returned value of the 'abcs.exe' program using the echo %ERRORLEVEL% or simply the err command. But in this case it returns the sum of the addition 3+2 (i.e. 5).
The size and content of the stack frame need some explanation. The size of the stack frame is 56 bytes, which corresponds to 7 quadwords (i.e. 56=7*8). The structure and content of the stack frame are as follows:

| address (via %rsp) | address (via %rbp) | content |
|---|---|---|
| 0(%rsp) | -56(%rbp) | parameters for the function 'printf' |
| 8(%rsp) | -48(%rbp) | |
| 16(%rsp) | -40(%rbp) | |
| 24(%rsp) | -32(%rbp) | |
| | -24(%rbp) | variable c |
| | -16(%rbp) | variable b |
| | -8(%rbp) | variable a |
| 56(%rsp) | 0(%rbp) | previous value of %rbp (pushed by the first instruction of main) |
| | 8(%rbp) | return address for the caller of 'main' (for 'ret' in main) |

The base pointer register (%rbp) has a special purpose: it points to the bottom of the stack frame of the current function, so local variables can be accessed relative to its value.

As for the last row of the table, which belongs to the address 8(%rbp) just beyond the bottom of the stack frame: when cmd.exe starts the abcs.exe program, the C runtime startup code linked into abcs.exe calls the global 'main' function, and the 'call' instruction automatically pushes the current value of the instruction pointer (i.e. the return address) onto the top of the stack. Thus, when the called 'main' function returns, the CPU can continue executing the caller by popping the address of the next instruction from the stack and loading it into the instruction pointer.

Note that the Windows x64 calling convention requires %rsp to be a multiple of 16 at every 'call' instruction. Since %rsp is already 16-byte aligned right after 'pushq %rbp', a frame size that is a multiple of 16 (e.g. 64 bytes instead of 56) would strictly satisfy this rule.
In general, when a function of the program is called by another function (of the same or of another program), the return address, i.e. the address of the instruction following the 'call' instruction, is automatically pushed onto the top of the stack.
Note that in the fetch-execute cycle the address of the next instruction is always held in the %rip register (instruction pointer or program counter). Thus the 'call' instruction, when executed, pushes the current value of the instruction pointer (which by then already points to the instruction after the 'call') onto the top of the stack. After that, the called function (the callee) pushes the value of the base pointer and creates its own stack frame.
The 'ret' instruction is normally the last instruction executed by a function. It pops the stored return address from the top of the stack and loads it into the instruction pointer, so the next fetch-execute cycle continues the execution of the program immediately after the 'call' instruction.
The called function is named the callee, and the function that calls the callee is named the caller.
The diagram below illustrates the mechanism of the 'call' and the 'ret' (i.e. return) instructions:
![]()
Adding 3+2 with a function (abcf.s)
Let us now create an equivalent of the 'simplef.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together with a function named 'sum' and returns the sum. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcf.s command, and create a new file named 'abcf.s' with the following content:

    .globl sum
    sum:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $16, %rsp
        movl  %ecx, 16(%rbp)
        movl  %edx, 24(%rbp)
        movl  16(%rbp), %edx
        movl  24(%rbp), %eax
        addl  %edx, %eax
        movl  %eax, -4(%rbp)
        movl  -4(%rbp), %eax
        addq  $16, %rsp
        popq  %rbp
        ret
    .globl main
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movl  $3, -4(%rbp)
        movl  $2, -8(%rbp)
        movl  -8(%rbp), %edx
        movl  -4(%rbp), %eax
        movl  %eax, %ecx
        call  sum
        movl  %eax, -12(%rbp)
    // movl $0, %eax
        movl  -12(%rbp), %eax
        addq  $48, %rsp
        popq  %rbp
        ret

Compile, link and run the assembly program as follows:
Like in the case of the previous programs, in the 'cmd' window we can display the returned value of the 'abcf.exe' program using either the 'err' command or the echo %ERRORLEVEL% command. Now we can see again the sum of the addition 3+2 (i.e. 5).
So far we have used quite a few registers without introducing them. It is therefore high time to give an overview of the registers available to assembly programs in the Intel x86/x64 architecture. First, note that in the AT&T assembly syntax
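– the names of the 32-bit wide registers are prefixed with the %e characters (e.g. %eax), and
– the names of the 64-bit wide registers are prefixed with the %r characters (e.g. %rax); the additional registers %r8–%r15 are an exception, as their 32-bit parts are named with a 'd' suffix instead (e.g. %r8d).

Note that when we declare an int type variable in C, its length will be 32 bits (i.e. it is doubleword wide).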
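In the Intel x86/x64 architecture some important registers are listed below (cf. X86-64 Architecture Guide, 2025-03-11; Assembly 1: Basics, 2025-03-30). Note that the argument-passing roles and the 'Saved across calls' column of the table follow the System V AMD64 calling convention (used e.g. on Linux). The programs in this chapter, however, are built for 64-bit Windows, whose x64 calling convention passes the first four integer arguments in %rcx, %rdx, %r8 and %r9 (in this order), requires the caller to reserve 32 bytes of shadow space on the stack, and treats %rsi and %rdi as callee-saved registers as well.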

General-purpose registers:

| Register | Purpose | Size | Saved across calls |
|---|---|---|---|
| %rax | temp register for arithmetic or logical calculations etc. (called accumulator); return value of a function | 64 bit | No |
| %eax | the lower half of the 8 byte wide %rax register | 32 bit | |
| %ax | the lower half of the 4 byte wide %eax register | 16 bit | |
| %ah | the higher half of the 2 byte wide %ax register | 8 bit | |
| %al | the lower half of the 2 byte wide %ax register | 8 bit | |
| %rbx | callee-saved | 64 bit | Yes |
| %ebx | the lower half of the 8 byte wide %rbx register | 32 bit | |
| %bx | the lower half of the 4 byte wide %ebx register | 16 bit | |
| %bh | the higher half of the 2 byte wide %bx register | 8 bit | |
| %bl | the lower half of the 2 byte wide %bx register | 8 bit | |
| %rcx | used to pass 4th argument to functions | 64 bit | No |
| %ecx | the lower half of the 8 byte wide %rcx register | 32 bit | |
| %cx | the lower half of the 4 byte wide %ecx register | 16 bit | |
| %ch | the higher half of the 2 byte wide %cx register | 8 bit | |
| %cl | the lower half of the 2 byte wide %cx register | 8 bit | |
| %rdx | used to pass 3rd argument to functions | 64 bit | No |
| %edx | the lower half of the 8 byte wide %rdx register | 32 bit | |
| %dx | the lower half of the 4 byte wide %edx register | 16 bit | |
| %dh | the higher half of the 2 byte wide %dx register | 8 bit | |
| %dl | the lower half of the 2 byte wide %dx register | 8 bit | |
| %rsi | used to pass 2nd argument to functions | 64 bit | No |
| %esi | the lower half of the 8 byte wide %rsi register | 32 bit | |
| %si | the lower half of the 4 byte wide %esi register | 16 bit | |
| %sil | the lower half of the 2 byte wide %si register | 8 bit | |
| %rdi | used to pass 1st argument to functions | 64 bit | No |
| %edi | the lower half of the 8 byte wide %rdi register | 32 bit | |
| %di | the lower half of the 4 byte wide %edi register | 16 bit | |
| %dil | the lower half of the 2 byte wide %di register | 8 bit | |
| %r8 | used to pass 5th argument to functions | 64 bit | No |
| %r8d | the lower half of the 8 byte wide %r8 register | 32 bit | |
| %r8w | the lower half of the 4 byte wide %r8d register | 16 bit | |
| %r8b | the lower half of the 2 byte wide %r8w register | 8 bit | |
| %r9 | used to pass 6th argument to functions | 64 bit | No |
| %r9d | the lower half of the 8 byte wide %r9 register | 32 bit | |
| %r9w | the lower half of the 4 byte wide %r9d register | 16 bit | |
| %r9b | the lower half of the 2 byte wide %r9w register | 8 bit | |
| %r10 | temporary | 64 bit | No |
| %r11 | temporary | 64 bit | No |
| %r12 | callee-saved | 64 bit | Yes |
| %r13 | callee-saved | 64 bit | Yes |
| %r14 | callee-saved | 64 bit | Yes |
| %r15 | callee-saved | 64 bit | Yes |

Special-purpose registers:

| Register | Purpose | Size | Saved across calls |
|---|---|---|---|
| %rsp | stack pointer | 64 bit | Yes |
| %esp | the lower half of the 8 byte wide %rsp register | 32 bit | |
| %sp | the lower half of the 4 byte wide %esp register | 16 bit | |
| %spl | the lower half of the 2 byte wide %sp register | 8 bit | |
| %rbp | base pointer; callee-saved | 64 bit | Yes |
| %ebp | the lower half of the 8 byte wide %rbp register | 32 bit | |
| %bp | the lower half of the 4 byte wide %ebp register | 16 bit | |
| %bpl | the lower half of the 2 byte wide %bp register | 8 bit | |
| %rip | instruction pointer or program counter | 64 bit | (call↔ret) |
| %eip | the lower half of the 8 byte wide %rip register | 32 bit | |
| %ip | the lower half of the 4 byte wide %eip register | 16 bit | |
| %rflags | status or control flags | 64 bit | No |
| %eflags | the lower half of the 8 byte wide %rflags register | 32 bit | |
| %flags | the lower half of the 4 byte wide %eflags register | 16 bit | |

The status (or flags) register contains mostly one-bit storage units ("flags") that reflect the current state of an x86/x64 CPU. For example, some flags show important characteristics of the result of arithmetic or logical operations (including comparisons etc.). Some commonly used flags are illustrated below within a 64-bit %rflags register:

| bit | 63 ... | 11 | ... | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| flag | | OF | | SF | ZF | | AF | | PF | | CF |

The flag names are abbreviated as follows:
- CF: Carry Flag (CF=1 when an arithmetic carry has been generated)
- PF: Parity Flag (PF=1 indicates that the number of 1 bits in the least significant byte of the result of the last operation is even; otherwise, i.e. when this number is odd, PF=0)
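- AF: Auxiliary Carry Flag (AF=1 when a carry out of, or a borrow into, bit 3 of the result has been generated, i.e. between the two 4-bit halves of the lowest byte; this flag is used for binary-coded decimal (BCD) arithmetic)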
- ZF: Zero Flag (the zero flag is a central feature on most conventional CPU architectures: it is used to check the result of an arithmetic operation, including comparisons; ZF=1 if the result of the operation is zero, otherwise ZF=0)
- SF: Sign Flag or Negative Flag (SF=1 indicates that the result of the last mathematical operation produced value 1 in the most significant bit (MSB) position, i.e. the leftmost or sign bit of the result was set)
- OF: Overflow Flag (OF=1 shows that a signed overflow has occurred in the last arithmetic operation, i.e. the result does not fit into the destination as a two's complement number)
- Note that in two's complement coding, the operation C=A+B produces an overflow if the logical expression (SA ∧ SB ∧ ¬SC) ∨ (¬SA ∧ ¬SB ∧ SC) is true (where SA is the sign bit of the operand 'A' etc.).

Formerly (e.g. in the mainframe age) the program counter and the status register were collectively called the PSW (program status word) register.
GNU x86/x64 assembly: Summary and further examples
Table of contents:
- Calculating the first 10 elements of the Fibonacci sequence (fib.s)
- Simple loop examples
Calculating the first 10 elements of the Fibonacci sequence (fib.s)
First, let us see a C program that prints the first 10 elements of the Fibonacci sequence. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.c command, and create a new file named 'fibonacci.c' with the following content:

    #include <stdio.h>

    int main()
    {
        int k1, k2, i;
        int n=10;
        k1=1;
        k2=1;
        printf("Fibonacci numbers\n");
        printf("%d\n",k1);
        i=2;
        while(i<=n)
        {
            printf("%d\n",k2);
            int x=k2;
            k2=k1+k2;
            k1=x;
            i++;
        }
        return 0;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'fibonacci.c' program in Intel x86/x64 assembly language which prints the first 10 elements of the Fibonacci sequence using quadword-length variables. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.s command, and create a new file named 'fibonacci.s' with the following content:

    .globl main
    .data
    .P1: .ascii "Fibonacci numbers\12\0"
    .P2: .ascii "%d\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $72, %rsp
        movq  $10, -32(%rbp)     # n
        movq  $1, -8(%rbp)       # k1
        movq  $1, -16(%rbp)      # k2
        leaq  .P1, %rax
        movq  %rax, %rcx
        call  printf
        movq  -8(%rbp), %rax
        movq  %rax, %rdx
        leaq  .P2, %rax
        movq  %rax, %rcx
        call  printf
        movq  $2, -24(%rbp)      # i
        jmp   .J2
        /* begin loop */
    .J1:
        movq  -16(%rbp), %rax
        movq  %rax, %rdx
        leaq  .P2, %rax
        movq  %rax, %rcx
        call  printf
        movq  -16(%rbp), %rax
        movq  %rax, -40(%rbp)    # x
        movq  -8(%rbp), %rax
        addq  %rax, -16(%rbp)
        movq  -40(%rbp), %rax
        movq  %rax, -8(%rbp)
        addq  $1, -24(%rbp)
    .J2:
        movq  -24(%rbp), %rax
        cmpq  -32(%rbp), %rax
        jle   .J1                # jump if %rax≤-32(%rbp)
        /* end of loop */
        movq  $10, %rax
        addq  $72, %rsp
        popq  %rbp
        ret

To understand the operation of the loop, it is essential to know the status register (or flags register), which indicates the current state of an x86/x64 CPU, and especially the status flags (e.g. the carry, parity, zero, sign and overflow flags), which usually characterize the result of arithmetic operations.
Note that the return value of the 'main' function was now set to 10 (instead of 0) to see the difference from the compiled fibonacci.c program.
Compile, link and run the assembly program as follows:
Finally, it is very instructive to see the machine code representation of the 'fibonacci.s' assembly program. For that purpose, first assemble the program into an object file with debugging information by entering the
gcc -g -c fibonacci.s -o fibonacci.o
command, and then disassemble the object file by typing the
objdump -d fibonacci.o
command in the 'cmd' window. (The -S option could be added to intermix the source lines with the disassembly; the -M option, on the other hand, requires an argument, e.g. -M intel for Intel syntax, so it must not be followed directly by another option.) The result will look something like this:

    fibonacci.o:     file format pe-x86-64


    Disassembly of section .text:

    0000000000000000 <main>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 48             sub    $0x48,%rsp
       8:   48 c7 45 e0 0a 00 00    movq   $0xa,-0x20(%rbp)
       f:   00
      10:   48 c7 45 f8 01 00 00    movq   $0x1,-0x8(%rbp)
      17:   00
      18:   48 c7 45 f0 01 00 00    movq   $0x1,-0x10(%rbp)
      1f:   00
      20:   48 8d 04 25 00 00 00    lea    0x0,%rax
      27:   00
      28:   48 89 c1                mov    %rax,%rcx
      2b:   e8 00 00 00 00          callq  30 <main+0x30>
      30:   48 8b 45 f8             mov    -0x8(%rbp),%rax
      34:   48 89 c2                mov    %rax,%rdx
      37:   48 8d 04 25 13 00 00    lea    0x13,%rax
      3e:   00
      3f:   48 89 c1                mov    %rax,%rcx
      42:   e8 00 00 00 00          callq  47 <main+0x47>
      47:   48 c7 45 e8 02 00 00    movq   $0x2,-0x18(%rbp)
      4e:   00
      4f:   eb 34                   jmp    85 <.J2>

    0000000000000051 <.J1>:
      51:   48 8b 45 f0             mov    -0x10(%rbp),%rax
      55:   48 89 c2                mov    %rax,%rdx
      58:   48 8d 04 25 13 00 00    lea    0x13,%rax
      5f:   00
      60:   48 89 c1                mov    %rax,%rcx
      63:   e8 00 00 00 00          callq  68 <.J1+0x17>
      68:   48 8b 45 f0             mov    -0x10(%rbp),%rax
      6c:   48 89 45 d8             mov    %rax,-0x28(%rbp)
      70:   48 8b 45 f8             mov    -0x8(%rbp),%rax
      74:   48 01 45 f0             add    %rax,-0x10(%rbp)
      78:   48 8b 45 d8             mov    -0x28(%rbp),%rax
      7c:   48 89 45 f8             mov    %rax,-0x8(%rbp)
      80:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)

    0000000000000085 <.J2>:
      85:   48 8b 45 e8             mov    -0x18(%rbp),%rax
      89:   48 3b 45 e0             cmp    -0x20(%rbp),%rax
      8d:   7e c2                   jle    51 <.J1>
      8f:   48 c7 c0 0a 00 00 00    mov    $0xa,%rax
      96:   48 83 c4 48             add    $0x48,%rsp
      9a:   5d                      pop    %rbp
      9b:   c3                      retq
      9c:   90                      nop
      9d:   90                      nop
      9e:   90                      nop
      9f:   90                      nop

In order to organize a loop in an assembly program we need both comparison and control (jump) instructions. Let us review some of them via examples:
- cmp %rsi, %rax, cmp $10, %rax or cmp -24(%rbp), %rax (compares the two operands by subtracting the first operand from the second one; the difference is discarded, only the ZF, SF, PF, CF and OF status flags are set accordingly)
- Example 1 (e.g. for pre-tested loops with a loop control variable named 'i'):
- cmpl -12(%rbp), %eax # i>n ?
- jg .L1
- (these instructions compare the data located in the memory address %rbp−12 ('n') with the content of the accumulator ('i'); if the content of the accumulator is greater than 'n', then a conditional jump occurs to the label .L1)
- Example 2 (e.g. for post-tested loops with a loop control variable named 'i'):
- cmpl -12(%rbp), %eax # i<=n ?
- jle .loop
- (these instructions compare the data located in the memory address %rbp−12 ('n') with the content of the accumulator ('i'); if the content of the accumulator is less than or equal to 'n', then a conditional jump occurs to the label .loop)
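- test %rax, %rax (performs a bitwise 'and' of the two operands without storing the result; sets the ZF, SF and PF status flags and clears CF and OF; with two identical operands, as here, its effect on these flags is the same as that of the cmp $0, %rax instruction)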
- Example:
- andl $1,%eax # bitwise 'and' operation is performed
- testl %eax,%eax # %eax==0 ?
- jz .cont
- (these instructions test whether the content of the accumulator is even; if it is, then a conditional jump occurs to the label .cont)
- jmp .J2 (jumps unconditionally to the instruction labelled by .J2 by setting the content of the %rip register)
- je .L1 or jz .L1 (jumps to the instruction labelled by .L1 if ZF equals 1)
- jne .L2 or jnz .L2 (jumps to the instruction labelled by .L2 if ZF is not equal to 1, i.e. ZF equals 0)
- jl .J2 (jumps to the instruction labelled by .J2 if SF is not equal to OF, i.e. the second op of the previous cmp instruction is less than the first op)
- jle .J2 (jumps to the instruction labelled by .J2 if ZF equals 1 or SF is not equal to OF, i.e. the second op of the previous cmp instruction is less than or equal to the first op)
- jg .J2 (jumps to the instruction labelled by .J2 if ZF equals 0 and SF equals OF, i.e. the second operand of the previous cmp instruction is greater than the first operand)
- jge .J2 (jumps to the instruction labelled by .J2 if SF equals OF, i.e. the second op of the previous cmp instruction is greater than or equal to the first op)
Note that after the mnemonics (operation codes) of many instructions we can use certain suffixes to indicate the length of the operands.
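
| suffix | length of operands | example(s) |
|---|---|---|
| -b | byte (1 byte = 8 bits) | movb $5, -1(%rbp); movb -1(%rbp), %al |
| -w | word (2 bytes = 16 bits) | movw $5, -2(%rbp); movw -2(%rbp), %ax |
| -l | doubleword (4 bytes = 32 bits) | movl $5, -4(%rbp); movl -4(%rbp), %eax |
| -q | quadword (8 bytes = 64 bits) | movq $5, -8(%rbp); movq -8(%rbp), %rax; leaq pattern, %rcx |

Note that in the C programming language we can use the following format specifiers of the printf() function: %hi (for short integers), %d or %i (for doubleword integers), %ld (for long integers). On Linux a long is a quadword, but on 64-bit Windows a long is only 32 bits wide, so strictly speaking a quadword should be printed there with %lld (or %I64d with older Microsoft C runtimes); for small values such as those in the examples below, %ld prints the same result.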
Simple examples of using suffixes
(1) Using signed bytes (or characters):

    .globl main
    .data
    pattern: .ascii "a=%hi, b=%hi, a+b=%hi\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movb  $5, -1(%rbp)       # a
        movb  $-8, -2(%rbp)      # b
        movb  -1(%rbp), %al
        addb  -2(%rbp), %al
        cbtw
        movw  %ax, %r9w          # a+b
        movb  -1(%rbp), %al
        cbtw
        movw  %ax, %dx           # a
        movb  -2(%rbp), %al
        cbtw
        movw  %ax, %r8w          # b
        leaq  pattern, %rcx
        call  printf
        movq  $0, %rax
        addq  $48, %rsp
        popq  %rbp
        ret

Note that the assembly instruction cbtw (CBW in Intel syntax) sign-extends the signed 8-bit integer value in the %al register to the word-length %ax register.
(2) Using short integers:

    .globl main
    .data
    pattern: .ascii "a=%hi, b=%hi, a+b=%hi\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movw  $5, -2(%rbp)       # a
        movw  $-8, -4(%rbp)      # b
        movw  -2(%rbp), %ax
        addw  -4(%rbp), %ax
        movw  -2(%rbp), %dx      # a
        movw  -4(%rbp), %r8w     # b
        movw  %ax, %r9w          # a+b
        leaq  pattern, %rcx
        call  printf
        movq  $0, %rax
        addq  $48, %rsp
        popq  %rbp
        ret

(3) Using doubleword integers:

    .globl main
    .data
    pattern: .ascii "a=%d, b=%d, a+b=%d\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movl  $5, -4(%rbp)       # a
        movl  $-8, -8(%rbp)      # b
        movl  -4(%rbp), %eax
        addl  -8(%rbp), %eax
        movl  -4(%rbp), %edx     # a
        movl  -8(%rbp), %r8d     # b
        movl  %eax, %r9d         # a+b
        leaq  pattern, %rcx
        call  printf
        movl  $0, %eax
        addq  $48, %rsp
        popq  %rbp
        ret

(4) Using quadword integers:

    .globl main
    .data
    pattern: .ascii "a=%ld, b=%ld, a+b=%ld\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movq  $5, -8(%rbp)       # a
        movq  $-8, -16(%rbp)     # b
        movq  -8(%rbp), %rax
        addq  -16(%rbp), %rax
        movq  -8(%rbp), %rdx     # a
        movq  -16(%rbp), %r8     # b
        movq  %rax, %r9          # a+b
        leaq  pattern, %rcx
        call  printf
        movq  $0, %rax
        addq  $48, %rsp
        popq  %rbp
        ret

It is important to study very carefully, in the simple examples presented above, the correct use of the suffixes of the assembly instructions as well as the format specifiers used in the printf() function.
Listing the first 10 natural numbers
First, let us see a C program that prints the first 10 natural numbers (starting with 1, then 2, 3, 4, ..., 10). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.c command, and create a new file named 'natural.c' with the following content:

    #include <stdio.h>

    int main()
    {
        int x=1;
        int i, n=10;
        for(i=1;i<=n;i++)
        {
            printf("%d\n",x);
            x++;
        }
        return i;
    }

Compile, link and run the compiled C program as follows:
Now it can be very instructive to see the assembly code that the compiler generates from the C program. Type and run the gcc natural.c -S -o nat.s command in the 'cmd' window. After some changes (omitting some parts, commenting some of the instructions etc.), the resulting file will look something like this:

    .data
    .pattern: .ascii "%d\12\0"
    .text
    printf:
        pushq %rbp
        pushq %rbx               # callee saved
        subq  $56, %rsp
        leaq  48(%rsp), %rbp     /* 56=48+8; pushing %rbx allocates +8 bytes on the stack */
        movq  %rcx, 32(%rbp)     # 1st argument (format string) stored
        movq  %rdx, 40(%rbp)     # 2nd argument stored
        movq  %r8, 48(%rbp)      # 3rd argument stored
        movq  %r9, 56(%rbp)      # 4th argument stored
        leaq  40(%rbp), %rax
        movq  %rax, -16(%rbp)    # local variable (va_list: address of the 2nd argument)
        movq  -16(%rbp), %rbx
        movl  $1, %ecx
        movq  __imp___acrt_iob_func(%rip), %rax
        call  *%rax              # __acrt_iob_func(1), i.e. stdout
        movq  %rax, %rcx
        movq  32(%rbp), %rax
        movq  %rbx, %r8
        movq  %rax, %rdx
        call  __mingw_vfprintf   # vfprintf(stdout, format, va_list)
        movl  %eax, -4(%rbp)
        movl  -4(%rbp), %eax
        addq  $56, %rsp
        popq  %rbx
        popq  %rbp
        ret
    .text
    .globl main
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp          # stack frame (48 bytes)
        call  __main
        movl  $1, -4(%rbp)       # variable x
        movl  $10, -12(%rbp)     # variable n
        movl  $1, -8(%rbp)       # variable i
        jmp   .L4
        /* begin of loop */
    .L5:
        movl  -4(%rbp), %eax     # variable x to print
        movl  %eax, %edx         # 2nd parameter for printf
        leaq  .pattern, %rax     # address of .pattern
        movq  %rax, %rcx         # 1st parameter for printf
        call  printf
        addl  $1, -4(%rbp)       # x++
        addl  $1, -8(%rbp)       # i++
    .L4:
        movl  -8(%rbp), %eax     # i → %eax
        cmpl  -12(%rbp), %eax    # %eax≤-12(%rbp) ? // -12(%rbp) is a reference to variable n
        jle   .L5                # jump if i≤n
        /* end of loop */
        movl  -8(%rbp), %eax
        addq  $48, %rsp
        popq  %rbp
        ret

Just before the 'jmp .L4' instruction, i.e. after the local variables have been initialized but before 'printf' is called for the first time, the content of the stack frame created by the 'main' function is as follows:

| address (via %rsp) | address (via %rbp) | content |
|---|---|---|
| 0(%rsp) | -48(%rbp) | allocated space for the four parameters (or arguments) for the function 'printf' |
| 8(%rsp) | -40(%rbp) | |
| 16(%rsp) | -32(%rbp) | |
| 24(%rsp) | -24(%rbp) | |
| | | (not used) |
| | -12(%rbp) | variable n (initially n=10) |
| | -8(%rbp) | variable i (initially i=1) |
| | -4(%rbp) | variable x (initially x=1) |
| 48(%rsp) | 0(%rbp) | previous value of %rbp (pushed by the first instruction of main) |
| | 8(%rbp) | return address for the caller of 'main' (for 'ret' in main) |

After the 'printf' function is called for the first time by the 'main' function, the content of the stack frame created by the 'printf' function is as follows:

| address (via %rsp) | address (via %rbp) | content |
|---|---|---|
| 0(%rsp) | -48(%rbp) | ... |
| ... | ... | ... |
| 48(%rsp) | 0(%rbp) | |
| 56(%rsp) | 8(%rbp) | previous value of %rbx (pushed by the second instruction of printf) |
| | 16(%rbp) | previous value of %rbp (pushed by the first instruction of printf) |
| | 24(%rbp) | return address for the caller of 'printf', i.e. the address of the next instruction of 'main' after the 'call printf' instruction |
| | 32(%rbp) | 1st argument of the function 'printf' (the format string, passed in %rcx) |
| | 40(%rbp) | 2nd argument of the function 'printf' (passed in %rdx) |
| | 48(%rbp) | 3rd argument of the function 'printf' (passed in %r8) |
| | 56(%rbp) | 4th argument of the function 'printf' (passed in %r9) |

The last four rows (from 32(%rbp) on) are the same space for the arguments (or parameters) of the 'printf' function that has been allocated by the 'main' function (see the top of its stack frame in the table above).

Let us now create an equivalent of the 'natural.c' program in Intel x86/x64 assembly language which prints the first 10 natural numbers. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.s command, and create a new file named 'natural.s' with the following content:

    .globl main
    .data
    msg: .ascii "The first %d natural numbers:\12\0"
    pattern: .ascii "%d\12\0"
    .text
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movl  $1, -4(%rbp)       # x
        movl  $10, -12(%rbp)     # n
        movl  $1, -8(%rbp)       # i
        leaq  msg, %rcx
        movl  -12(%rbp), %edx
        call  printf
    .L0:
        movl  -8(%rbp), %eax
        cmpl  -12(%rbp), %eax    # i>n ?
        jg    .L1
        leaq  pattern, %rcx
        movl  -4(%rbp), %edx
        call  printf
        incl  -4(%rbp)
        incl  -8(%rbp)
        jmp   .L0
    .L1:
        movl  -8(%rbp), %eax
        addq  $48, %rsp
        popq  %rbp
        ret

Compile, link and run the assembly program as follows:
Listing the first 10 powers of 2
First, let us see a C program that prints the first 10 powers of 2 (starting with 1, then 2, 4, 8 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.c command, and create a new file named 'powers.c' with the following content:

    #include <stdio.h>

    int nextpow(int x)
    {
        int p=x+x;
        return p;
    }

    int main()
    {
        int x=1;
        int i=1, n=10;
        do
        {
            printf("%d\n",x);
            x=nextpow(x);
            i++;
        } while(i<=n);
        return i;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'powers.c' program in Intel x86/x64 assembly language which prints the first 10 powers of 2. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.s command, and create a new file named 'powers.s' with the following content:

    .globl main
    .data
    pattern: .ascii "%d\12\0"
    .text
    nextpow:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $16, %rsp          # stack frame size
        movl  %ecx, 16(%rbp)     # parameter x
        movl  16(%rbp), %eax
        addl  %eax, %eax
        movl  %eax, -4(%rbp)     # local variable p
        movl  -4(%rbp), %eax
        addq  $16, %rsp
        popq  %rbp
        ret
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movl  $1, -4(%rbp)       # variable x
        movl  $1, -8(%rbp)       # variable i
        movl  $10, -12(%rbp)     # variable n
    .loop:
        movl  -4(%rbp), %edx
        leaq  pattern, %rcx
        call  printf
        movl  -4(%rbp), %ecx
        call  nextpow
        movl  %eax, -4(%rbp)
        addl  $1, -8(%rbp)       # i++
        movl  -8(%rbp), %eax
        cmpl  -12(%rbp), %eax    # i<=n ?
        jle   .loop
        movl  -8(%rbp), %eax
        addq  $48, %rsp
        popq  %rbp
        ret

Compile, link and run the assembly program as follows:
Listing the first 10 factorials
First, let us see a C program that prints the first 10 factorials (starting with 1, then 2, 6, 24 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.c command, and create a new file named 'fact.c' with the following content:

    #include <stdio.h>

    int f(int n)
    {
        int temp;
        temp=1;
        for(int i=2;i<=n;i++)
        {
            temp=temp*i;
        }
        return temp;
    }

    int main()
    {
        int n=10;
        printf("List of the first %d factorials:\n",n);
        int i=1;
        while(i<=n)
        {
            printf("%d\n",f(i));
            i=i+1;
        }
        return i;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'fact.c' program in Intel x86/x64 assembly language which prints the first 10 factorials. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.s command, and create a new file named 'fact.s' with the following content:

    .globl main
    .globl f
    .data
    .LC0: .ascii "List of the first %d factorials:\12\0"
    .LC1: .ascii "%d\12\0"
    .text
    f:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $16, %rsp
        movl  %ecx, 16(%rbp)
        movl  $1, -4(%rbp)
        movl  $2, -8(%rbp)
        jmp   .L4
    .L5:
        movl  -4(%rbp), %eax
        imull -8(%rbp), %eax
        movl  %eax, -4(%rbp)
        addl  $1, -8(%rbp)
    .L4:
        movl  -8(%rbp), %eax
        cmpl  16(%rbp), %eax
        jle   .L5
        movl  -4(%rbp), %eax
        addq  $16, %rsp
        popq  %rbp
        ret
    main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $48, %rsp
        movl  $10, -8(%rbp)
        movl  -8(%rbp), %eax
        movl  %eax, %edx
        leaq  .LC0, %rax
        movq  %rax, %rcx
        call  printf
        movl  $1, -4(%rbp)
        jmp   .L8
    .L9:
        movl  -4(%rbp), %eax
        movl  %eax, %ecx
        call  f
        movl  %eax, %edx
        leaq  .LC1, %rax
        movq  %rax, %rcx
        call  printf
        addl  $1, -4(%rbp)
    .L8:
        movl  -4(%rbp), %eax
        cmpl  -8(%rbp), %eax
        jle   .L9
        movl  -4(%rbp), %eax
        addq  $48, %rsp
        popq  %rbp
        ret

Note that the two-operand form of the imul assembly instruction performs a signed multiplication of the first operand (a register or a memory operand of word, doubleword or quadword size) and the second operand (a register of the same size), and stores the product, truncated to the size of the operands, in the register specified as the second operand.
Compile, link and run the assembly program as follows: