Operating systems - Introduction


Recommended reading:
William Stallings: Operating Systems. Internals and Design Principles. Edinburgh: Pearson, 2018.
Yosifovich, Paul – Ionescu, Alex – Russinovich, Mark E. – Solomon, David A.:
Windows Internals. 7th ed. Part 1. System architecture, processes, threads, memory management, and more. Pearson Education, 2017.

Micskei Zoltán: The Windows operating system. Budapest: Budapest University of Technology and Economics, 2014.
Overview of Windows Components (2024-03-26)
Wikipedia, selected entries. (2024-03-04)



Operating system definitions

What is an operating system?

Most computer users have had some experience with an operating system, but it is difficult to pin down precisely what an operating system is. Part of the problem is that operating systems perform two basically unrelated functions:
– extending the machine, that is, providing application programmers (as well as application programs) a clean abstract set of resources instead of the messy hardware ones, and
– managing hardware resources.
Depending on who is doing the talking, you hear mostly about one function or the other. Let us now look at both.
(1) The Operating System as an Extended Machine
    As mentioned earlier, the architecture (instruction set, memory organization, I/O, and bus structure) of most computers at the machine language level is primitive and awkward to program, especially for input/output.
    In this view, the function of the operating system is to present the user with the equivalent of an extended machine or virtual machine that is easier to program than the underlying hardware.
(2) The Operating System as a Resource Manager
    The concept of the operating system as primarily providing its users with a convenient interface is a top-down view. An alternative, bottom-up, view holds that the operating system is there to manage all the pieces of a complex system (i.e. the hardware and the software). Modern computers consist of processors, memories, timers, disks, mice, network interfaces, printers, and a wide variety of other devices. In the alternative view, the job of the operating system is to provide for an orderly and controlled allocation of the processors, memories, and I/O devices among the various programs competing for them.
(cf. A. S. Tanenbaum: Modern Operating Systems. Prentice Hall, 2007; A. S. Tanenbaum – H. Bos: Modern Operating Systems. Pearson, 2023.)


An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs. (Wikipedia)


An operating system is software that manages a computer's hardware. It also provides a basis for application programs and acts as an intermediary between the computer user and the computer hardware. (A. Silberschatz – P. B. Galvin – G. Gagne: Operating System Concepts. Wiley, 2018.)

provides a basis for application programs = provides common services for the application programs and manages them (i.e. starting, running, interrupting, stopping etc. the programs, allocating the necessary hardware resources for them, providing a communication mechanism between them etc.)


An OS is a program that controls the execution of application programs, and acts as an interface between applications and the computer hardware.
It can be thought of as having three objectives:
   • Convenience: An OS makes a computer more convenient to use. It acts as a user/computer interface.
   • Efficiency: An OS allows the computer system resources to be used in an efficient manner. It acts as a resource manager.
   • Ability to evolve: An OS should be constructed in such a way as to permit the effective development, testing, and introduction of new system functions without interfering with service. A major OS will evolve over time for a number of reasons (e.g. new types of hardware, new services, faults ("bugs") and fixes etc.).
(cf. W. Stallings: Operating Systems. Pearson, 2018.)

An operating system (OS) exploits the hardware resources of one or more processors to provide a set of services to system users. The OS also manages secondary memory and I/O (input/output) devices on behalf of its users. (W. Stallings: Operating Systems. Pearson, 2018.)


An operating system is the software that controls the working of the hardware resources and all the other software.
The operating system controls how all software applications work on the computer. Among other things, it is responsible for
– allowing file management (e.g. saving, copying, renaming and deleting files);
– implementing multitasking (allowing more than one program to run at the same time);
– creating a multi-user environment (e.g. allowing more than one user on a computer network to access the same file at the same time);
– providing security (e.g. in a multi-user environment, allowing only those with the correct password to use the computer).
Imagine an operating system as a building block to which all other blocks (i.e. software applications) have to be added.
The operating system also sets the rules for controlling hardware resources such as
– peripherals (controlling peripheral devices such as monitors, keyboards, printers etc.);
– memory (controlling the amount of memory used);
– CPU (controlling the time allocated to a task during which it is executed by the central processing unit);
– disk space (controlling the amount of disk space used).
(cf. T. Roderick – G. Rushbrook: ICT for GCSE. Oxford Univ. Press, 2002.)


Some examples of well-known operating systems



Basic functions of the operating system

Based on the above definitions, let us summarize the basic functions of the operating system (OS).

The operating system is system software that

  • manages hardware and software resources
    • checks and tests the available hardware resources
    • controls the execution of application programs (i.e. starting, running, interrupting, resuming, stopping etc. them)
      Note that this is the main function of the OS.
    • implements multitasking; performs process management
      • allocates the hardware resources (e.g. processors, memories, I/O devices etc.) to the various processes competing for them
        • manages secondary memory and I/O devices to provide virtual memory (so that programs can use more memory than the amount of main memory physically available)
      • shares the CPU time between the running processes so that the processor(s) can be used in an efficient manner
        • detects and handles asynchronous events (e.g. detecting an interrupt request from a hardware device and performing an interrupt of the current process)
        • synchronizes the execution of concurrent processes (e.g. using wait/signal semaphores to temporarily suspend and resume the execution of a process)
      • protects the dedicated memory areas allocated to the processes in order to avoid access violation (i.e. illegal access)
      • provides communication mechanisms between processes (in order that they can exchange data between each other)
        • shares data between the running processes and the peripheral devices (e.g. providing buffers)
  • acts as an interface between application programs and the hardware (cf. the OS produces an extended or virtual machine built on the computer hardware)
    • provides common services for application programs
  • acts as an intermediary between the users and the computer hardware
    • creates a multi-user environment
      • performs access control
        • verifies proper authentication (i.e. verifying the identity of a user or client, e.g. with the use of username and password)
        • checks authorization to use the system resources (i.e. checking permissions to perform a specific action)
    • provides the users with a convenient interface
      • graphical user interface (GUI)
      • command-line interface (CLI; "console")
    • offers appropriate means for users to perform basic tasks
      • system information
      • basic settings
      • file management (e.g. saving, copying, renaming and deleting files)
      • ...
  • ensures and fulfills safety and security requirements of operation
    • provides a built-in antivirus and security solution (e.g. application security, malware protection, firewall, web security etc.)
    • saves critical data automatically
    • prevents damage to the system components
  • has the ability to evolve and develop; allows regular updates
    • permits the introduction of new system functions (e.g. because of new types of hardware, new services etc.)
    • allows detecting, diagnosing and fixing faults ("bugs")

Note that the operating system has several software components including the so-called kernel (which is the most important part of the OS).



Brief overview of computer system hardware (cf. Stallings 2018: 30-32)

In general, a computer consists of a processor, a main memory, and several input-output (I/O) components.

The Von Neumann architecture

Single system bus architecture

Computer Components: Top-Level View

The figure above illustrates the logic of the operation of the system bus. The CPU contains some (internal) registers to support data exchange among the CPU, the main memory and the I/O module. These registers and their function are as follows:



Execution of instructions (cf. Stallings 2018: 32-35)

A program to be executed by a processor consists of a set of machine-level instructions stored in the memory. In its simplest form, the processing of instructions consists of two basic steps:
– first, the processor reads (or fetches) the instructions from the memory one at a time, and
– second, the processor executes each instruction.
The execution of a program is a repeating process (a cycle or loop) of these two steps: the instruction fetch and the instruction execution. (Note that instruction execution may involve several operations and depends on the nature of the instruction.)

The figure below illustrates the instruction cycle:

Basic Instruction Cycle

At the beginning of each instruction cycle, the processor fetches an instruction from memory. In this respect, the program counter (PC) register is of utmost importance: the PC holds the address of the next instruction to be fetched. After the instruction has been fetched, the processor increments the value of the PC so that it will hold the address of the next instruction in the sequence of instructions (i.e. in the program which is currently being executed).

The fetched instruction is loaded into the instruction register (IR). An instruction is normally made up of a combination of an operation code and the specification of the operands that represent or refer to the data upon which the operation is to be performed. The operation code of the instruction contains bits that specify the action the processor is to take. The processor (or more specifically, the control unit of the processor) interprets the instruction and performs the required action. In general, these actions fall into four categories:

The execution of an instruction may involve a certain combination of these actions.



An example of the operation of the fetch-execute cycle (cf. Stallings 2018: 33-35)

Let the memory of a virtual machine be organized into 16-bit (i.e. word-length) memory cells. Each instruction consists of a 4-bit operation code (opcode) and a 12-bit operand. Note that if the operand contains an address, this allows a maximum of 2^12 = 4096 memory cells to be addressed directly.

We shall use four hexadecimal digits to represent the 16-bit (one-word) content of the registers, memory addresses and the content of memory cells. (Note that for the 12-bit long addresses three hexadecimal digits would be enough.) Similarly, we shall use one hexadecimal digit to represent the opcode of each instruction.

In the example we want to add two whole numbers represented by two's complement code. We will use one general-purpose register (AC) and three instructions as follows:

We assume that the first instruction to be performed is located at the memory address 300 followed sequentially by the further instructions of the program (located at the addresses 301, 302 etc., respectively). Furthermore, we assume that the data that the program manipulates are stored at the memory addresses 940 and 941.

Memory content
Address    Content
(instructions)
0 3 0 0    1 9 4 0
0 3 0 1    5 9 4 1
0 3 0 2    2 9 4 1
(data)
0 9 4 0    0 0 0 3
0 9 4 1    0 0 0 2

In the example we analyze the operation of the fetch-execute cycle. Since the initial value of the program counter (PC) is set to location 300, in the first cycle the processor will fetch the instruction at the memory location 300. On the succeeding instruction cycles, the CPU will fetch instructions from locations 301, 302, and so on. (Note, however, that the sequential execution of instructions can be altered at any time by a certain control instruction.)

In each cycle the fetched instruction is always loaded into the instruction register (IR). The operation code (opcode) of the instruction will specify the necessary action that the processor is to take. After separating the opcode and the operand, the processor interprets the opcode of the instruction and performs the required action.


1st cycle
Storage unit    Value      Comment
Fetch stage
PC              0 3 0 0    fetch the instruction from M(300)
M(300)          1 9 4 0    load the content of M(300) into IR
IR              1 9 4 0    interpret the instruction
                             • opcode=1: move memory data into AC
                             • operand=940: the data is located at M(940)
PC              0 3 0 1    increment the value of PC by 1
Execute stage: AC←M(940) or MOV AC,M(0940)
M(940)          0 0 0 3    load the content of M(940) into AC
AC              0 0 0 3    store the content of M(940) in AC

2nd cycle
Storage unit    Value      Comment
Fetch stage
PC              0 3 0 1    fetch the instruction from M(301)
M(301)          5 9 4 1    load the content of M(301) into IR
IR              5 9 4 1    interpret the instruction
                             • opcode=5: add memory data to AC
                             • operand=941: the data to be added is located at M(941)
PC              0 3 0 2    increment the value of PC by 1
Execute stage: AC←AC+M(941) or ADD AC,M(0941)
AC              0 0 0 3    add the content of M(941) to AC
M(941)          0 0 0 2
AC              0 0 0 5    store the result of the addition in AC

3rd cycle
Storage unit    Value      Comment
Fetch stage
PC              0 3 0 2    fetch the instruction from M(302)
M(302)          2 9 4 1    load the content of M(302) into IR
IR              2 9 4 1    interpret the instruction
                             • opcode=2: move the content of AC into a memory cell
                             • operand=941: the memory cell is located at M(941)
PC              0 3 0 3    increment the value of PC by 1
Execute stage: M(941)←AC or MOV M(0941),AC
M(941)          0 0 0 2    move the content of AC into M(941)
AC              0 0 0 5
M(941)          0 0 0 5    store the content of AC in M(941)

In this example three instruction cycles were needed, each consisting of a fetch stage and an execute stage, to add the contents of the memory location 940 to the contents of the memory location 941.



Implementation of the above example in Windows
I. Setting the environment

To implement the simple example of adding two integers together, we first need a compiler. Download the x64 build of the GCC compiler for Windows from SourceForge:
gcc-win64 download (2025-03-06)
All you need is to
– create a subdirectory or folder named 'temp' in the root directory of the local disk 'c:',
– create a new subdirectory named 'gcc' within the 'temp' folder, and
– select and copy the whole content of the downloaded and compressed file (e.g. gcc-14.2.0-no-debug.7z) into the c:\temp\gcc folder.

It seems that in some cases simply using the built-in Windows File Explorer to unzip and copy the necessary files will not work. Therefore it is strongly recommended to download and install either Total Commander (and use it instead of File Explorer) or the 7-Zip application. In the latter case you can still use the built-in file manager of the Windows operating system.

It is easy to test if the installed 'gcc' compiler works:

  1. open a new 'cmd' window
  2. set the default path issuing the following command:
    • set PATH=c:\temp\gcc;c:\temp\gcc\bin;%PATH%
  3. enter the gcc --help command to see whether the compiler works.

Note that from now on we shall always use the c:\temp folder as a default folder to create, compile, edit, modify etc. our files. In order to use the GCC compiler in a 'cmd' window conveniently, let us create a simple setpath.bat file with the 'notepad' text editor. It is to contain only three lines:

@echo off
set PATH=c:\temp\gcc;c:\temp\gcc\bin;%PATH%
PATH

With those steps we created a new command called 'setpath' which adds the path to the location of the gcc.exe compiler (and the necessary libraries) to the default path in the current 'cmd' window environment. Now if we want to use the GCC in a 'cmd' window, first we should run the 'setpath' command once (and only once).



Implementation of the above example in Windows
II.1. Create and compile C files

Table of contents:

  • Printing "Hello World!" (hello.c)
  • Adding 3+2 (simple.c)
    • Creating a batch file to display ERRORLEVEL (err.bat)
  • Adding 3+2 with a function (simplef.c)
    • Assembly version of 'simplef.c' with quadword-length operands (simplex.s)
  • Adding and printing 3+2 (example.c)
    • Assembly version of 'example.c' with quadword-length operands (examplex.s)

Printing "Hello World!" (hello.c)

Open a new 'cmd' window in the c:\temp\gcc directory and set the default path running the 'setpath' command (only once). Using the notepad hello.c command, create a new file named 'hello.c' with the following content:

#include <stdio.h>

int main() {
 printf("Hello world!\n");
 return 0;
 }

Compile, link and run the C program as follows:

Compile and run hello.c using GCC

Adding 3+2 (simple.c)

Now let us create another simple C program which implements the former example adding two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simple.c command, and create a new file named 'simple.c' with the following content:

#include <stdio.h>

int main() {
 int a=3;
 int b=2;
 b=a+b;
 return 0;
 }

Compile, link and run the C program as follows:

Compile and run simple.c using GCC

Note that in the 'cmd' window, we can display the returned value of the 'simple.exe' program using the echo %ERRORLEVEL% command.

For the sake of simplicity, let us create a batch file named 'err.bat' using the notepad err.bat command. It is to contain these two lines:

@echo off
echo %ERRORLEVEL%

With that we created a new command called err which will easily display, if entered, the actual value of the ERRORLEVEL environment variable in the 'cmd' window.

It will be instructive for later considerations that, using the GCC compiler, we can easily generate the assembly code of the 'simple.c' program (as well as of any other C program). For that purpose, we should enter the gcc simple.c -S -o simple.s command in the 'cmd' window.

Compile the simple.c program into assembly code using GCC

The generated assembly program is as follows:

	.file	"simple.c"
	.text
	.def	__main;
		.scl	2;
		.type	32;
	.endef
	.globl	main
	.def	main;
		.scl	2;
		.type	32;
	.endef
	.seh_proc	main
main:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$48, %rsp
	.seh_stackalloc	48
	.seh_endprologue
	call	__main
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-4(%rbp), %eax
	addl	%eax, -8(%rbp)
	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.ident	"GCC: (GNU) 13.2.0"

The explanation of some important parts of the assembly code:

After such considerations, we can easily create the 'simplex.s' assembly program which contains quadword length operands, and returns the sum of the addition (as an ERRORLEVEL value):

.globl	main

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movq	$3, -8(%rbp)
	movq	$2, -16(%rbp)
	movq	-8(%rbp), %rax
	addq	%rax, -16(%rbp)	# the sum of a and b
	movq	-16(%rbp), %rax	# return (ERRORLEVEL) value
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run simplex.s using GCC

Adding 3+2 with a function (simplef.c)

The aim of the 'simple.c' program can also be implemented using a function named 'sum' which adds two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simplef.c command, and create a new file named 'simplef.c' with the following content:

#include <stdio.h>

int sum(int x,int y) {
 int temp;
 temp=x+y;
 return temp;
 }

int main() {
 int a=3;
 int b=2;
 int c;
 c=sum(a,b);
 return c;
 }

Set the default path running the 'setpath' command (remember, only once). Compile, link and run the C program as follows:

Compile and run simplef.c using GCC

In the 'cmd' window, we can display again the returned value of the 'simplef.exe' program using the echo %ERRORLEVEL% command (or running the 'err' batch file).

Adding and printing 3+2 (example.c)

It was not easy to check the 'simple.c' or 'simplef.c' programs because they produced no visible output. So open a new 'cmd' window again in the c:\temp\gcc directory. Using the notepad example.c command, create a new file named 'example.c' with the following content:

#include <stdio.h>

int main() {
 int a=3;
 int b=2;
 int c=a+b;
 printf("%d + %d = %d\n",a,b,c);
 return 0;
 }

Set the default path running the 'setpath' command. Compile, link and run the C program as follows:

Compile and run example.c using GCC

As before, we can easily generate the assembly code of the 'example.c' program by entering the gcc example.c -S -o example.s command in the 'cmd' window.
The generated assembly program is as follows:

	.file	"example.c"
	.text
	.def	printf;
		.scl	3;
		.type	32;
		.endef
	.seh_proc	printf

printf:
	pushq	%rbp
	.seh_pushreg	%rbp
	pushq	%rbx
	.seh_pushreg	%rbx
	subq	$56, %rsp
	.seh_stackalloc	56
	leaq	48(%rsp), %rbp
	.seh_setframe	%rbp, 48
	.seh_endprologue
	movq	%rcx, 32(%rbp)	# 1st argument stored
	movq	%rdx, 40(%rbp)	# 2nd argument stored
	movq	%r8, 48(%rbp)	# 3rd argument stored
	movq	%r9, 56(%rbp)	# 4th argument stored
	leaq	40(%rbp), %rax
	movq	%rax, -16(%rbp)
	movq	-16(%rbp), %rbx
	movl	$1, %ecx
	movq	__imp___acrt_iob_func(%rip), %rax
	call	*%rax
	movq	%rax, %rcx
	movq	32(%rbp), %rax
	movq	%rbx, %r8
	movq	%rax, %rdx
	call	__mingw_vfprintf
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	addq	$56, %rsp
	popq	%rbx
	popq	%rbp
	ret
	.seh_endproc
	.def	__main;
		.scl	2;
		.type	32;
		.endef
	.section .rdata,"dr"
.LC0:
	.ascii "%d + %d = %d\12\0"
	.text
	.globl	main
	.def	main;
		.scl	2;
		.type	32;
		.endef
	.seh_proc	main
main:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$48, %rsp
	.seh_stackalloc	48
	.seh_endprologue
	call	__main
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-4(%rbp), %edx
	movl	-8(%rbp), %eax
	addl	%edx, %eax
	movl	%eax, -12(%rbp)
	movl	-12(%rbp), %ecx
	movl	-8(%rbp), %edx
	movl	-4(%rbp), %eax
	movl	%ecx, %r9d
	movl	%edx, %r8d
	movl	%eax, %edx
	leaq	.LC0(%rip), %rax
	movq	%rax, %rcx
	call	printf
	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.ident	"GCC: (GNU) 13.2.0"
	.def	__mingw_vfprintf;
		.scl	2;
		.type	32;
		.endef

Based on the compiled program, we can easily create the 'examplex.s' assembly program which contains only quadword length operands. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad examplex.s command, and create a new file named 'examplex.s' with the following content:

.data
.msg:
	.ascii "%d + %d = %d\12\0"

.text
.globl	main
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movq	$3, -8(%rbp)	# local variable a
	movq	$2, -16(%rbp)	# local variable b
	movq	-8(%rbp), %rdx
	movq	-16(%rbp), %rax
	addq	%rdx, %rax
	movq	%rax, -24(%rbp)	# local variable c
	movq	-24(%rbp), %rcx
	movq	-16(%rbp), %rdx
	movq	-8(%rbp), %rax

	movq	%rcx, %r9	# 4th argument (var c)
	movq	%rdx, %r8	# 3rd argument (var b)
	movq	%rax, %rdx	# 2nd argument (var a)
	leaq	.msg(%rip), %rax
	movq	%rax, %rcx	# 1st argument (pattern .msg)
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run examplex.s using GCC



Implementation of the above example in Windows
II.2. Create and compile assembly files

Table of contents:

  • Printing "Hello World!" (asmh.s)
  • Setting the ERRORLEVEL (abc.s)
    • A short explanation of the stack (push, pop)
  • Adding 3+2 (abcs.s)
    • The explanation of the stack frame
    • A short explanation of function calls (call, ret)
  • Adding 3+2 with a function (abcf.s)
    • List of important registers in the Intel x86/x64 architecture
    • The flags register

Printing "Hello World!" (asmh.s)

Now let us create a simple program in Intel x86/x64 assembly language which displays the well-known 'Hello world!' message. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad asmh.s command, and create a new file named 'asmh.s' with the following content:

.globl	main

// definitions of constants and variables
.data

hello:
	.ascii "Hello world!\12\0"

// program instructions (code)
.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$32, %rsp
	leaq	hello(%rip), %rax

/* setting the parameter for the function 'printf' */
	movq	%rax, %rcx	# address of 'hello'
	call	printf
/* displayed 'Hello world!' */

	movl	$0, %eax	# set ERRORLEVEL value
	addq	$32, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run asmh.s using GCC

Note that the size of the stack frame is 32 bytes, even though there are no local variables in the program. The four quadwords allocated at the top of the stack frame can be used for the (possible) parameters of the 'printf' function.

Setting the ERRORLEVEL (abc.s)

After we have successfully created and compiled the 'asmh.s' program, let us create another simple program in Intel x86/x64 assembly language which does nothing except return the value 10 as an ERRORLEVEL value.

Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abc.s command, and create a new file named 'abc.s' with the following content:

.globl main

main:
        enter $0, $0
        movq $10, %rax
        leave
        ret

Note that the first instruction of the 'main' function (enter) creates a stack frame which, among other things, can contain the values of the local variables (if there are any). The base pointer register (%rbp) is used as a reference for the addresses of those local variables (i.e. the local variables can be addressed relative to the value of the base pointer).

The 'enter $0, $0' assembly instruction corresponds to the
   pushq %rbp
   movq %rsp, %rbp
instructions. It creates the stack frame of the function.

Note that e.g. the 'enter $24, $0' instruction would, after performing the equivalent of the two instructions shown above, allocate 24 bytes of memory space on the stack by subtracting 24 from the actual value of the stack pointer. Because 6*4=24, this would be enough for six double-word (i.e. 4-byte = 32-bit) local variables (or for three quadword local variables).

The 'leave' assembly instruction corresponds to the
   movq %rbp, %rsp
   popq %rbp
instructions. It frees (or destroys) the stack frame of the function.

The stack is a dedicated part of the memory which can store data according to the current needs of the programs. In this respect, the push and pop instructions are of utmost importance for adding data to or removing (as well as retrieving) data from the top of the stack. In order to implement and use a stack
– the programs use a dedicated register called stack pointer (%rsp) that always points to the top of the stack by containing the address of the last data item that has been pushed;
– when a data item is pushed into the stack first the stack pointer is decreased by the size of the operand (e.g. by subtracting 8 from the actual value of the stack pointer for a quadword), and then the content of the operand is stored at that address;
– when a data item is popped from the stack first the data from the top of the stack is retrieved from the memory address (and stored in the operand of the 'pop' instruction), and then the stack pointer is increased by the size of the operand (e.g. by adding 8 to the actual value of the stack pointer for a quadword).
The diagram below illustrates the push and pop operations:
Illustration of the push and pop operations

Using the push / pop instructions instead of the enter / leave instructions, we can create another version of the program 'abc.s' as follows:

.globl main

main:
        pushq %rbp
        movq %rsp, %rbp
        movq $10, %rax
        movq %rbp, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program:

Compile and run abc.s using GCC

Here, like in the case of the 'simple.exe' program or the 'simplex.exe' program, we can display the returned value of the 'abc.exe' program using the echo %ERRORLEVEL% command in the 'cmd' window (or we can enter the 'err' command if the 'err.bat' file exists).

Adding 3+2 (abcs.s)

Now let us create an equivalent of the 'simple.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together as quadword-length (64-bit) integers, stores the sum in another quadword-length variable, and returns the sum as an ERRORLEVEL value. Before that, the program will remind us to check the actual value of the ERRORLEVEL environment variable.

Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcs.s command, and create a new file named 'abcs.s' with the following content:

.globl	main

.data 
hello:
	.ascii "\12See the ERRORLEVEL value!\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$56, %rsp
	/* stack frame created */

	movq	$3, -8(%rbp)
	movq	$2, -16(%rbp)
	movq	-8(%rbp), %rax
	addq	-16(%rbp), %rax 
	movq	%rax, -24(%rbp)

	leaq	hello(%rip), %rcx
	call	printf

	movq	-24(%rbp), %rax	# set ERRORLEVEL value

	/* stack frame to be destroyed */
	addq	$56, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run abcs.s using GCC

Here, like in the case of the 'simple.exe' and 'abc.exe' programs, in the 'cmd' window we can display the returned value of the 'abcs.exe' program using the echo %ERRORLEVEL% or simply the err command. But in this case it returns the sum of the addition 3+2 (i.e. 5).

The size and content of the stack frame need some explanation. The size of the stack frame is 56 bytes which corresponds to 7 quadwords (i.e. 56=7*8). The structure and content of the stack frame are as follows:

address                  content
0(%rsp)     -56(%rbp)    parameters for the function 'printf'
8(%rsp)     -48(%rbp)
16(%rsp)    -40(%rbp)
24(%rsp)    -32(%rbp)
            -24(%rbp)    variable c
            -16(%rbp)    variable b
            -8(%rbp)     variable a
56(%rsp)    0(%rbp)      previous value of %rbp (pushed by the first instruction of main)
            8(%rbp)      return address for the caller of 'main' (for 'ret' in main)
The base pointer (%rbp) register has a special purpose: it points to the bottom of the stack frame of the current function, so local variables can be accessed relative to its value.

As for the last row of the table which belongs to the address 8(%rbp) just below the bottom of the stack frame, when the program environment (i.e. the cmd.exe in our case) runs the abcs.exe program, it calls the 'main' global function of the abcs.exe program, and the current value of the instruction pointer is automatically pushed onto the top of the stack. (Thus when the called 'main' function exits and returns, the CPU can continue the execution of the caller program by popping the address of the next instruction to be performed from the stack and loading it into the instruction pointer).

In general, when a specific function of the program is called by another function (from the same or from another program), the return address of the next instruction to be executed after the 'call' instruction is automatically pushed onto the top of the stack.

Note that in the fetch-execute cycle the address of the next instruction is always stored in the %rip register (the instruction pointer or program counter). Thus the 'call' instruction, when executed, pushes the current value of the instruction pointer onto the top of the stack. After that the called function (the callee) pushes the value of the base pointer and creates its stack frame.

The ret instruction is normally the last instruction executed by a function. It "pops" the stored address of the next instruction to be executed from the top of the stack and restores the value of the instruction pointer. Then the next fetch-execute cycle continues the execution of the program immediately after the 'call' instruction.

The called function is named the callee, and the function that calls the callee is named the caller.

The diagram below illustrates the mechanism of the 'call' and the 'ret' (i.e. return) instructions:
Illustration of the call and ret instructions
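
The mechanism of the 'call' and 'ret' instructions can also be imitated with a toy model in C. This is only a sketch under simplifying assumptions: the names toy_cpu, toy_init, toy_call and toy_ret are hypothetical, addresses are plain integers, and the stack is a small array of quadwords that grows downwards.

```c
#include <stdint.h>

#define TOY_STACK_SLOTS 16

/* Toy model of the stack and the instruction pointer (hypothetical names). */
typedef struct {
    uint64_t stack[TOY_STACK_SLOTS]; /* each slot models one quadword */
    int      sp;                     /* index of the top of the stack */
    uint64_t ip;                     /* address of the next instruction */
} toy_cpu;

void toy_init(toy_cpu *cpu, uint64_t next_instruction) {
    cpu->sp = TOY_STACK_SLOTS;       /* empty stack */
    cpu->ip = next_instruction;
}

/* 'call target': the instruction pointer already holds the address of the
   instruction after the call, so it is pushed as the return address. */
void toy_call(toy_cpu *cpu, uint64_t target) {
    cpu->stack[--cpu->sp] = cpu->ip;
    cpu->ip = target;
}

/* 'ret': pop the return address into the instruction pointer. */
void toy_ret(toy_cpu *cpu) {
    cpu->ip = cpu->stack[cpu->sp++];
}
```

If the model is initialized with the address 0x30 and toy_call is given the (arbitrary) target 0x1000, then 0x30 is found on the top of the stack during the call, and toy_ret restores it into the instruction pointer.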

Adding 3+2 with a function (abcf.s)

Let us now create an equivalent of the 'simplef.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together with a function named 'sum' and returns the sum. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcf.s command, and create a new file named 'abcf.s' with the following content:

.globl	sum

sum:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp
	movl	%ecx, 16(%rbp)
	movl	%edx, 24(%rbp)
	movl	16(%rbp), %edx
	movl	24(%rbp), %eax
	addl	%edx, %eax
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

.globl	main

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-8(%rbp), %edx
	movl	-4(%rbp), %eax
	movl	%eax, %ecx
	call	sum
	movl	%eax, -12(%rbp)
	// movl	$0, %eax
	movl	-12(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run abcf.s using GCC

Like in the case of the previous programs, in the 'cmd' window we can display the returned value of the 'abcf.exe' program using either the 'err' command or the echo %ERRORLEVEL% command. Now we can see again the sum of the addition 3+2 (i.e. 5).


So far we have used many registers without introducing them, so it is time for an overview of the registers available to assembly programs in the Intel x86/x64 architecture. First note that in the AT&T assembly syntax every register name is prefixed with the % character, and
– the names of the classic 32-bit registers begin with %e (e.g. %eax), and
– the names of the 64-bit registers begin with %r (e.g. %rax).

Note that when we declare an int type variable in C (using GCC on x86/x64), its length is 32 bits (i.e. it is a doubleword).

Some important registers of the Intel x86/x64 architecture are listed below (cf. X86-64 Architecture Guide, 2025-03-11; Assembly 1: Basics, 2025-03-30):

Register  Purpose                                                     Size    Saved across calls

General-purpose registers
%rax      temporary register for arithmetic or logical calculations   64 bit  No
          (the accumulator); return value of a function
%eax      the lower half of the 8-byte-wide %rax register             32 bit
%ax       the lower half of the 4-byte-wide %eax register             16 bit
%ah       the upper half of the 2-byte-wide %ax register              8 bit
%al       the lower half of the 2-byte-wide %ax register              8 bit
%rbx      callee-saved                                                64 bit  Yes
%ebx      the lower half of the 8-byte-wide %rbx register             32 bit
%bx       the lower half of the 4-byte-wide %ebx register             16 bit
%bh       the upper half of the 2-byte-wide %bx register              8 bit
%bl       the lower half of the 2-byte-wide %bx register              8 bit
%rcx      1st argument of a function (Windows x64); 4th (System V)    64 bit  No
%ecx      the lower half of the 8-byte-wide %rcx register             32 bit
%cx       the lower half of the 4-byte-wide %ecx register             16 bit
%ch       the upper half of the 2-byte-wide %cx register              8 bit
%cl       the lower half of the 2-byte-wide %cx register              8 bit
%rdx      2nd argument of a function (Windows x64); 3rd (System V)    64 bit  No
%edx      the lower half of the 8-byte-wide %rdx register             32 bit
%dx       the lower half of the 4-byte-wide %edx register             16 bit
%dh       the upper half of the 2-byte-wide %dx register              8 bit
%dl       the lower half of the 2-byte-wide %dx register              8 bit
%rsi      callee-saved (Windows x64); 2nd argument (System V)         64 bit  Yes (Windows x64); No (System V)
%esi      the lower half of the 8-byte-wide %rsi register             32 bit
%si       the lower half of the 4-byte-wide %esi register             16 bit
%sil      the lower half of the 2-byte-wide %si register              8 bit
%rdi      callee-saved (Windows x64); 1st argument (System V)         64 bit  Yes (Windows x64); No (System V)
%edi      the lower half of the 8-byte-wide %rdi register             32 bit
%di       the lower half of the 4-byte-wide %edi register             16 bit
%dil      the lower half of the 2-byte-wide %di register              8 bit
%r8       3rd argument of a function (Windows x64); 5th (System V)    64 bit  No
%r8d      the lower half of the 8-byte-wide %r8 register              32 bit
%r8w      the lower half of the 4-byte-wide %r8d register             16 bit
%r8b      the lower half of the 2-byte-wide %r8w register             8 bit
%r9       4th argument of a function (Windows x64); 6th (System V)    64 bit  No
%r9d      the lower half of the 8-byte-wide %r9 register              32 bit
%r9w      the lower half of the 4-byte-wide %r9d register             16 bit
%r9b      the lower half of the 2-byte-wide %r9w register             8 bit
%r10      temporary                                                   64 bit  No
%r11      temporary                                                   64 bit  No
%r12      callee-saved                                                64 bit  Yes
%r13      callee-saved                                                64 bit  Yes
%r14      callee-saved                                                64 bit  Yes
%r15      callee-saved                                                64 bit  Yes

Special-purpose registers
%rsp      stack pointer                                               64 bit  Yes
%esp      the lower half of the 8-byte-wide %rsp register             32 bit
%sp       the lower half of the 4-byte-wide %esp register             16 bit
%spl      the lower half of the 2-byte-wide %sp register              8 bit
%rbp      base pointer; callee-saved                                  64 bit  Yes
%ebp      the lower half of the 8-byte-wide %rbp register             32 bit
%bp       the lower half of the 4-byte-wide %ebp register             16 bit
%bpl      the lower half of the 2-byte-wide %bp register              8 bit
%rip      instruction pointer or program counter                      64 bit  (call↔ret)
%eip      the lower half of the 8-byte-wide %rip register             32 bit
%ip       the lower half of the 4-byte-wide %eip register             16 bit
%rflags   status or control flags                                     64 bit  No
%eflags   the lower half of the 8-byte-wide %rflags register          32 bit
%flags    the lower half of the 4-byte-wide %eflags register          16 bit

Note that the argument order and the "saved across calls" property depend on the calling convention. The cited guides describe the System V convention used on Linux. The programs in this document are built with GCC on Windows and therefore follow the Microsoft x64 convention, in which the first four integer arguments are passed in the %rcx, %rdx, %r8 and %r9 registers (in this order).
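
The "lower half" and "upper half" relationships of the table can be imitated in C with masks and shifts. The following lines are only a sketch: the function names are hypothetical, and the value of the register is passed in as an ordinary 64-bit integer.

```c
#include <stdint.h>

/* Narrower views of a 64-bit register value (illustrative names). */

/* The lower 32 bits: what %eax shows of %rax. */
uint32_t low_doubleword(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }

/* The lower 16 bits: what %ax shows of %rax. */
uint16_t low_word(uint64_t rax) { return (uint16_t)(rax & 0xFFFFu); }

/* The lower 8 bits: what %al shows of %rax. */
uint8_t low_byte(uint64_t rax) { return (uint8_t)(rax & 0xFFu); }

/* Bits 8..15, i.e. the upper half of the lower word: what %ah shows of %rax. */
uint8_t high_byte_of_low_word(uint64_t rax) { return (uint8_t)((rax >> 8) & 0xFFu); }
```

For the value 0x1122334455667788 the four functions return 0x55667788, 0x7788, 0x88 and 0x77, respectively.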

The status (or flags) register contains mostly one-bit storage units ("flags") that reflect the current state of an x86/x64 CPU. For example, some flags show some important characteristics of the result of arithmetic or logical operations (including comparisons etc.). Some usual flags are illustrated below within a 64-bit %rflags register:

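bit:    63  ...  11  ...  7   6   5   4   3   2   1   0
flag:            OF       SF  ZF      AF      PF      CF

The flag names are abbreviated as follows:

OF  overflow flag
SF  sign flag
ZF  zero flag
AF  auxiliary carry (adjust) flag
PF  parity flag
CF  carry flag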

Formerly (e.g. in the mainframe age) the program counter and the status register were collectively called the PSW (program status word) register.
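
The bit positions of the flags shown above can be tested with a shift and a mask. The C lines below are a minimal sketch: the constant and function names are illustrative, and the content of the flags register is passed in as an ordinary integer, because standard C has no direct access to the register.

```c
#include <stdint.h>

/* Bit positions of some status flags within the flags register. */
enum {
    CF_BIT = 0,   /* carry flag */
    PF_BIT = 2,   /* parity flag */
    AF_BIT = 4,   /* auxiliary carry flag */
    ZF_BIT = 6,   /* zero flag */
    SF_BIT = 7,   /* sign flag */
    OF_BIT = 11   /* overflow flag */
};

/* Returns 1 if the given bit of the flags value is set, otherwise 0. */
int flag_is_set(uint64_t rflags, int bit) {
    return (int)((rflags >> bit) & 1u);
}
```

For example, in the value 0x246 the bits 1, 2, 6 and 9 are set, so flag_is_set reports the zero flag and the parity flag as set, and the carry, sign and overflow flags as clear.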



GNU x86/x64 assembly: Summary and further examples

Table of contents:

  • Calculating the first 10 elements of the Fibonacci sequence (fib.s)
    • Organizing a loop in an assembly program
    • Using suffixes in operating codes
    • Simple examples of using suffixes
  • Simple loop examples
    • Listing the first 10 natural numbers
    • Listing the first 10 powers of 2
    • Listing the first 10 factorials

Calculating the first 10 elements of the Fibonacci sequence (fib.s)

First, let us see a C program that prints the first 10 elements of the Fibonacci sequence. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.c command, and create a new file named 'fibonacci.c' with the following content:

#include <stdio.h>

int main() {
 int k1, k2, i;
 int n=10;
 k1=1;
 k2=1;
 
 printf("Fibonacci numbers\n");
 printf("%d\n",k1);
 i=2;

 while(i<=n) {
  printf("%d\n",k2);
  int x=k2;
  k2=k1+k2;
  k1=x; 
  i++;
  }

 return 0;
 }

Compile, link and run the compiled C program as follows:

Compile and run fibonacci.c using GCC

Let us now create an equivalent of the 'fibonacci.c' program in Intel x86/x64 assembly language which prints the first 10 elements of the Fibonacci sequence using quadword-length variables. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.s command, and create a new file named 'fibonacci.s' with the following content:

.globl main

.data
.P1:
	.ascii "Fibonacci numbers\12\0"
.P2:
	.ascii "%d\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$72, %rsp
	movq	$10, -32(%rbp)	# n
	movq	$1, -8(%rbp)	# k1
	movq	$1, -16(%rbp)	# k2

	leaq	.P1, %rax
	movq	%rax, %rcx
	call	printf

	movq	-8(%rbp), %rax
	movq	%rax, %rdx
	leaq	.P2, %rax
	movq	%rax, %rcx
	call	printf

	movq	$2, -24(%rbp)	# i
	jmp	.J2

/* begin loop */
.J1:
	movq	-16(%rbp), %rax
	movq	%rax, %rdx
	leaq	.P2, %rax
	movq	%rax, %rcx
	call	printf

	movq	-16(%rbp), %rax
	movq	%rax, -40(%rbp)	# x
	movq	-8(%rbp), %rax
	addq	%rax, -16(%rbp)
	movq	-40(%rbp), %rax
	movq	%rax, -8(%rbp)
	addq	$1, -24(%rbp)
.J2:
	movq	-24(%rbp), %rax
	cmpq	-32(%rbp), %rax
	jle	.J1		# jump if %rax≤-32(%rbp)
/* end of loop */

	movq	$10, %rax
	addq	$72, %rsp
	popq	%rbp
	ret

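To understand how the loop works, it is essential to know the status (or flags) register, which indicates the current state of an x86/x64 CPU, and especially the status flags (carry, parity, zero, sign, overflow etc.), which characterize the result of arithmetic operations. The cmpq instruction subtracts its first operand from the second one without storing the difference; it only sets the flags. The jle instruction that follows then jumps if these flags indicate "less than or equal" for signed values.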
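
How a comparison followed by a conditional jump uses the flags can be sketched in C. This is only a model under stated assumptions: the type cmp_flags and the functions compare_quadwords, jle_taken and jg_taken are hypothetical names, and the operands are written in the order a, b for the comparison "a versus b" (in AT&T syntax this corresponds to 'cmpq b, a').

```c
#include <stdint.h>
#include <stdbool.h>

/* The four status flags that matter for comparisons (illustrative type). */
typedef struct { bool zf, sf, of, cf; } cmp_flags;

/* Models the flags that a comparison derives from the difference a - b. */
cmp_flags compare_quadwords(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t diff = ua - ub;          /* wraps around modulo 2^64 */
    cmp_flags f;
    f.zf = (diff == 0);               /* zero flag: the operands are equal */
    f.sf = (diff >> 63) != 0;         /* sign flag: top bit of the difference */
    f.cf = ua < ub;                   /* carry flag: unsigned borrow */
    /* overflow flag: the operands have different signs and the sign of the
       difference differs from the sign of a */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 63) != 0;
    return f;
}

/* jle jumps if ZF is set or SF differs from OF (signed "less or equal"). */
bool jle_taken(cmp_flags f) { return f.zf || (f.sf != f.of); }

/* jg jumps in exactly the opposite case (signed "greater"). */
bool jg_taken(cmp_flags f) { return !f.zf && (f.sf == f.of); }
```

With i=2 and n=10 the model reports that jle is taken, with i=11 and n=10 it reports that jle is not taken, which is the point where the loop ends.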

Note that the return value of the 'main' function was now set to 10 (instead of 0) to see the difference from the compiled fibonacci.c program.

Compile, link and run the assembly program as follows:

Compile and run fibonacci.s using GCC

Finally, it is very instructive to see the machine code representation of the 'fibonacci.s' assembly program. For that purpose, first compile the program into an object file with debugging information entering the
gcc -g -c fibonacci.s -o fibonacci.o
command, and then disassemble the object file by typing the
objdump -d -S fibonacci.o
command in the 'cmd' window. We shall get something like this:

The dump of fibonacci.o using GCC

The result of the disassembly is as follows:

fibonacci.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 48             sub    $0x48,%rsp
   8:   48 c7 45 e0 0a 00 00    movq   $0xa,-0x20(%rbp)
   f:   00
  10:   48 c7 45 f8 01 00 00    movq   $0x1,-0x8(%rbp)
  17:   00
  18:   48 c7 45 f0 01 00 00    movq   $0x1,-0x10(%rbp)
  1f:   00
  20:   48 8d 04 25 00 00 00    lea    0x0,%rax
  27:   00
  28:   48 89 c1                mov    %rax,%rcx
  2b:   e8 00 00 00 00          callq  30 <main+0x30>
  30:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  34:   48 89 c2                mov    %rax,%rdx
  37:   48 8d 04 25 13 00 00    lea    0x13,%rax
  3e:   00
  3f:   48 89 c1                mov    %rax,%rcx
  42:   e8 00 00 00 00          callq  47 <main+0x47>
  47:   48 c7 45 e8 02 00 00    movq   $0x2,-0x18(%rbp)
  4e:   00
  4f:   eb 34                   jmp    85 <.J2>

0000000000000051 <.J1>:
  51:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  55:   48 89 c2                mov    %rax,%rdx
  58:   48 8d 04 25 13 00 00    lea    0x13,%rax
  5f:   00
  60:   48 89 c1                mov    %rax,%rcx
  63:   e8 00 00 00 00          callq  68 <.J1+0x17>
  68:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  6c:   48 89 45 d8             mov    %rax,-0x28(%rbp)
  70:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  74:   48 01 45 f0             add    %rax,-0x10(%rbp)
  78:   48 8b 45 d8             mov    -0x28(%rbp),%rax
  7c:   48 89 45 f8             mov    %rax,-0x8(%rbp)
  80:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)

0000000000000085 <.J2>:
  85:   48 8b 45 e8             mov    -0x18(%rbp),%rax
  89:   48 3b 45 e0             cmp    -0x20(%rbp),%rax
  8d:   7e c2                   jle    51 <.J1>
  8f:   48 c7 c0 0a 00 00 00    mov    $0xa,%rax
  96:   48 83 c4 48             add    $0x48,%rsp
  9a:   5d                      pop    %rbp
  9b:   c3                      retq
  9c:   90                      nop
  9d:   90                      nop
  9e:   90                      nop
  9f:   90                      nop

In order to organize a loop in an assembly program we need both comparison and control instructions. Let us review some of them via examples:
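
As one illustration, the jump structure of the loop in 'fibonacci.s' can be mirrored in C with goto statements. This is only a sketch: the function name fibonacci_with_jumps and the out[] array (which stands in for the printf calls) are assumptions made for the example, while the labels J1 and J2 correspond to the .J1 and .J2 labels of the assembly program.

```c
/* Stores the first n Fibonacci numbers in out[] (n >= 1 is assumed) and
   returns how many numbers were stored. */
int fibonacci_with_jumps(long long out[], long long n) {
    long long k1 = 1, k2 = 1, i, x;
    int count = 0;

    out[count++] = k1;      /* the first element is handled before the loop */
    i = 2;
    goto J2;                /* jmp .J2: the condition is tested first */
J1:
    out[count++] = k2;      /* loop body: stands in for the printf call */
    x = k2;
    k2 = k1 + k2;
    k1 = x;
    i = i + 1;
J2:
    if (i <= n) goto J1;    /* cmpq and jle .J1 */
    return count;
}
```

The unconditional jump to the test at the bottom of the loop is what turns this layout into a while loop: the body is skipped entirely if the condition does not hold at the beginning.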

Note that after the operation codes (mnemonics) of some instructions we can use certain suffixes to indicate the length of the operands.

suffix  length of operands               example(s)
-b      byte (8 bits)                    movb $5, -1(%rbp)
                                         movb -1(%rbp), %al
-w      word (2 bytes = 16 bits)         movw $5, -2(%rbp)
                                         movw -2(%rbp), %ax
-l      doubleword (4 bytes = 32 bits)   movl $5, -4(%rbp)
                                         movl -4(%rbp), %eax
-q      quadword (8 bytes = 64 bits)     movq $5, -8(%rbp)
                                         movq -8(%rbp), %rax
                                         leaq pattern, %rcx

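Note that in the C programming language we can use the following format specifiers of the printf() function: %hi or %hd (for short integers, i.e. words), %d or %i (for doubleword integers) and %lld (for long long integers, i.e. quadwords). The %ld specifier belongs to the long type, which is a quadword on 64-bit Linux but only a doubleword on 64-bit Windows. On Windows %ld therefore interprets just the lower doubleword of its argument, which goes unnoticed for small values but truncates larger ones.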
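
The connection between the size of a C type and the instruction suffix can be sketched with a small helper function. The name suffix_for_size is hypothetical, and the sizes mentioned in the usage note are those of GCC on x86/x64, which is an assumption about the platform rather than a guarantee of the C language.

```c
#include <stddef.h>

/* Returns the operand-size suffix that belongs to an operand of the given
   size in bytes, or '?' if the size has no suffix in the table above. */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1:  return 'b';   /* byte */
    case 2:  return 'w';   /* word */
    case 4:  return 'l';   /* doubleword */
    case 8:  return 'q';   /* quadword */
    default: return '?';
    }
}
```

With GCC on x86/x64, suffix_for_size(sizeof(short)) gives 'w', suffix_for_size(sizeof(int)) gives 'l' and suffix_for_size(sizeof(long long)) gives 'q'.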

Simple examples of using suffixes

(1) Using signed bytes (or characters):

.globl	main

.data
pattern:
	.ascii "a=%hi, b=%hi, a+b=%hi\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movb	$5, -1(%rbp)	# a
	movb	$-8, -2(%rbp)	# b
	movb	-1(%rbp), %al
	addb	-2(%rbp), %al
	cbtw
	movw	%ax, %r9w	# a+b
	
	movb	-1(%rbp), %al
	cbtw
	movw	%ax, %dx	# a

	movb	-2(%rbp), %al
	cbtw
	movw	%ax, %r8w	# b

	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

Note that the assembly instruction cbtw sign-extends the signed 8-bit integer value in the %al register to the word-length (16-bit) %ax register, i.e. it copies the sign bit of %al into every bit of %ah.
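
The effect of the byte-to-word conversion can be modelled in C as follows. This is a minimal sketch; the function name cbtw_model is hypothetical, and the register contents are passed as unsigned integers so that the bit patterns are visible.

```c
#include <stdint.h>

/* Models cbtw: sign-extends the byte in %al to the word in %ax. */
uint16_t cbtw_model(uint8_t al) {
    uint16_t ax = al;        /* the low byte is kept unchanged */
    if (al & 0x80u) {
        ax |= 0xFF00u;       /* negative value: fill the high byte with ones */
    }
    return ax;
}
```

For example, the byte 0xF8 (i.e. -8) becomes the word 0xFFF8 (still -8), while the byte 0x05 becomes 0x0005.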

(2) Using short integers:

.globl	main

.data
pattern:
	.ascii "a=%hi, b=%hi, a+b=%hi\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movw	$5, -2(%rbp)	# a
	movw	$-8, -4(%rbp)	# b
	movw	-2(%rbp), %ax
	addw	-4(%rbp), %ax

	movw	-2(%rbp), %dx	# a
	movw	-4(%rbp), %r8w	# b
	movw	%ax, %r9w	# a+b
	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

(3) Using doubleword integers:

.globl	main

.data
pattern:
	.ascii "a=%d, b=%d, a+b=%d\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$5, -4(%rbp)	# a
	movl	$-8, -8(%rbp)	# b
	movl	-4(%rbp), %eax
	addl	-8(%rbp), %eax

	movl	-4(%rbp), %edx	# a
	movl	-8(%rbp), %r8d	# b
	movl	%eax, %r9d	# a+b
	leaq	pattern, %rcx
	call	printf

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

(4) Using quadword integers:

.globl	main

.data
pattern:
	.ascii "a=%lld, b=%lld, a+b=%lld\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movq	$5, -8(%rbp)	# a
	movq	$-8, -16(%rbp)	# b
	movq	-8(%rbp), %rax
	addq	-16(%rbp), %rax

	movq	-8(%rbp), %rdx	# a
	movq	-16(%rbp), %r8	# b
	movq	%rax, %r9	# a+b
	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

In the simple examples presented above it is important to study carefully how the suffixes of the assembly instructions and the format specifiers of the printf() function are matched to the size of the operands.

Listing the first 10 natural numbers

First, let us see a C program that prints the first 10 natural numbers (starting with 1, then 2, 3, 4, ..., 10). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.c command, and create a new file named 'natural.c' with the following content:

#include <stdio.h>

int main() {
 int x=1;
 int i, n=10;
 for(i=1;i<=n;i++) {
  printf("%d\n",x);
  x++;
  }
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run natural.c using GCC

Now it can be very instructive to see the compiled assembly version of the C program. Type and run in the 'cmd' window the gcc natural.c -S -o nat.s command. After making some changes (omitting some parts, commenting some of the instructions etc.), the resulting file will look something like this:

.data
.pattern:
	.ascii "%d\12\0"

.text
printf:
	pushq	%rbp
	pushq	%rbx		# callee saved
	subq	$56, %rsp
	leaq	48(%rsp), %rbp
/* 56=48+8; pushing %rbx allocates +8 bytes at the stack */

	movq	%rcx, 32(%rbp)	# 1st argument stored
	movq	%rdx, 40(%rbp)	# 2nd argument stored
	movq	%r8, 48(%rbp)	# 3rd argument stored
	movq	%r9, 56(%rbp)	# 4th argument stored

	leaq	40(%rbp), %rax
	movq	%rax, -16(%rbp)	# local variable
	movq	-16(%rbp), %rbx
	movl	$1, %ecx
	movq	__imp___acrt_iob_func(%rip), %rax
	call	*%rax

	movq	%rax, %rcx
	movq	32(%rbp), %rax
	movq	%rbx, %r8
	movq	%rax, %rdx
	call	__mingw_vfprintf
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax

	addq	$56, %rsp
	popq	%rbx
	popq	%rbp
	ret

.text
.globl	main
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp	# stack frame (48 byte)
	call	__main

	movl	$1, -4(%rbp)	# variable x
	movl	$10, -12(%rbp)	# variable n
	movl	$1, -8(%rbp)	# variable i
	jmp	.L4

/* begin of loop */
.L5:
	movl	-4(%rbp), %eax	# variable x to print
	movl	%eax, %edx	# 2nd parameter for printf
	leaq	.pattern, %rax	# address of .pattern
	movq	%rax, %rcx	# 1st parameter for printf
	call	printf

	addl	$1, -4(%rbp)	# x++
	addl	$1, -8(%rbp)	# i++
.L4:
	movl	-8(%rbp), %eax	# i → %eax
	cmpl	-12(%rbp), %eax	# %eax≤-12(%rbp) ?
	// -12(%rbp) is a reference to variable n
	jle	.L5		# jump if i≤n
/* end of loop */

	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

After the local variables have been initialized but before the 'printf' function is called for the first time (i.e. at the 'jmp .L4' instruction), the content of the stack frame created by the 'main' function is as follows:

address                  content
0(%rsp)    -48(%rbp)     allocated space for the four parameters (or arguments) for the function 'printf'
8(%rsp)    -40(%rbp)
16(%rsp)   -32(%rbp)
24(%rsp)   -24(%rbp)
           -16(%rbp)     (not used)
           -12(%rbp)     variable n (initially n=10)
           -8(%rbp)      variable i (initially i=1)
           -4(%rbp)      variable x (initially x=1)
48(%rsp)   0(%rbp)       previous value of %rbp (pushed by the first instruction of main)
           8(%rbp)       return address for the caller of 'main' (for 'ret' in main)

After the 'printf' function is called for the first time by the 'main' function, the content of the stack frame created by the 'printf' function is as follows:

address                  content
0(%rsp)    -48(%rbp)
...        ...           ...
48(%rsp)   0(%rbp)
56(%rsp)   8(%rbp)       previous value of %rbx (pushed by the second instruction of printf)
           16(%rbp)      previous value of %rbp (pushed by the first instruction of printf)
           24(%rbp)      return address for the caller of 'printf', i.e. the address of the next instruction of 'main' after the 'call printf' instruction
(the following part of the stack is the same space for the arguments (or parameters) of the 'printf' function that has been allocated by the 'main' function, see the top of its stack frame in the table above)
           32(%rbp)      1st argument of the function 'printf' (received in %rcx)
           40(%rbp)      2nd argument of the function 'printf' (received in %rdx)
           48(%rbp)      3rd argument of the function 'printf' (received in %r8)
           56(%rbp)      4th argument of the function 'printf' (received in %r9)

Let us now create an equivalent of the 'natural.c' program in Intel x86/x64 assembly language which prints the first 10 natural numbers. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.s command, and create a new file named 'natural.s' with the following content:

.globl	main

.data 
msg:
	.ascii "The first %d natural numbers:\12\0"

pattern:
	.ascii "%d\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$1, -4(%rbp)	# x
	movl	$10, -12(%rbp)	# n
	movl	$1, -8(%rbp)	# i

	leaq	msg, %rcx
	movl	-12(%rbp), %edx
	call	printf

.L0:	
	movl	-8(%rbp), %eax
	cmpl	-12(%rbp), %eax	# i>n ?
	jg	.L1

	leaq	pattern, %rcx
	movl	-4(%rbp), %edx
	call	printf
	incl	-4(%rbp)
	incl	-8(%rbp)
	jmp	.L0

.L1:
	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run natural.s using GCC

Listing the first 10 powers of 2

First, let us see a C program that prints the first 10 powers of 2 (starting with 1, then 2, 4, 8 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.c command, and create a new file named 'powers.c' with the following content:

#include <stdio.h>

int nextpow(int x) {
 int p=x+x;
 return p;
 }

int main() {
 int x=1;
 int i=1, n=10;
 do {
  printf("%d\n",x);
  x=nextpow(x);
  i++;
  } while(i<=n);
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run powers.c using GCC

Let us now create an equivalent of the 'powers.c' program in Intel x86/x64 assembly language which prints the first 10 powers of 2. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.s command, and create a new file named 'powers.s' with the following content:

.globl	main

.data
pattern:
	.ascii "%d\12\0"

.text
nextpow:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp	# stack frame size
	movl	%ecx, 16(%rbp)	# parameter x
	movl	16(%rbp), %eax
	addl	%eax, %eax
	movl	%eax, -4(%rbp)	# local variable p
	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movl	$1, -4(%rbp)	# variable x
	movl	$1, -8(%rbp)	# variable i
	movl	$10, -12(%rbp)	# variable n

.loop:
	movl	-4(%rbp), %edx
	leaq	pattern, %rcx
	call	printf
	movl	-4(%rbp), %ecx
	call	nextpow
	movl	%eax, -4(%rbp)
	addl	$1, -8(%rbp)	# i++
	movl	-8(%rbp), %eax
	cmpl	-12(%rbp), %eax	# i<=n ?
	jle	.loop

	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run powers.s using GCC

Listing the first 10 factorials

First, let us see a C program that prints the first 10 factorials (starting with 1, then 2, 6, 24 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.c command, and create a new file named 'fact.c' with the following content:

#include <stdio.h>

int f(int n) {
 int temp;
 temp=1;
 for(int i=2;i<=n;i++) {
  temp=temp*i;
  }
 return temp;
 }

int main() {
 int n=10;
 printf("List of the first %d factorials:\n",n);
 int i=1;
 while(i<=n) {
  printf("%d\n",f(i));
  i=i+1;
  };
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run fact.c using GCC

Let us now create an equivalent of the 'fact.c' program in Intel x86/x64 assembly language which prints the first 10 factorials. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.s command, and create a new file named 'fact.s' with the following content:

.globl	main
.globl	f

.data
.LC0:
	.ascii "List of the first %d factorials:\12\0"
.LC1:
	.ascii "%d\12\0"

.text

f:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp

	movl	%ecx, 16(%rbp)
	movl	$1, -4(%rbp)
	movl	$2, -8(%rbp)
	jmp	.L4
.L5:
	movl	-4(%rbp), %eax
	imull	-8(%rbp), %eax
	movl	%eax, -4(%rbp)
	addl	$1, -8(%rbp)
.L4:
	movl	-8(%rbp), %eax
	cmpl	16(%rbp), %eax
	jle	.L5

	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$10, -8(%rbp)
	movl	-8(%rbp), %eax
	movl	%eax, %edx
	leaq	.LC0, %rax
	movq	%rax, %rcx
	call	printf

	movl	$1, -4(%rbp)
	jmp	.L8
.L9:
	movl	-4(%rbp), %eax
	movl	%eax, %ecx
	call	f
	movl	%eax, %edx
	leaq	.LC1, %rax
	movq	%rax, %rcx
	call	printf
	addl	$1, -4(%rbp)
.L8:
	movl	-4(%rbp), %eax
	cmpl	-8(%rbp), %eax
	jle	.L9

	movl	-4(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Note that the imul assembly instruction performs a signed multiplication of its first operand (which can be either a register or a memory operand) and its second operand (a register), and stores the resulting product in the register specified as the second operand. In the 'f' function above, the 'imull -8(%rbp), %eax' instruction multiplies the doubleword in %eax (temp) by the variable i.
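
Since the doubleword form of the multiplication keeps only the lower 32 bits of the product, it is worth knowing how far the factorials fit into a doubleword. The C function below is a sketch that repeats the loop of the 'f' function with an additional check; the name factorial_checked and the use of a 64-bit intermediate product are assumptions made for the illustration, not part of the original program.

```c
#include <stdint.h>
#include <stdbool.h>

/* Computes n! with the same loop as f. Returns true and stores the result
   in *result if every intermediate product fits into a signed doubleword;
   returns false (leaving *result untouched) otherwise. */
bool factorial_checked(int32_t n, int32_t *result) {
    int32_t temp = 1;
    for (int32_t i = 2; i <= n; i++) {
        int64_t wide = (int64_t)temp * i;   /* exact product in 64 bits */
        if (wide > INT32_MAX) {
            return false;                   /* a doubleword would overflow */
        }
        temp = (int32_t)wide;
    }
    *result = temp;
    return true;
}
```

The check succeeds for the first 10 factorials printed by the program (10! = 3628800) and even for 12! = 479001600, but it fails for 13!, which no longer fits into 32 bits.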

Compile, link and run the assembly program as follows:

Compile and run fact.s using GCC



Boda István, 2025.