Computer architectures - A brief overview


Recommended reading:
Jim Ledin: Modern Computer Architecture and Organization.
Birmingham – Mumbai: Packt, 2020.

Wikipedia, selected entries. (2025-09-28)



Computer architecture - Definition

Computers have become an integral part of our daily lives. They power everything from smartphones to hospital systems and have shaped society to such an extent that many people simply couldn't live without the hardware and software that defines the digital world.

Despite this, the majority of people still have no idea how computers work and the role of hardware and software in powering the modern technologies we use today.

Behind the sleek* screens and intricate interfaces, computer architecture forms the fundamental components and processes that make our computers tick. (Stewart 2025)

Computer architecture (CA) is the structure of a computer system made from component parts. At the highest level, the computer can be considered as a black box, while at the lowest level as a complex network of physical components like combinational and sequential circuits and logic gates.

At each level, CA describes the internal organization of a computer in an abstract way that ignores details of the implementation at the lower level. At the highest level, CA defines the capabilities of the computer (from the user's viewpoint) and its programming model (from the programmer's viewpoint).

CA is the science and art of designing computers by defining the functional behavior and organization of hardware components such as the CPU, memory, storage and I/O devices, including how they interact. CA establishes*
– the Instruction Set Architecture (ISA);
– the Microarchitecture;
– the Hardware System Architecture (HSA);
– the Macroarchitecture.

Key Components of Computer Architecture

Why Computer Architecture Matters

Vocabulary:

Pronunciation symbols:
ae as in cap [kaep], hat [haet], valid [vaelid]
oe as in beggar [begoer], altar [awltoer], signal [sign(oe)l]
aw as in all [awl], fall [fawl], law [law], raw [raw]
ow as in how [how], howl [howl], power [powoer]
uo as in poor [puor], cure [kyuor], pure [pyuor]

establish [i staeblish] = to build or bring into being sth on a stable basis (Webster 2009)
syn/rel: ground, base, be the basis for

Scalability establishes the ability of the system to handle a growing amount of workload.

blueprint [blu:print] = a design plan or other technical drawing (e.g. a system diagram, a data flow diagram etc.)

sleek [sli:k] = having a smooth attractive shape (Longman 2009); finely contoured [kontuord]; streamlined (Webster 2009)

a sleek computer screen

References:

AI about "computer architecture" Google Search. (2025-09-11)

Stewart, Ellis 2025. What is Computer Architecture? Definition, Types, Structure.
https://em360tech.com/tech-articles/what-computer-architecture-definition-types-structure (2025-09-11)

Illingworth, Valerie – Pyle, Ian 1996-1997. A Dictionary of Computing. Oxford – New York etc.: Oxford University Press.

Ledin, Jim 2020. Modern Computer Architecture and Organization. Birmingham – Mumbai: Packt.

Stallings, William 2018. Operating Systems. Internals and Design Principles. Edinburgh: Pearson.

Wikipedia entries: Computer Architecture etc.



Brief overview of computer system hardware (cf. Stallings 2018: 30-32)

In general, a computer consists of a processor, a main memory, and several input-output (I/O) components.

Von Neumann architecture
The Von Neumann architecture

Block diagram of a computer
Block diagram of a computer with uniprocessor CPU
(black lines indicate the flow of control signals, whereas red lines indicate the flow of processor instructions, address information and data. Arrows indicate the direction of flow)

System bus architecture
Single system bus architecture

Computer Components: Top-Level View

The figure above illustrates the logic of the operation of the system bus. The CPU contains some (internal) registers to support data exchange among the CPU, the main memory and the I/O module. These registers and their function are as follows:



Execution of instructions (cf. Stallings 2018: 32-35)

A program to be executed by a processor consists of a set of machine-level instructions stored in the memory. In its simplest form, the processing of instructions consists of two basic steps:
– first, the processor reads (or fetches) the instructions from the memory one at a time, and
– second, the processor executes each instruction.
The execution of a program is a repeating process (a cycle or loop) of these two steps: the instruction fetch and the instruction execution. (Note that instruction execution may involve several operations and depends on the nature of the instruction.)

The figure below illustrates the instruction cycle:

Basic Instruction Cycle

At the beginning of each instruction cycle, the processor fetches an instruction from memory. In this respect, the program counter (PC) register is of utmost importance: the PC holds the address of the next instruction to be fetched. After the instruction has been fetched, the processor increments the value of the PC so that it will hold the address of the next instruction in the sequence of instructions (i.e. in the program which is currently being executed).

The fetched instruction is loaded into the instruction register (IR). An instruction is normally made up of an operation code and the specification of the operands, which represent or refer to the data upon which the operation is to be performed. The operation code of the instruction contains bits that specify the action the processor is to take. The processor (or more specifically, the control unit of the processor) interprets the instruction and performs the required action. In general, these actions fall into four categories:

The execution of an instruction may involve a certain combination of these actions.



An example of the operation of the fetch-execute cycle (cf. Stallings 2018: 33-35)

Let the memory of a virtual machine be organized in 16-bit (i.e. word-length) memory cells. Each instruction consists of a 4-bit operation code (opcode) and a 12-bit operand. Note that if the operand contains an address, a maximum of 2^12 = 4096 memory cells can be addressed directly.

Instruction format (example)

We shall use four hexadecimal digits to represent the 16-bit (one-word) content of the registers, memory addresses and the content of memory cells. (Note that for the 12-bit long addresses three hexadecimal digits would be enough.) Similarly, we shall use one hexadecimal digit to represent the opcode of each instruction.
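The instruction format described above can be sketched in C as follows. This is only an illustration under the stated 4-bit/12-bit split; the helper names `opcode_of` and `operand_of` and the constant `ADDRESSABLE_CELLS` are hypothetical and do not come from the source material.

```c
#include <stdint.h>

/* Hypothetical helpers for the example machine: a 16-bit instruction word
   holds a 4-bit opcode (bits 15..12) and a 12-bit operand (bits 11..0). */

/* Number of memory cells that a 12-bit address can select: 2^12 = 4096. */
enum { ADDRESSABLE_CELLS = 1 << 12 };

/* Extract the 4-bit operation code from an instruction word. */
unsigned opcode_of(uint16_t word) {
    return (unsigned)(word >> 12);
}

/* Extract the 12-bit operand (here: a memory address). */
unsigned operand_of(uint16_t word) {
    return (unsigned)(word & 0x0FFFu);
}
```

For the instruction word 1940 (hexadecimal), `opcode_of` yields 1 and `operand_of` yields 940 (hexadecimal).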

In the example we want to add two whole numbers represented by two's complement code. We will use one general-purpose register (the accumulator, AC) and three instructions as follows:

We assume that the first instruction to be performed is located at the memory address 300, followed sequentially by the further instructions of the program (located at the addresses 301, 302 etc., respectively). Furthermore, we assume that the data that the program manipulates are stored at the memory addresses 940 and 941.

Memory content
Address     Content
(instructions)
0 3 0 0     1 9 4 0
0 3 0 1     5 9 4 1
0 3 0 2     2 9 4 1
(data)
0 9 4 0     0 0 0 3
0 9 4 1     0 0 0 2

Now let's see how the operation is performed in three fetch-execute cycles.

– In the example we analyze in detail the operation of the fetch-execute cycle. Since the initial value of the program or instruction counter register (PC or IP) is set to location 300, in the first cycle the processor will fetch the instruction at the memory location 300 and then immediately increment the value of PC. On the succeeding instruction cycles, the CPU will fetch instructions from locations 301, 302, and so on. (Note, however, that the sequential execution of instructions can be altered at any time by a control instruction.)

– In each cycle the fetched instruction is always loaded into the instruction register (IR). The operation code (opcode) of the instruction will specify the necessary action that the processor is to take. After separating the opcode and the operand, the processor (actually, the control unit) interprets the opcode of the instruction and sends control signals to the appropriate units to perform the required action.


1st cycle
Storage unit   Value     Comment
Fetch stage
PC             0 3 0 0   fetch the instruction from M(300)
M(300)         1 9 4 0   load the content of M(300) into IR
IR             1 9 4 0   interpret the instruction
                           • opcode=1: move memory data into AC
                           • operand=940: the data is located at M(940)
PC             0 3 0 1   increment the value of PC by 1
Execute stage: AC←M(940) or MOV AC,M(0940)
M(940)         0 0 0 3   load the content of M(940) into AC
AC             0 0 0 3   store the content of M(940) in AC

2nd cycle
Storage unit   Value     Comment
Fetch stage
PC             0 3 0 1   fetch the instruction from M(301)
M(301)         5 9 4 1   load the content of M(301) into IR
IR             5 9 4 1   interpret the instruction
                           • opcode=5: add memory data to AC
                           • operand=941: the data to be added is located at M(941)
PC             0 3 0 2   increment the value of PC by 1
Execute stage: AC←AC+M(941) or ADD AC,M(0941)
AC             0 0 0 3   add the content of M(941) to AC
M(941)         0 0 0 2
AC             0 0 0 5   store the result of the addition in AC

3rd cycle
Storage unit   Value     Comment
Fetch stage
PC             0 3 0 2   fetch the instruction from M(302)
M(302)         2 9 4 1   load the content of M(302) into IR
IR             2 9 4 1   interpret the instruction
                           • opcode=2: move the content of AC into a memory cell
                           • operand=941: the memory cell is located at M(941)
PC             0 3 0 3   increment the value of PC by 1
Execute stage: M(941)←AC or MOV M(0941),AC
M(941)         0 0 0 2   move the content of AC into M(941)
AC             0 0 0 5
M(941)         0 0 0 5   store the content of AC in M(941)

In this example three instruction cycles were needed, each consisting of a fetch stage and an execute stage. As a result, we added the contents of the memory location 940 to the contents of the memory location 941, and then stored the sum at the memory location 941.

The following figure summarizes the process.

Example of Program Execution
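The three cycles traced above can also be replayed by a small simulator. The following C sketch is only an illustration: the `Machine` structure, the `OP_*` constants and the functions `step` and `run_example` are hypothetical names, and only the three opcodes used in the example (1, 2 and 5) are modelled.

```c
#include <stdint.h>

/* A minimal model of the example machine (all names are illustrative). */
typedef struct {
    uint16_t pc;          /* program counter */
    uint16_t ir;          /* instruction register */
    uint16_t ac;          /* accumulator */
    uint16_t mem[4096];   /* 2^12 memory cells, 16 bits each */
} Machine;

/* The opcodes used in the example. */
enum { OP_LOAD = 0x1, OP_STORE = 0x2, OP_ADD = 0x5 };

/* Perform one instruction cycle: fetch stage, then execute stage. */
void step(Machine *m) {
    /* Fetch stage: read the instruction at PC into IR, then increment PC. */
    m->ir = m->mem[m->pc & 0x0FFF];
    m->pc++;

    /* Interpret the instruction: separate the opcode and the operand. */
    unsigned opcode  = m->ir >> 12;
    unsigned address = m->ir & 0x0FFF;

    /* Execute stage. */
    switch (opcode) {
    case OP_LOAD:  m->ac = m->mem[address];                     break;
    case OP_ADD:   m->ac = (uint16_t)(m->ac + m->mem[address]); break;
    case OP_STORE: m->mem[address] = m->ac;                     break;
    default:       /* other opcodes are not modelled here */    break;
    }
}

/* Load the example program and data, run three cycles, return M(941). */
uint16_t run_example(void) {
    static Machine m;        /* static: zero-initialised, not on the stack */
    m.pc = 0x300;
    m.ac = 0;
    m.mem[0x300] = 0x1940;   /* AC <- M(940) */
    m.mem[0x301] = 0x5941;   /* AC <- AC + M(941) */
    m.mem[0x302] = 0x2941;   /* M(941) <- AC */
    m.mem[0x940] = 0x0003;
    m.mem[0x941] = 0x0002;
    for (int i = 0; i < 3; i++)
        step(&m);
    return m.mem[0x941];
}
```

Calling `run_example()` returns 5, the value stored at M(941) after the third cycle in the tables above.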



Implementation of the above example in Windows
II.1. Create and compile C files

Table of contents:

  • Printing "Hello World!" (hello.c)
  • Adding 3+2 (simple.c)
    • Creating a batch file to display ERRORLEVEL (err.bat)
  • Adding 3+2 with a function (simplef.c)
    • Assembly version of 'simplef.c' with quadword-length operands (simplex.s)
  • Adding and printing 3+2 (example.c)
    • Assembly version of 'example.c' with quadword-length operands (examplex.s)

Printing "Hello World!" (hello.c)

Open a new 'cmd' window in the c:\temp\gcc directory and set the default path by running the 'setpath' command (only once). Using the notepad hello.c command, create a new file named 'hello.c' with the following content:

#include <stdio.h>

int main() {
 printf("Hello world!\n");
 return 0;
 }

Compile, link and run the C program as follows:

Compile and run hello.c using GCC

Adding 3+2 (simple.c)

Now let us create another simple C program which implements the former example adding two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simple.c command, and create a new file named 'simple.c' with the following content:

#include <stdio.h>

int main() {
 int a=3;
 int b=2;
 b=a+b;
 return 0;
 }

Compile, link and run the C program as follows:

Compile and run simple.c using GCC

Note that in the 'cmd' window, we can display the returned value of the 'simple.exe' program using the echo %ERRORLEVEL% command.

For the sake of simplicity, let us create a batch file named 'err.bat' using the notepad err.bat command. It should contain the following two lines:

@echo off
echo %ERRORLEVEL%

With that we have created a new command called err which, when entered, displays the current value of the ERRORLEVEL environment variable in the 'cmd' window.

It will be instructive for later considerations that, using the GCC compiler, we can easily generate the assembly code of the 'simple.c' program (or of any other C program). For that purpose, we enter the gcc simple.c -S -o simple.s command in the 'cmd' window.

Compile the simple.c program into assembly code using GCC

The generated assembly program is as follows:

	.file	"simple.c"
	.text
	.def	__main;
		.scl	2;
		.type	32;
	.endef
	.globl	main
	.def	main;
		.scl	2;
		.type	32;
	.endef
	.seh_proc	main
main:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$48, %rsp
	.seh_stackalloc	48
	.seh_endprologue
	call	__main
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-4(%rbp), %eax
	addl	%eax, -8(%rbp)
	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.ident	"GCC: (GNU) 13.2.0"

The explanation of some important parts of the assembly code:

After such considerations, we can easily create the 'simplex.s' assembly program which contains quadword length operands, and returns the sum of the addition (as an ERRORLEVEL value):

.globl	main

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movq	$3, -8(%rbp)
	movq	$2, -16(%rbp)
	movq	-8(%rbp), %rax
	addq	%rax, -16(%rbp)	# the sum of a and b
	movq	-16(%rbp), %rax	# return (ERRORLEVEL) value
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run simplex.s using GCC

Adding 3+2 with a function (simplef.c)

The aim of the 'simple.c' program can also be implemented using a function named 'sum' which adds two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simplef.c command, and create a new file named 'simplef.c' with the following content:

#include <stdio.h>

int sum(int x,int y) {
 int temp;
 temp=x+y;
 return temp;
 }

int main() {
 int a=3;
 int b=2;
 int c;
 c=sum(a,b);
 return c;
 }

Set the default path by running the 'setpath' command (remember, only once). Compile, link and run the C program as follows:

Compile and run simplef.c using GCC

In the 'cmd' window, we can display again the returned value of the 'simplef.exe' program using the echo %ERRORLEVEL% command (or running the 'err' batch file).

Adding and printing 3+2 (example.c)

It was not easy to check the 'simple.c' or 'simplef.c' programs because they produced no visible output. So open a new 'cmd' window again in the c:\temp\gcc directory. Using the notepad example.c command, create a new file named 'example.c' with the following content:

#include <stdio.h>

int main() {
 int a=3;
 int b=2;
 int c=a+b;
 printf("%d + %d = %d\n",a,b,c);
 return 0;
 }

Set the default path by running the 'setpath' command. Compile, link and run the C program as follows:

Compile and run example.c using GCC

As before, we can easily generate the assembly code of the 'example.c' program by entering the gcc example.c -S -o example.s command in the 'cmd' window.
The generated assembly program is as follows:

	.file	"example.c"
	.text
	.def	printf;
		.scl	3;
		.type	32;
		.endef
	.seh_proc	printf

printf:
	pushq	%rbp
	.seh_pushreg	%rbp
	pushq	%rbx
	.seh_pushreg	%rbx
	subq	$56, %rsp
	.seh_stackalloc	56
	leaq	48(%rsp), %rbp
	.seh_setframe	%rbp, 48
	.seh_endprologue
	movq	%rcx, 32(%rbp)	# 1st argument (format string) stored
	movq	%rdx, 40(%rbp)	# 2nd argument stored
	movq	%r8, 48(%rbp)	# 3rd argument stored
	movq	%r9, 56(%rbp)	# 4th argument stored
	leaq	40(%rbp), %rax
	movq	%rax, -16(%rbp)
	movq	-16(%rbp), %rbx
	movl	$1, %ecx
	movq	__imp___acrt_iob_func(%rip), %rax
	call	*%rax
	movq	%rax, %rcx
	movq	32(%rbp), %rax
	movq	%rbx, %r8
	movq	%rax, %rdx
	call	__mingw_vfprintf
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	addq	$56, %rsp
	popq	%rbx
	popq	%rbp
	ret
	.seh_endproc
	.def	__main;
		.scl	2;
		.type	32;
		.endef
	.section .rdata,"dr"
.LC0:
	.ascii "%d + %d = %d\12\0"
	.text
	.globl	main
	.def	main;
		.scl	2;
		.type	32;
		.endef
	.seh_proc	main
main:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$48, %rsp
	.seh_stackalloc	48
	.seh_endprologue
	call	__main
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-4(%rbp), %edx
	movl	-8(%rbp), %eax
	addl	%edx, %eax
	movl	%eax, -12(%rbp)
	movl	-12(%rbp), %ecx
	movl	-8(%rbp), %edx
	movl	-4(%rbp), %eax
	movl	%ecx, %r9d
	movl	%edx, %r8d
	movl	%eax, %edx
	leaq	.LC0(%rip), %rax
	movq	%rax, %rcx
	call	printf
	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.ident	"GCC: (GNU) 13.2.0"
	.def	__mingw_vfprintf;
		.scl	2;
		.type	32;
		.endef

Based on the compiled program, we can easily create the 'examplex.s' assembly program which contains only quadword length operands. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad examplex.s command, and create a new file named 'examplex.s' with the following content:

.data
.msg:
	.ascii "%d + %d = %d\12\0"

.text
.globl	main
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movq	$3, -8(%rbp)	# local variable a
	movq	$2, -16(%rbp)	# local variable b
	movq	-8(%rbp), %rdx
	movq	-16(%rbp), %rax
	addq	%rdx, %rax
	movq	%rax, -24(%rbp)	# local variable c
	movq	-24(%rbp), %rcx
	movq	-16(%rbp), %rdx
	movq	-8(%rbp), %rax

	movq	%rcx, %r9	# 4th argument (var c)
	movq	%rdx, %r8	# 3rd argument (var b)
	movq	%rax, %rdx	# 2nd argument (var a)
	leaq	.msg, %rax
	movq	%rax, %rcx	# 1st argument (pattern .msg)
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run examplex.s using GCC



Implementation of the above example in Windows
II.2. Create and compile assembly files

Table of contents:

  • Printing "Hello World!" (asmh.s)
  • Setting the ERRORLEVEL (abc.s)
    • A short explanation of the stack (push, pop)
  • Adding 3+2 (abcs.s)
    • The explanation of the stack frame
    • A short explanation of function calls (call, ret)
  • Adding 3+2 with a function (abcf.s)
    • List of important registers in the Intel x86/x64 architecture
    • The flags register

Printing "Hello World!" (asmh.s)

Now let us create a simple program in Intel x86/x64 assembly language which displays the well-known 'Hello world!' message. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad asmh.s command, and create a new file named 'asmh.s' with the following content:

.globl	main

// definitions of constants and variables
.data

hello:
	.ascii "Hello world!\12\0"

// program instructions (code)
.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$32, %rsp
	leaq	hello, %rax

/* setting the parameter for the function 'printf' */
	movq	%rax, %rcx	# address of 'hello'
	call	printf
/* displayed 'Hello world!' */

	movl	$0, %eax	# set ERRORLEVEL value
	addq	$32, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run asmh.s using GCC

Note that the size of the stack frame is 32 bytes, even though there are no local variables in the program. The four quadwords allocated at the top of the stack frame can be used for the (possible) parameters of the 'printf' function.

Setting the ERRORLEVEL (abc.s)

After we have successfully created and compiled the 'asmh.s' program, let us create another simple program in Intel x86/x64 assembly language which does nothing except return the value 10 as an ERRORLEVEL value.

Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abc.s command, and create a new file named 'abc.s' with the following content:

.globl main

main:
        enter $0, $0
        movq $10, %rax
        leave
        ret

Note that the 'main' function begins by creating a stack frame which, among other things, can contain the values of the local variables (if there are any). The base pointer register (%rbp) is used as a reference for addressing those local variables (i.e. the local variables are addressed relative to the value of the base pointer).

The 'enter $0, $0' assembly instruction corresponds to the
   pushq %rbp
   movq %rsp, %rbp
instructions. It creates the stack frame of the function.

Note that e.g. the 'enter $24, $0' instruction would, after performing the two instructions shown above, also allocate 24 bytes on the stack by subtracting 24 from the current value of the stack pointer. Since 6*4=24, this would be enough for six double-word (i.e. 4-byte = 32-bit) local variables (or for three quadword local variables).

The 'leave' assembly instruction corresponds to the
   movq %rbp, %rsp
   popq %rbp
instructions. It frees (or destroys) the stack frame of the function.

The stack is a dedicated part of the memory which stores data according to the current needs of the programs. In this respect, the push and pop instructions are the most important ones: they add data to the top of the stack and remove (retrieve) data from it. In order to implement and use a stack
– the programs use a dedicated register called stack pointer (%rsp) that always points to the top of the stack by containing the address of the last data item that has been pushed;
– when a data item is pushed into the stack first the stack pointer is decreased by the size of the operand (e.g. by subtracting 8 from the actual value of the stack pointer for a quadword), and then the content of the operand is stored at that address;
– when a data item is popped from the stack first the data from the top of the stack is retrieved from the memory address (and stored in the operand of the 'pop' instruction), and then the stack pointer is increased by the size of the operand (e.g. by adding 8 to the actual value of the stack pointer for a quadword).
The diagram below illustrates the push and pop operations:
Illustration of the push and pop operations
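The order of the two steps in each operation can be made explicit with a small model. The following C sketch is an illustration only: the `Stack` structure, the `STACK_BYTES` constant and the functions `stack_init`, `pushq` and `popq` are hypothetical, and overflow or underflow checks are deliberately omitted.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small byte-addressed memory region used as a stack. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint64_t rsp;   /* offset of the data item on top of the stack */
} Stack;

/* An empty stack: rsp points just past the highest address of the region. */
void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;
}

/* pushq: first decrease rsp by 8, then store the operand at that address. */
void pushq(Stack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* popq: first read the item on top of the stack, then increase rsp by 8. */
uint64_t popq(Stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += 8;
    return value;
}
```

Pushing 3 and then 2 lowers `rsp` by 16; the two subsequent pops return 2 and then 3, which shows the last-in, first-out behaviour of the stack.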

Using the push / pop instructions instead of the enter / leave instructions, we can create another version of the program 'abc.s' as follows:

.globl main

main:
        pushq %rbp
        movq %rsp, %rbp
        movq $10, %rax
        movq %rbp, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program:

Compile and run abc.s using GCC

Here, like in the case of the 'simple.exe' program or the 'simplex.exe' program, we can display the returned value of the 'abc.exe' program using the echo %ERRORLEVEL% command in the 'cmd' window (or we can enter the 'err' command if the 'err.bat' file exists).

Adding 3+2 (abcs.s)

Now let us create an equivalent of the 'simple.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together as quadword-length integers, stores the sum in a third quadword-length variable, and returns the sum as an ERRORLEVEL value. Before that, the program will remind us to check the current value of the ERRORLEVEL environment variable.

Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcs.s command, and create a new file named 'abcs.s' with the following content:

.globl	main

.data 
hello:
	.ascii "\12See the ERRORLEVEL value!\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$56, %rsp
	/* stack frame created */

	movq	$3, -8(%rbp)
	movq	$2, -16(%rbp)
	movq	-8(%rbp), %rax
	addq	-16(%rbp), %rax 
	movq	%rax, -24(%rbp)

	leaq	hello, %rcx
	call	printf

	movq	-24(%rbp), %rax	# set ERRORLEVEL value

	/* stack frame to be destroyed */
	addq	$56, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run abcs.s using GCC

Here, like in the case of the 'simple.exe' and 'abc.exe' programs, in the 'cmd' window we can display the returned value of the 'abcs.exe' program using the echo %ERRORLEVEL% or simply the err command. But in this case it returns the sum of the addition 3+2 (i.e. 5).

The size and content of the stack frame need some explanation. The size of the stack frame is 56 bytes, which corresponds to 7 quadwords (i.e. 56=7*8). The structure and content of the stack frame are as follows:

address                 content
0(%rsp)    -56(%rbp)    parameters for the function 'printf'
8(%rsp)    -48(%rbp)
16(%rsp)   -40(%rbp)
24(%rsp)   -32(%rbp)
           -24(%rbp)    variable c
           -16(%rbp)    variable b
           -8(%rbp)     variable a
56(%rsp)   0(%rbp)      previous value of %rbp (pushed by the first instruction of main)
           8(%rbp)      return address for the caller of 'main' (for 'ret' in main)
The base pointer (%rbp) register has a special purpose: it points to the bottom of the stack frame of the current function, so local variables can be accessed relative to its value.

As for the last row of the table which belongs to the address 8(%rbp) just below the bottom of the stack frame, when the program environment (i.e. the cmd.exe in our case) runs the abcs.exe program, it calls the 'main' global function of the abcs.exe program, and the current value of the instruction pointer is automatically pushed onto the top of the stack. (Thus when the called 'main' function exits and returns, the CPU can continue the execution of the caller program by popping the address of the next instruction to be performed from the stack and loading it into the instruction pointer).

In general, when a specific function of the program is called by another function (from the same or from another program), the return address of the next instruction to be executed after the 'call' instruction is automatically pushed onto the top of the stack.

Note that in the fetch-execute cycle the address of the next instruction is always stored in the %rip instruction pointer (or program counter) register. Thus the 'call' instruction, when executed, pushes the current value of the instruction pointer onto the top of the stack. After that the called function (the callee) pushes the value of the base pointer and creates its stack frame.

The ret instruction is normally the last instruction that a function executes. It "pops" the stored address of the next instruction to be executed from the top of the stack and restores the value of the instruction pointer. The next fetch-execute cycle then continues the execution of the program immediately after the 'call' instruction.

The called function is named the callee, and the function that calls the callee is named the caller.

The diagram below illustrates the mechanism of the 'call' and the 'ret' (i.e. return) instructions:
Illustration of the call and ret instructions
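The interplay of the instruction pointer and the stack during 'call' and 'ret' can be modelled in a few lines of C. This is a simplified sketch, not a description of the real hardware: the `Cpu` structure, the `MAX_DEPTH` limit and the functions `do_call` and `do_ret` are hypothetical, the stack is reduced to an array of quadword slots, and no bounds checking is done.

```c
#include <stdint.h>

/* Hypothetical model of a CPU with an instruction pointer and a stack
   that stores one 64-bit return address per slot. */
enum { MAX_DEPTH = 16 };

typedef struct {
    uint64_t rip;                /* address of the next instruction */
    uint64_t stack[MAX_DEPTH];   /* simplified stack of quadwords */
    int      top;                /* number of slots currently in use */
} Cpu;

/* call: push the return address (the current value of rip, which already
   points to the instruction after the call), then jump to the target. */
void do_call(Cpu *cpu, uint64_t target) {
    cpu->stack[cpu->top++] = cpu->rip;
    cpu->rip = target;
}

/* ret: pop the saved return address from the stack back into rip. */
void do_ret(Cpu *cpu) {
    cpu->rip = cpu->stack[--cpu->top];
}
```

If `rip` holds 0x1005 when `do_call(&cpu, 0x2000)` is performed, execution continues at 0x2000, and a later `do_ret(&cpu)` restores 0x1005, so the caller resumes immediately after its 'call' instruction. The addresses are arbitrary example values.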

Adding 3+2 with a function (abcf.s)

Let us now create an equivalent of the 'simplef.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together with a function named 'sum' and returns the sum. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcf.s command, and create a new file named 'abcf.s' with the following content:

.globl	sum

sum:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp
	movl	%ecx, 16(%rbp)
	movl	%edx, 24(%rbp)
	movl	16(%rbp), %edx
	movl	24(%rbp), %eax
	addl	%edx, %eax
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

.globl	main

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movl	$3, -4(%rbp)
	movl	$2, -8(%rbp)
	movl	-8(%rbp), %edx
	movl	-4(%rbp), %eax
	movl	%eax, %ecx
	call	sum
	movl	%eax, -12(%rbp)
	// movl	$0, %eax
	movl	-12(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run abcf.s using GCC

Like in the case of the previous programs, in the 'cmd' window we can display the returned value of the 'abcf.exe' program using either the 'err' command or the echo %ERRORLEVEL% command. Now we can see again the sum of the addition 3+2 (i.e. 5).


So far, we have used a number of registers without introducing them. It is therefore time for an overview of the registers that are available to assembly programs in the Intel x86/x64 architecture. First note that in the AT&T assembly syntax,
– the 32 bit wide register names are prefixed with the %e characters, and
– the 64 bit wide register names are prefixed with the %r characters.

Note that when we declare an int type variable in C (with GCC on x86/x64), its length is 32 bits (i.e. it is double-word wide).
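The width of the int type can be checked on a given platform with a few lines of C. The sketch below is only an illustration; the function names `int_width_bits`, `doubleword_bits` and `quadword_bits` are hypothetical. It relies on the standard `CHAR_BIT` constant and on the fixed-width types of `<stdint.h>`.

```c
#include <limits.h>
#include <stdint.h>

/* Width of the C type 'int' in bits on the platform this is compiled for. */
int int_width_bits(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Widths of the fixed-size types matching double-word and quadword operands. */
int doubleword_bits(void) {
    return (int)(sizeof(int32_t) * CHAR_BIT);
}

int quadword_bits(void) {
    return (int)(sizeof(int64_t) * CHAR_BIT);
}
```

With GCC on x86/x64, `int_width_bits()` is expected to return 32, which is why the compiler-generated code uses the 'l' suffixed instructions and the %e registers for int variables.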

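In the Intel x86/x64 architecture the list of some important registers is as follows (cf. X86-64 Architecture Guide, 2025-03-11; Assembly 1: Basics, 2025-03-30). Note that the argument numbering and the "Saved across calls" column in the table follow the System V AMD64 calling convention (used e.g. on Linux). The Windows x64 calling convention, which applies to the programs compiled above, passes the first four integer arguments in %rcx, %rdx, %r8 and %r9, respectively, and treats %rsi and %rdi as callee-saved registers.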

Register Purpose Size Saved across calls
General-purpose registers
%rax temp register for arithmetic or logical calculations etc. (called accumulator); return value of a function 64 bit No
%eax the lower half of the 8 byte wide %rax register 32 bit
%ax the lower half of the 4 byte wide %eax register 16 bit
%ah the higher half of the 2 byte wide %ax register 8 bit
%al the lower half of the 2 byte wide %ax register 8 bit
%rbx callee-saved 64 bit Yes
%ebx the lower half of the 8 byte wide %rbx register 32 bit
%bx the lower half of the 4 byte wide %ebx register 16 bit
%bh the higher half of the 2 byte wide %bx register 8 bit
%bl the lower half of the 2 byte wide %bx register 8 bit
%rcx used to pass 4th argument to functions 64 bit No
%ecx the lower half of the 8 byte wide %rcx register 32 bit
%cx the lower half of the 4 byte wide %ecx register 16 bit
%ch the higher half of the 2 byte wide %cx register 8 bit
%cl the lower half of the 2 byte wide %cx register 8 bit
%rdx used to pass 3rd argument to functions 64 bit No
%edx the lower half of the 8 byte wide %rdx register 32 bit
%dx the lower half of the 4 byte wide %edx register 16 bit
%dh the higher half of the 2 byte wide %dx register 8 bit
%dl the lower half of the 2 byte wide %dx register 8 bit
%rsi used to pass 2nd argument to functions 64 bit No
%esi the lower half of the 8 byte wide %rsi register 32 bit
%si the lower half of the 4 byte wide %esi register 16 bit
%sil the lower half of the 2 byte wide %si register 8 bit
%rdi used to pass 1st argument to functions 64 bit No
%edi the lower half of the 8 byte wide %rdi register 32 bit
%di the lower half of the 4 byte wide %edi register 16 bit
%dil the lower half of the 2 byte wide %di register 8 bit
%r8 used to pass 5th argument to functions 64 bit No
%r8d the lower half of the 8 byte wide %r8 register 32 bit
%r8w the lower half of the 4 byte wide %r8d register 16 bit
%r8b the lower half of the 2 byte wide %r8w register 8 bit
%r9 used to pass 6th argument to functions 64 bit No
%r9d the lower half of the 8 byte wide %r9 register 32 bit
%r9w the lower half of the 4 byte wide %r9d register 16 bit
%r9b the lower half of the 2 byte wide %r9w register 8 bit
%r10 temporary 64 bit No
%r11 temporary 64 bit No
%r12 callee-saved 64 bit Yes
%r13 callee-saved 64 bit Yes
%r14 callee-saved 64 bit Yes
%r15 callee-saved 64 bit Yes
Special-purpose registers
%rsp stack pointer 64 bit Yes
%esp the lower half of the 8 byte wide %rsp register 32 bit
%sp the lower half of the 4 byte wide %esp register 16 bit
%spl the lower half of the 2 byte wide %sp register 8 bit
%rbp base pointer; callee-saved 64 bit Yes
%ebp the lower half of the 8 byte wide %rbp register 32 bit
%bp the lower half of the 4 byte wide %ebp register 16 bit
%bpl the lower half of the 2 byte wide %bp register 8 bit
%rip instruction pointer or program counter 64 bit (call↔ret)
%eip the lower half of the 8 byte wide %rip register 32 bit
%ip the lower half of the 4 byte wide %eip register 16 bit
%rflags status or control flags 64 bit No
%eflags the lower half of the 8 byte wide %rflags register 32 bit
%flags the lower half of the 4 byte wide %eflags register 16 bit
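The relationship between a 64-bit register and its narrower names can be imitated in C with casts and shifts. This is only a sketch of the idea; the helper names `low32`, `low16`, `high8_of_low16` and `low8` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers that mimic how the narrower register names
   select parts of the value held in a 64-bit register. */

/* Lower 32 bits, like %eax within %rax. */
uint32_t low32(uint64_t r) { return (uint32_t)r; }

/* Lower 16 bits, like %ax within %eax. */
uint16_t low16(uint64_t r) { return (uint16_t)r; }

/* Higher byte of the lower 16 bits, like %ah within %ax. */
uint8_t high8_of_low16(uint64_t r) { return (uint8_t)(r >> 8); }

/* Lowest byte, like %al within %ax. */
uint8_t low8(uint64_t r) { return (uint8_t)r; }
```

For the example value 0x1122334455667788 the helpers return 0x55667788, 0x7788, 0x77 and 0x88, respectively.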

The status (or flags) register contains mostly one-bit storage units ("flags") that reflect the current state of an x86/x64 CPU. For example, some flags show some important characteristics of the result of arithmetic or logical operations (including comparisons etc.). Some usual flags are illustrated below within a 64-bit %rflags register:

Bit:   63 ... 11 ... 7  6  5  4  3  2  1  0
Flag:         OF     SF ZF    AF    PF    CF

The flag names are abbreviated as follows:

Formerly (e.g. in the mainframe age) the program counter and the status register were collectively called the PSW (program status word) register. Nevertheless, this term can also be used for modern computers, including e.g. the content of the flags register. "The PSW contains status information about the currently running process, including memory usage information, condition codes, and other status information such as an interrupt enable/disable bit and a kernel/user-mode bit." (Stallings 2018: 41)
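To show how the arithmetic flags of the flags register described above reflect the result of an operation, the following C sketch computes them for an 8-bit addition. It is an illustration only: the `FLAG_*` constants and the function `add8_flags` are hypothetical names, the bit positions are those of the x86 flags register (CF=0, PF=2, AF=4, ZF=6, SF=7, OF=11), and all other bits of the register are ignored.

```c
#include <stdint.h>

/* Bit masks of some status flags within the flags register. */
enum {
    FLAG_CF = 1u << 0,    /* carry flag */
    FLAG_PF = 1u << 2,    /* parity flag */
    FLAG_AF = 1u << 4,    /* auxiliary carry flag */
    FLAG_ZF = 1u << 6,    /* zero flag */
    FLAG_SF = 1u << 7,    /* sign flag */
    FLAG_OF = 1u << 11    /* overflow flag */
};

/* Add two 8-bit values, store the 8-bit sum in *result and return the
   status flags that an 8-bit 'add' instruction would set. */
unsigned add8_flags(uint8_t a, uint8_t b, uint8_t *result) {
    unsigned wide  = (unsigned)a + (unsigned)b;   /* 9-bit intermediate sum */
    uint8_t  r     = (uint8_t)wide;
    unsigned flags = 0;

    if (wide > 0xFF)                      flags |= FLAG_CF; /* carry out of bit 7 */
    if (r == 0)                           flags |= FLAG_ZF; /* result is zero */
    if (r & 0x80)                         flags |= FLAG_SF; /* sign bit of result */
    if (((a ^ r) & (b ^ r)) & 0x80)       flags |= FLAG_OF; /* signed overflow */
    if (((a & 0x0F) + (b & 0x0F)) > 0x0F) flags |= FLAG_AF; /* carry out of bit 3 */

    /* PF is set when the result byte contains an even number of 1 bits. */
    unsigned ones = 0;
    for (unsigned bit = 0; bit < 8; bit++)
        ones += (r >> bit) & 1u;
    if ((ones & 1u) == 0)                 flags |= FLAG_PF;

    *result = r;
    return flags;
}
```

For example, adding 0x7F and 0x01 gives 0x80 with SF, OF and AF set (the signed result overflowed), whereas adding 0xFF and 0x01 gives 0x00 with CF, ZF, AF and PF set.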



GNU x86/x64 assembly: Summary and further examples

Table of contents:

  • Calculating the first 10 elements of the Fibonacci sequence (fib.s)
    • Machine code representation of the assembly program
  • Organizing a loop in an assembly program
    • Using suffixes in operating codes
    • Simple examples of using suffixes
    • Simple examples for demonstrating comparisons and jumps
  • Further loop examples
    • Listing the first 10 natural numbers
    • Listing the first 10 powers of 2
    • Listing the first 10 factorials

Calculating the first 10 elements of the Fibonacci sequence (fib.s)

First, let us see a C program that prints the first 10 elements of the Fibonacci sequence. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.c command, and create a new file named 'fibonacci.c' with the following content:

#include <stdio.h>

int main() {
 int k1, k2, i;
 int n=10;
 k1=1;
 k2=1;
 
 printf("Fibonacci numbers\n");
 printf("%d\n",k1);
 i=2;

 while(i<=n) {
  printf("%d\n",k2);
  int x=k2;
  k2=k1+k2;
  k1=x; 
  i++;
  }

 return 0;
 }

Compile, link and run the compiled C program as follows:

Compile and run fibonacci.c using GCC

Let us now create an equivalent of the 'fibonacci.c' program in Intel x86/x64 assembly language which prints the first 10 elements of the Fibonacci sequence using quadword-length variables. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.s command, and create a new file named 'fibonacci.s' with the following content:

.globl main

.data
.P1:
	.ascii "Fibonacci numbers\12\0"
.P2:
	.ascii "%d\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$72, %rsp
	movq	$10, -32(%rbp)	# n
	movq	$1, -8(%rbp)	# k1
	movq	$1, -16(%rbp)	# k2

	leaq	.P1, %rax
	movq	%rax, %rcx
	call	printf

	movq	-8(%rbp), %rax
	movq	%rax, %rdx
	leaq	.P2, %rax
	movq	%rax, %rcx
	call	printf

	movq	$2, -24(%rbp)	# i
	jmp	.J2

/* begin loop */
.J1:
	movq	-16(%rbp), %rax
	movq	%rax, %rdx
	leaq	.P2, %rax
	movq	%rax, %rcx
	call	printf

	movq	-16(%rbp), %rax
	movq	%rax, -40(%rbp)	# x
	movq	-8(%rbp), %rax
	addq	%rax, -16(%rbp)
	movq	-40(%rbp), %rax
	movq	%rax, -8(%rbp)
	addq	$1, -24(%rbp)
.J2:
	movq	-24(%rbp), %rax
	cmpq	-32(%rbp), %rax	# i≤n ?
	jle	.J1		# jump if %rax≤-32(%rbp)
/* end of loop */

	movq	$10, %rax
	addq	$72, %rsp
	popq	%rbp
	ret

To understand how the loop works, it is essential to know the status (or flags) register, which reflects the current state of an x86/x64 CPU, and especially the status flags (e.g. the carry, parity, zero, sign and overflow flags), which characterize the result of the most recent arithmetic, logical or comparison operation. In the loop above, the cmpq instruction sets these flags and the jle instruction tests them.

Note that the return value of the 'main' function was now set to 10 (instead of 0) to see the difference from the compiled fibonacci.c program.

Compile, link and run the assembly program as follows:

Compile and run fibonacci.s using GCC

Finally, it is very instructive to see the machine code representation of the 'fibonacci.s' assembly program. For that purpose, first compile the program into an object file with debugging information by entering the
gcc -g -c fibonacci.s -o fibonacci.o
command, and then create the 'dump' (disassembly) of the object file by typing the
objdump -d -S fibonacci.o
command in the 'cmd' window. We shall get something like this:

The dump of fibonacci.o using GCC

The result of the disassembly is as follows:

fibonacci.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 48             sub    $0x48,%rsp
   8:   48 c7 45 e0 0a 00 00    movq   $0xa,-0x20(%rbp)
   f:   00
  10:   48 c7 45 f8 01 00 00    movq   $0x1,-0x8(%rbp)
  17:   00
  18:   48 c7 45 f0 01 00 00    movq   $0x1,-0x10(%rbp)
  1f:   00
  20:   48 8d 04 25 00 00 00    lea    0x0,%rax
  27:   00
  28:   48 89 c1                mov    %rax,%rcx
  2b:   e8 00 00 00 00          callq  30 <main+0x30>
  30:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  34:   48 89 c2                mov    %rax,%rdx
  37:   48 8d 04 25 13 00 00    lea    0x13,%rax
  3e:   00
  3f:   48 89 c1                mov    %rax,%rcx
  42:   e8 00 00 00 00          callq  47 <main+0x47>
  47:   48 c7 45 e8 02 00 00    movq   $0x2,-0x18(%rbp)
  4e:   00
  4f:   eb 34                   jmp    85 <.J2>

0000000000000051 <.J1>:
  51:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  55:   48 89 c2                mov    %rax,%rdx
  58:   48 8d 04 25 13 00 00    lea    0x13,%rax
  5f:   00
  60:   48 89 c1                mov    %rax,%rcx
  63:   e8 00 00 00 00          callq  68 <.J1+0x17>
  68:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  6c:   48 89 45 d8             mov    %rax,-0x28(%rbp)
  70:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  74:   48 01 45 f0             add    %rax,-0x10(%rbp)
  78:   48 8b 45 d8             mov    -0x28(%rbp),%rax
  7c:   48 89 45 f8             mov    %rax,-0x8(%rbp)
  80:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)

0000000000000085 <.J2>:
  85:   48 8b 45 e8             mov    -0x18(%rbp),%rax
  89:   48 3b 45 e0             cmp    -0x20(%rbp),%rax
  8d:   7e c2                   jle    51 <.J1>
  8f:   48 c7 c0 0a 00 00 00    mov    $0xa,%rax
  96:   48 83 c4 48             add    $0x48,%rsp
  9a:   5d                      pop    %rbp
  9b:   c3                      retq
  9c:   90                      nop
  9d:   90                      nop
  9e:   90                      nop
  9f:   90                      nop

Note that the first column shows the relative memory address (offset, in hexadecimal) of each instruction, followed by the bytes of its machine code and the corresponding disassembled instruction.

In order to organize a loop in an assembly program we need both comparison and control instructions. Let us review some of them via examples:

Note that after the operation codes (opcodes) of some instructions we can use certain suffixes to indicate the length of the operands.

suffix  length of operands               example(s)
-b      byte (1 byte = 8 bits)           movb $5, -1(%rbp)
                                         movb -1(%rbp), %al
-w      word (2 bytes = 16 bits)         movw $5, -2(%rbp)
                                         movw -2(%rbp), %ax
-l      doubleword (4 bytes = 32 bits)   movl $5, -4(%rbp)
                                         movl -4(%rbp), %eax
-q      quadword (8 bytes = 64 bits)     movq $5, -8(%rbp)
                                         movq -8(%rbp), %rax
                                         leaq pattern, %rcx

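Note that in the C programming language we can use the following format specifiers of the printf() function: %hi (for short integers), %d or %i (for doubleword integers), %lld (for quadword integers). Under 64-bit Windows, where these examples are built, %ld denotes a 'long' value, which is only 32 bits wide.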

Simple examples of using suffixes

(1) Using signed bytes (or characters):

.globl	main

.data
pattern:
	.ascii "a=%hi, b=%hi, a+b=%hi\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movb	$5, -1(%rbp)	# a
	movb	$-8, -2(%rbp)	# b
	movb	-1(%rbp), %al
	addb	-2(%rbp), %al
	cbtw
	movw	%ax, %r9w	# a+b
	
	movb	-1(%rbp), %al
	cbtw
	movw	%ax, %dx	# a

	movb	-2(%rbp), %al
	cbtw
	movw	%ax, %r8w	# b

	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

Note that the assembly instruction cbtw sign-extends the signed 8-bit integer value stored in the 'al' register into the word-length 'ax' register (i.e. the upper byte of 'ax' is filled with copies of the sign bit of 'al').

(2) Using short integers:

.globl	main

.data
pattern:
	.ascii "a=%hi, b=%hi, a+b=%hi\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movw	$5, -2(%rbp)	# a
	movw	$-8, -4(%rbp)	# b
	movw	-2(%rbp), %ax
	addw	-4(%rbp), %ax

	movw	-2(%rbp), %dx	# a
	movw	-4(%rbp), %r8w	# b
	movw	%ax, %r9w	# a+b
	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

(3) Using doubleword integers:

.globl	main

.data
pattern:
	.ascii "a=%d, b=%d, a+b=%d\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$5, -4(%rbp)	# a
	movl	$-8, -8(%rbp)	# b
	movl	-4(%rbp), %eax
	addl	-8(%rbp), %eax

	movl	-4(%rbp), %edx	# a
	movl	-8(%rbp), %r8d	# b
	movl	%eax, %r9d	# a+b
	leaq	pattern, %rcx
	call	printf

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

(4) Using quadword integers:

.globl	main

.data
pattern:
	.ascii "a=%lld, b=%lld, a+b=%lld\12\0"

.text

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movq	$5, -8(%rbp)	# a
	movq	$-8, -16(%rbp)	# b
	movq	-8(%rbp), %rax
	addq	-16(%rbp), %rax

	movq	-8(%rbp), %rdx	# a
	movq	-16(%rbp), %r8	# b
	movq	%rax, %r9	# a+b
	leaq	pattern, %rcx
	call	printf

	movq	$0, %rax
	addq	$48, %rsp
	popq	%rbp
	ret

In the simple examples presented above, it is important to study very carefully how the suffixes of the assembly instructions, the register names and the format specifiers of the printf() function match the length of the operands.

Simple examples for demonstrating comparisons and jumps

First, let us see a simple assembly program that subtracts two (doubleword) integers and prints the result of the subtraction using a function 'disp'. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad comp.s command, and create a new file named 'comp.s' with the following content:

.globl	main
.globl	disp

.data
pattern:
	.ascii "%d-%d=%d\12\0"

.text
disp:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	%edx, -4(%rbp)
	movl	%r8d, -8(%rbp)
	movl	-4(%rbp), %eax
	subl	-8(%rbp), %eax
	movl	%eax, -12(%rbp)
	movl	-12(%rbp), %r9d
	leaq	pattern, %rcx
	call	printf

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$10, %edx
	movl	$4, %r8d
	call	disp

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the compiled program as follows:

Compile and run comp.s using GCC

Now let us create an assembly program that compares two (doubleword) integers and prints both the result of the subtraction and the values of the OF, SF and ZF flags which are set by the 'cmp' instruction. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad comp1.s command, and create a new file named 'comp1.s' with the following content:

.globl	main
.globl	disp

.data
p1:
	.ascii "%d-%d=%d\12\0"
p2:
	.ascii "OF=%d, SF=%d, ZF=%d\12\0"

.text
disp:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	%edx, -4(%rbp)
	movl	%r8d, -8(%rbp)
	movl	-4(%rbp), %eax
	subl	-8(%rbp), %eax
	movl	%eax, -12(%rbp)
	movl	-12(%rbp), %r9d
	leaq	p1, %rcx
	call	printf

	movl	$1, %edx	# OF=1 by default
	movl	$1, %r8d	# SF=1 by default
	movl	$1, %r9d	# ZF=1 by default
	movl	-4(%rbp), %eax
	cmpl	-8(%rbp), %eax
	jz	.c1
	movl	$0, %r9d	# ZF=0
.c1:
	js	.c2
	movl	$0, %r8d	# SF=0
.c2:
	jo	.c3
	movl	$0, %edx	# OF=0

.c3:
	leaq	p2, %rcx
	call	printf

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$-10, %edx
	movl	$4, %r8d
	call	disp

	movl	$0, %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the compiled program as follows:

Compile and run comp1.s using GCC

Note that a cmp op1, op2 instruction calculates the op2−op1 subtraction in the background, discards the difference (i.e. neither operand is changed), and sets the flags accordingly.

Trying different integer values to be subtracted in the 'main' section of the comp1.s assembly program, we can get the following results:

Subtraction (b−a=c)                OF  SF  ZF  cmp a,b
10−4=6                             0   0   0   OF=0 & SF=0 ⇒ c>0 ⇒ b>a
−10−4=−14                          0   1   0   OF=0 & SF=1 ⇒ c<0 ⇒ b<a
4−4=0                              0   0   1   ZF=1 ⇒ c=0 ⇒ b=a
−2147483648−1=2147483647 (1)       1   0   0   OF=1 & SF=0 ⇒ b<a
0−(−2147483648)=−2147483648 (2)    1   1   0   OF=1 & SF=1 ⇒ b>a

Remarks:
(1) −2147483648=−2^31 is the least (i.e. most negative) integer that can be represented in the two's complement representation of integers in 32 bits. (When creating the assembly code it corresponds to the $0x80000000 hexadecimal value).
(2) +2147483647=2^31−1 is the greatest positive integer that can be represented in the two's complement representation of integers in 32 bits.
Note that when overflow occurs, the computed result of the subtraction is not arithmetically correct.

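Summarizing all that, after a cmp a,b instruction on signed integers:

  • ZF=1 ⇒ b=a;
  • ZF=0 and SF=OF ⇒ b>a;
  • SF≠OF ⇒ b<a.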

Listing the first 10 natural numbers

First, let us see a C program that prints the first 10 natural numbers (starting with 1, then 2, 3, 4, ..., 10). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.c command, and create a new file named 'natural.c' with the following content:

#include <stdio.h>

int main() {
 int x=1;
 int i, n=10;
 for(i=1;i<=n;i++) {
  printf("%d\n",x);
  x++;
  }
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run natural.c using GCC

Now it can be very instructive to see the compiled assembly version of the C program. Type and run in the 'cmd' window the gcc natural.c -S -o nat.s command. After making some changes (omitting some parts, commenting some of the instructions etc.), the resulting file will look something like this:

.data
.pattern:
	.ascii "%d\12\0"

.text
printf:
	pushq	%rbp
	pushq	%rbx		# callee saved
	subq	$56, %rsp
	leaq	48(%rsp), %rbp
/* 56=48+8; pushing %rbx allocates +8 bytes at the stack */

	movq	%rcx, 32(%rbp)	# 1st argument (%rcx) stored
	movq	%rdx, 40(%rbp)	# 2nd argument (%rdx) stored
	movq	%r8, 48(%rbp)	# 3rd argument (%r8) stored
	movq	%r9, 56(%rbp)	# 4th argument (%r9) stored

	leaq	40(%rbp), %rax
	movq	%rax, -16(%rbp)	# local variable
	movq	-16(%rbp), %rbx
	movl	$1, %ecx
	movq	__imp___acrt_iob_func(%rip), %rax
	call	*%rax

	movq	%rax, %rcx
	movq	32(%rbp), %rax
	movq	%rbx, %r8
	movq	%rax, %rdx
	call	__mingw_vfprintf
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax

	addq	$56, %rsp
	popq	%rbx
	popq	%rbp
	ret

.text
.globl	main
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp	# stack frame (48 byte)
	call	__main

	movl	$1, -4(%rbp)	# variable x
	movl	$10, -12(%rbp)	# variable n
	movl	$1, -8(%rbp)	# variable i
	jmp	.L4

/* beginning of loop */
.L5:
	movl	-4(%rbp), %eax	# variable x to print
	movl	%eax, %edx	# 2nd argument of printf (value)
	leaq	.pattern, %rax	# address of .pattern
	movq	%rax, %rcx	# 1st argument of printf (format)
	call	printf

	addl	$1, -4(%rbp)	# x++
	addl	$1, -8(%rbp)	# i++
.L4:
	movl	-8(%rbp), %eax	# i → %eax
	cmpl	-12(%rbp), %eax	# %eax≤-12(%rbp) ? (-12(%rbp) is variable n)
	jle	.L5		# jump if i≤n
/* end of loop */

	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Before the 'printf' function is called for the first time, but after the local variables have been initialized (i.e. just before the 'jmp .L4' instruction), the content of the stack frame created by the 'main' function is as follows:

address (%rsp)  address (%rbp)  content
0(%rsp)         -48(%rbp)       allocated space for the four parameters (or arguments) for the function 'printf'
8(%rsp)         -40(%rbp)       (as above)
16(%rsp)        -32(%rbp)       (as above)
24(%rsp)        -24(%rbp)       (as above)
                                (not used)
                -12(%rbp)       variable n (initially n=10)
                -8(%rbp)        variable i (initially i=1)
                -4(%rbp)        variable x (initially x=1)
48(%rsp)        0(%rbp)         previous value of %rbp (pushed by the first instruction of main)
                8(%rbp)         return address for the caller of 'main' (for 'ret' in main)

After the 'printf' function is called for the first time by the 'main' function, the content of the stack frame created by the 'printf' function is as follows:

address (%rsp)  address (%rbp)  content
0(%rsp)         -48(%rbp)
...             ...             ...
48(%rsp)        0(%rbp)
56(%rsp)        8(%rbp)         previous value of %rbx (pushed by the second instruction of printf)
                16(%rbp)        previous value of %rbp (pushed by the first instruction of printf)
                24(%rbp)        return address for the caller of 'printf', i.e. the address of the next instruction of 'main' after the 'call printf' instruction
(the following part of the stack is the same space for the arguments (or parameters) of the 'printf' function that has been allocated by the 'main' function, see the top of its stack frame in the table above)
                32(%rbp)        1st argument of the function 'printf' (passed in %rcx)
                40(%rbp)        2nd argument of the function 'printf' (passed in %rdx)
                48(%rbp)        3rd argument of the function 'printf' (passed in %r8)
                56(%rbp)        4th argument of the function 'printf' (passed in %r9)

Let us now create an equivalent of the 'natural.c' program in Intel x86/x64 assembly language which prints the first 10 natural numbers. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.s command, and create a new file named 'natural.s' with the following content:

.globl	main

.data 
msg:
	.ascii "The first %d natural numbers:\12\0"

pattern:
	.ascii "%d\12\0"

.text
main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$1, -4(%rbp)	# x
	movl	$10, -12(%rbp)	# n
	movl	$1, -8(%rbp)	# i

	leaq	msg, %rcx
	movl	-12(%rbp), %edx
	call	printf

.L0:	
	movl	-8(%rbp), %eax
	cmpl	-12(%rbp), %eax	# i>n ?
	jg	.L1

	leaq	pattern, %rcx
	movl	-4(%rbp), %edx
	call	printf
	incl	-4(%rbp)
	incl	-8(%rbp)
	jmp	.L0

.L1:
	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run natural.s using GCC

Listing the first 10 powers of 2

First, let us see a C program that prints the first 10 powers of 2 (starting with 1, then 2, 4, 8 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.c command, and create a new file named 'powers.c' with the following content:

#include <stdio.h>

int nextpow(int x) {
 int p=x+x;
 return p;
 }

int main() {
 int x=1;
 int i=1, n=10;
 do {
  printf("%d\n",x);
  x=nextpow(x);
  i++;
  } while(i<=n);
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run powers.c using GCC

Let us now create an equivalent of the 'powers.c' program in Intel x86/x64 assembly language which prints the first 10 powers of 2. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.s command, and create a new file named 'powers.s' with the following content:

.globl	main

.data
pattern:
	.ascii "%d\12\0"

.text
nextpow:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp	# stack frame size
	movl	%ecx, 16(%rbp)	# parameter x
	movl	16(%rbp), %eax
	addl	%eax, %eax
	movl	%eax, -4(%rbp)	# local variable p
	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp
	movl	$1, -4(%rbp)	# variable x
	movl	$1, -8(%rbp)	# variable i
	movl	$10, -12(%rbp)	# variable n

.loop:
	movl	-4(%rbp), %edx
	leaq	pattern, %rcx
	call	printf
	movl	-4(%rbp), %ecx
	call	nextpow
	movl	%eax, -4(%rbp)
	addl	$1, -8(%rbp)	# i++
	movl	-8(%rbp), %eax
	cmpl	-12(%rbp), %eax	# i<=n ?
	jle	.loop

	movl	-8(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Compile, link and run the assembly program as follows:

Compile and run powers.s using GCC

Listing the first 10 factorials

First, let us see a C program that prints the first 10 factorials (starting with 1, then 2, 6, 24 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.c command, and create a new file named 'fact.c' with the following content:

#include <stdio.h>

int f(int n) {
 int temp;
 temp=1;
 for(int i=2;i<=n;i++) {
  temp=temp*i;
  }
 return temp;
 }

int main() {
 int n=10;
 printf("List of the first %d factorials:\n",n);
 int i=1;
 while(i<=n) {
  printf("%d\n",f(i));
  i=i+1;
  }
 return i;
 }

Compile, link and run the compiled C program as follows:

Compile and run fact.c using GCC

Let us now create an equivalent of the 'fact.c' program in Intel x86/x64 assembly language which prints the first 10 factorials. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.s command, and create a new file named 'fact.s' with the following content:

.globl	main
.globl	f

.data
.LC0:
	.ascii "List of the first %d factorials:\12\0"
.LC1:
	.ascii "%d\12\0"

.text

f:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp

	movl	%ecx, 16(%rbp)
	movl	$1, -4(%rbp)
	movl	$2, -8(%rbp)
	jmp	.L4
.L5:
	movl	-4(%rbp), %eax
	imull	-8(%rbp), %eax
	movl	%eax, -4(%rbp)
	addl	$1, -8(%rbp)
.L4:
	movl	-8(%rbp), %eax
	cmpl	16(%rbp), %eax
	jle	.L5

	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$48, %rsp

	movl	$10, -8(%rbp)
	movl	-8(%rbp), %eax
	movl	%eax, %edx
	leaq	.LC0, %rax
	movq	%rax, %rcx
	call	printf

	movl	$1, -4(%rbp)
	jmp	.L8
.L9:
	movl	-4(%rbp), %eax
	movl	%eax, %ecx
	call	f
	movl	%eax, %edx
	leaq	.LC1, %rax
	movq	%rax, %rcx
	call	printf
	addl	$1, -4(%rbp)
.L8:
	movl	-4(%rbp), %eax
	cmpl	-8(%rbp), %eax
	jle	.L9

	movl	-4(%rbp), %eax
	addq	$48, %rsp
	popq	%rbp
	ret

Note that the two-operand form of the imul assembly instruction executes a signed multiplication of the first operand (which can be either a register or a memory operand of the same length) and the second operand (a register), and stores the resulting product in the register specified as the second operand. With the -l suffix only the lower 32 bits of the product are kept.

Compile, link and run the assembly program as follows:

Compile and run fact.s using GCC



Review questions and exercises

Questions:

Create programs in x86/x64 assembly language that perform the following tasks:

Each program should start with printing its function and end with the name of the programmer (as well as the current date).



Interrupts and the extension of the fetch-execute cycle (cf. Stallings 2018: 35-40)

Essentially all computers provide a mechanism by which other modules, processes or events may interrupt the normal sequencing of the processor. The table below lists the most common classes of interrupts.

Main classes of interrupts

class             examples
Program           e.g. arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside the program's allowed memory space etc.
Timer             e.g. end of the time slice allowed for the program (task, process) to run
I/O               e.g. sending an I/O request; receiving a signal generated by an I/O controller to indicate the normal completion of an I/O operation or the occurrence of an I/O error
Hardware failure  e.g. memory parity error

Why are interrupts important?

Interrupts are provided primarily as a way to improve processor utilization. For example, most I/O devices are much slower than the processor. Without interrupts, when an I/O operation is initiated, the processor must wait until the operation is completed before it can continue with the execution of the next instruction of a sequential program.

Suppose that the processor is transferring data to a printer using the simple fetch-execute instruction cycle scheme. After each write operation, the processor must pause and remain idle until the printer catches up. The length of this pause may be on the order of many thousands or even millions of instruction cycles. Clearly, this is a very wasteful use of the processor.


(1) Let us suppose that a user program calls an I/O routine that performs the requested I/O operation.

Performing an I/O call

In the example above, the sequential execution of the user program follows the stages 1, 4, 5, 2, 4, 5, 3, where stages 4 and 5 correspond to the called I/O routine. Since the user program is not interrupted during the I/O command, which presumably takes some time, the processor must wait until the operation is completed.

Without an interrupt mechanism, the program that waits for the I/O device to perform the requested function has to periodically check (or poll) the status of the I/O device. In other words, the waiting program repeatedly performs a test operation to determine if the I/O operation is done. When the I/O operation is completed, the device sets a status flag indicating the success or failure of the operation. The change of the status flag tells the program that the I/O operation is completed, so the execution of the program can be continued.

In parallel processing, the synchronization of the execution of concurrent processes can normally be implemented using wait/signal semaphores to temporarily suspend and resume the execution of the concurrent processes.


(2) With interrupts, the processor can be engaged in executing instructions from another program (or process) while the requested I/O operation is in progress. Let us suppose that there are two user programs to be executed, and currently the first user program (denoted by program(1)) is running, and the second user program (denoted by program(2)) is waiting to be executed.

When the user program(1) reaches a point at which an I/O operation should be executed, it makes a system call. The I/O program (called an interrupt handler) that is invoked in this case consists only of some preparation code and the actual I/O command. After these few instructions have been executed, the control transfers to the second user program(2) while the first, interrupted program(1) is temporarily suspended waiting for the invoked I/O operation to be completed. Meanwhile the addressed external device is busy (e.g. accepting data from computer memory, processing it etc.), and the requested I/O operation is conducted concurrently with the execution of instructions from the second user program(2).

When the invoked I/O operation is complete, the I/O module for the external device sends an interrupt request signal to the processor. If the interruption of the currently running second program(2) is enabled, the processor responds to the interrupt request signal by suspending the operation of the second program(2), executing again some preparation code, and continuing the interrupted user program(1) immediately after the instruction which invoked the now successfully completed I/O operation.

For the user program, an interrupt suspends the normal sequence of execution. When the interrupt processing is completed, execution resumes. Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the OS are responsible for suspending the user program, then resuming it at the same point.

To accommodate interrupts, a new stage called an interrupt stage is added to the instruction cycle:

Instruction Cycle with Interrupts

In the interrupt stage, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor restarts the fetch-execute stage and fetches the next instruction of the current program. If an interrupt is pending, the processor suspends execution of the current program and executes an interrupt-handler routine.

The interrupt-handler routine is generally part of the OS. Typically, this routine determines the nature of the interrupt and performs whatever actions are needed. In the above example, the handler determines which I/O module generated the interrupt, and which program is waiting for the answer of that I/O module. When the interrupt-handler routine is completed, the processor can resume execution of the interrupted user program at the point of interruption.

It is clear that there is some overhead involved in this process. For example, extra instructions must be executed (in the interrupt handler) to determine the nature of the interrupt and to decide on the appropriate action etc. Nevertheless, because of the relatively large amount of time that would be wasted by simply waiting on an I/O operation, the processor can be employed much more efficiently with the use of interrupts.



Interrupt processing (cf. Stallings 2018: 41-45)

An interrupt triggers a number of events, both in the processor hardware and in software.

Simple Interrupt Processing


Let us examine in detail the interrupt processing mechanism after an I/O operation has been initiated by a user program. The addressed I/O device does what it has to do as a parallel background process, and when it completes the requested I/O operation, generally the following sequence of hardware events occurs:

  1. The device issues an interrupt signal to the processor (indicating that the operation is completed, e.g. some data have been stored, retrieved, printed etc.). It will be pending as an interrupt request waiting for the processor to acknowledge it.
  2. The processor finishes execution of the current instruction in the fetch-execute cycle before responding to the interrupt (i.e. before it proceeds to the third stage of the instruction cycle, the interrupt stage).
  3. If interrupts are enabled, the processor tests for a pending interrupt request. When there is one (e.g. because of event 1), the processor sends an acknowledgment signal to the device that issued the interrupt. The acknowledgment allows the device to remove its interrupt signal.
  4. The processor next prepares to transfer control to the interrupt routine or handler. To begin, it saves information needed to resume the current program at the point of interrupt. The minimum information required is the program status word (PSW) and the location of the next instruction to be executed, which is contained in the program counter (PC). These data will be stored in a designated memory area of the operating system.
  5. The processor then loads the program counter with the entry location of the interrupt-handling routine (i.e. the address of the first instruction of the handler to be executed). The purpose of the interrupt handler is to respond to the interrupt (indicated by the interrupt signal, see event 1).
    In order to handle the possible interrupts, depending on the computer architecture and OS design, the implementation of the interrupt handling routines may be
    – either a single program that handles every type of interrupt,
    – or a number of separate programs, one for each type of interrupt, or one for each device and each type of interrupt.
    If there is more than one interrupt-handling routine, the processor must determine which one to invoke. This information may have been included in the original interrupt signal, or the processor may have to issue a request to the device that issued the interrupt to get a response that contains the needed information.


Note that more or less the same mechanism occurs when a user program initiates an I/O operation issuing a system call (often called 'trap' e.g. in the Intel x64 architecture). One of the differences is that hardware events occur asynchronously but system calls are part of the normal synchronous process of program execution (i.e. the I/O operation to be performed in a parallel, separate background process is initiated by the user program itself). Therefore in the case of system calls only event 4 should be implemented by the called software routine before interrupting the user program by running the appropriate interrupt routine (in event 5) which then transfers control (or switches) to a new process, e.g. to another user program (in event 9, see later).

Once the program counter has been loaded, the processor proceeds to the next instruction cycle, which begins with an instruction fetch. Because the instruction fetch is determined by the contents of the program counter, the control is transferred to the interrupt-handler program.


The execution of the interrupt handler results in the following operations:

  6. At this point, the program counter and PSW relating to the interrupted program have been saved on the control stack. However, there is other information that is considered part of the state of the executing program. In particular, the contents of the processor registers need to be saved, because these registers may be used by the interrupt handler. So all of these values, plus any other state information, need to be saved. Typically, the interrupt handler will begin by saving the contents of all registers on the stack.
  7. The interrupt handler may now proceed to process the interrupt. This includes an examination of status information relating to the I/O operation or other event that caused an interrupt. It may also involve sending additional commands or acknowledgments to the I/O device.
  8. When interrupt processing is complete, the handler (or the dedicated routine of the OS) selects one of the previously interrupted programs and prepares to switch control to it. First the saved register values of the selected program are retrieved from the stack and restored to the registers.
  9. The final act is to restore the PSW and program counter values of the selected program. As a result, the next instruction to be executed will be from the selected and previously interrupted program.


It is important to save all of the state information about the interrupted programs for later resumption. This is because the interrupt is not a routine called from the program. Rather, the interrupt can occur at any time, and therefore at any point in the execution of a user program. Its occurrence is unpredictable (i.e. it is an asynchronous event which occurs anytime during the synchronized process of the fetch-execute cycle).

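Finally, let us discuss briefly the case of multiple interrupts. Suppose that one or more interrupts occur while an interrupt is being processed. Two approaches can be taken to dealing with multiple interrupts. The first is to disable interrupts while an interrupt is being processed, so that new interrupt requests remain pending until interrupts are enabled again. The second is to define priorities for interrupts and to allow an interrupt of higher priority to interrupt the handler of a lower-priority interrupt.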

In the following, we shall have an overview of the Windows operating system which implements all the features we have discussed before.



Boda István, 2025.