Recommended reading:
Jim Ledin: Modern Computer Architecture and Organization.
Birmingham – Mumbai: Packt, 2020.
Wikipedia, selected entries. (2025-09-28)
Computers have become an integral part of our daily lives. They power everything from smartphones to hospital systems and have shaped society to such an extent that many people simply couldn't live without the hardware and software that defines the digital world.
Despite this, the majority of people still have no idea how computers work and the role of hardware and software in powering the modern technologies we use today.
Behind the sleek* screens and intricate interfaces, computer architecture forms the fundamental components and processes that make our computers tick. (Stewart 2025)
Computer architecture (CA) is the structure of a computer system made from component parts. At the highest level, the computer can be considered as a black box, while at the lowest level as a complex network of physical components like combinational and sequential circuits and logic gates.
At each level, CA describes the internal organization of a computer in an abstract way that ignores details of the implementation at the lower level. At the highest level, CA defines the capabilities of the computer (from the user's viewpoint) and its programming model (from the programmer's viewpoint).
CA is the science and art of designing computers by defining the functional behavior and organization of hardware components like the CPU, memory, storage and I/O devices etc., including how they interact. CA establishes*
– the Instruction Set Architecture (ISA);
– the Microarchitecture;
– the Hardware System Architecture (HSA);
– the Macroarchitecture.
Key Components of Computer Architecture
- Instruction Set Architecture (ISA):
ISA specifies the machine's low-level programming interface at an abstract level (i.e. in a hardware-independent way). ISA defines the set of instructions a computer's processor (CPU) can execute. It acts as an interface between the software and the hardware, specifying the registers, memory addressing modes, data formats and operations. Instead of machine codes, which are CPU-dependent, it uses symbolic mnemonic codes and syntax. They are fully described in assembly language, which provides a rather convenient representation of machine-code programs in human-readable terms.
- Microarchitecture:
Microarchitecture involves the detailed design and organization of how the CPU's functional units, memory hierarchy, control structures etc. are built and interconnected to implement the ISA. This level includes details like the size of caches, the organization of the CPU's pipelines, the data flow etc.
- Logic design:
Logic design is the lower-level implementation of the high-level concepts defined in the microarchitecture, focusing on the actual circuitry required to make the microarchitecture function. Logic design creates the specific logic-gate-level circuits and blocks (such as the arithmetic logic unit etc.) that implement the functions specified by the microarchitecture.
- Hardware System Architecture (HSA):
HSA covers the functional organization of the major hardware subsystems of the computer system, including the CPU, memory, storage units, input/output (I/O) devices and interfaces, etc., and how they communicate through buses and control signals. Hardware architecture design is a fundamental part of systems design focusing on the blueprint* of a computer system's physical components, how these hardware elements are interconnected, their specifications, and their interactions to meet performance, reliability, and scalability requirements. Thus HSA forms the underlying foundation for the entire functionality of the computer system.
- Macroarchitecture:
The macroarchitecture is the "visible" layer of the computer system. It reflects the user's or the programmer's view of the system. They both can interact with the computer and depend on its behavior, but they don't really need to know how it's internally constructed.
The macroarchitecture consists of two things: the user-visible macroarchitecture and the programmer-visible macroarchitecture.
- The programmer-visible macroarchitecture includes high level programming languages and tools (such as compilers, interpreters, integrated development environments (IDE), development utilities and tools, software libraries, etc.) in order to provide a consistent interface to programmers. It directly affects the programmers' ability to write effective and robust applications for the computer.
- The user-visible macroarchitecture refers to the high-level structure and design of a computer system that is visible to the user, defining the interface and behavior without the intricate details of its internal implementation. This includes the hardware and software components available to the user to do specific tasks, and the interactions of these components in a large software system (e.g. in cloud computing which allows the users to access and manage their data and services from the internet, rather than relying on the local storage of a single computer).
Why Computer Architecture Matters
- Software Compatibility:
A consistent ISA ensures that the software written for that architecture can run on different computer implementations. You can have two computers that have been constructed by different companies, in different ways, with different technologies etc. but with the same architecture.
- Performance and Efficiency:
CA directly impacts a computer's basic characteristics, making it crucial for designing efficient systems. For example, it strongly influences the computer's speed, reliability, functionality and power consumption, as well as the software that runs on it. Another characteristic of an efficient computer system is scalability, which establishes* the ability of the system to handle a growing amount of workload, usually by adding extra resources to the system to maintain performance.
- Innovation and Development:
Understanding architecture helps in designing specialized hardware for specific purposes (such as machine learning, neural networks, pattern recognition, artificial intelligence (AI) etc.), and creating new computing solutions.
Vocabulary:
Pronunciation symbols:
ae as in cap [kaep], hat [haet], valid [vaelid]
oe as in beggar [begoer], altar [awltoer], signal [sign(oe)l]
aw as in all [awl], fall [fawl], law [law], raw [raw]
ow as in how [how], howl [howl], power [powoer]
uo as in poor [puor], cure [kyuor], pure [pyuor]
establish [i staeblish] = to build or bring into being sth on a stable basis (Webster 2009)
syn/rel: ground, base, be the basis for
Scalability establishes the ability of the system to handle a growing amount of workload.
blueprint [blu:print] = a design plan or other technical drawing (e.g. a system diagram, a data flow diagram etc.)
sleek [sli:k] = having a smooth attractive shape (Longman 2009); finely contoured [kontuord]; streamlined (Webster 2009)
a sleek computer screen
References:
AI about "computer architecture" Google Search. (2025-09-11)
Stewart, Ellis 2025. What is Computer Architecture? Definition, Types, Structure.
https://em360tech.com/tech-articles/what-computer-architecture-definition-types-structure (2025-09-11)
Illingworth, Valerie – Pyle, Ian 1996-1997. A Dictionary of Computing. Oxford – New York etc.: Oxford University Press.
Ledin, Jim 2020. Modern Computer Architecture and Organization. Birmingham – Mumbai: Packt.
Stallings, William 2018. Operating Systems. Internals and Design Principles. Edinburgh: Pearson.
Wikipedia entries: Computer Architecture etc.
Brief overview of computer system hardware (cf. Stallings 2018: 30-32)
In general, a computer consists of a processor, a main memory, and several input-output (I/O) components.
[Figure: Block diagram of a computer with uniprocessor CPU. Black lines indicate the flow of control signals, whereas red lines indicate the flow of processor instructions, address information and data. Arrows indicate the direction of flow.]
[Figure: Single system bus architecture]
- The processor or central processing unit (CPU) controls the operation of the computer and performs its data processing functions. To achieve these purposes, the CPU has several cooperating components (e.g. some special-purpose and general-purpose registers, the control unit, the arithmetic and logic unit etc.).
Its main purpose is to execute the machine-level instructions of programs.
- Each processor has a dedicated instruction set which contains all the machine-level instructions that a given processor can execute.
- The processor contains some registers for its operation. For example,
- the program counter (PC) or instruction pointer (IP) specifies (holds, "points to") the address of the next instruction of the currently running program to be executed;
- the instruction register (IR) holds the instruction of the currently running program to be decoded and executed.
- The processor has a special unit called the execution unit (EU) containing, among others, an arithmetic-logic unit (ALU), a floating-point unit (FPU), some general-purpose registers (e.g. the accumulator, AC or AX) and other, special-purpose registers (e.g. the stack pointer, SP). The EU is responsible for executing arithmetic and logic operations. For example,
- some registers of the i8086 processor are illustrated here
- the available registers for the assembly programs in the Intel x86/x64 architecture are summarized here
- Finally, the control unit (CU) of the processor manages (or directs, controls) the overall operation of the processor. It is responsible for
- The main memory stores data and programs, or more precisely, those instructions which make up the currently running programs. The main memory is typically volatile, that is, when the computer is shut down, the contents of the memory are lost. (In contrast with the contents of non-volatile memories which are retained, that is, permanently stored, even when the computer is shut down.)
- A memory module consists of a set of memory cells or memory locations, defined by sequentially numbered physical addresses. Each address refers to a location or a group of locations that contain a sequence of bits that can be interpreted as either a machine-level instruction or a certain type of data.
- The processor and the main memory form the central unit of the computer.
- The main function of input-output (I/O) modules is to send, receive, store, display, print etc. data moved, for the most part, between the central unit of the computer and an I/O device. The I/O modules include a great variety of devices, e.g. a monitor, a keyboard, secondary memory devices (e.g. disks), network equipment etc.
- An I/O device usually exchanges data between the central unit of the computer and an (internal or external) buffer memory which temporarily stores data until they can be transferred and processed. A buffer is normally used to accommodate the difference in the rate at which the communicating devices can handle data during the transfer.
- The system bus is responsible for communication among processors, main memory, and I/O modules transferring data, (physical) addresses and control signals.
- In a personal computer environment, the system bus is part of the motherboard, which also contains the processor, the main memory, and other components (e.g. an interrupt controller, a BIOS or UEFI chip, network adapters, extension slots and cards, USB ports etc.).
The figure above illustrates the logic of the operation of the system bus. The CPU contains some (internal) registers to support data exchange among the CPU, the main memory and the I/O module. These registers and their function are as follows:
- the memory address register (MAR) contains the (physical) address of a specific location in memory for the next read or write operation
- the memory buffer register (MBR) or the memory data register (MDR) contains the data to be written into memory, or receives the data to be read from memory for the next read or write operation
- the I/O address register (I/O AR) specifies the address of a particular I/O device for the next read or write operation
- the I/O buffer register (I/O BR) refers to the data to be written into the I/O device or to be read from the I/O device during the next read or write operation
Execution of instructions (cf. Stallings 2018: 32-35)
A program to be executed by a processor consists of a set of machine-level instructions stored in the memory. In its simplest form, the processing of instructions consists of two basic steps:
– first, the processor reads (or fetches) the instructions from the memory one at a time, and
– second, the processor executes each instruction.
The execution of a program is a repeating process (a cycle or loop) of these two steps: the instruction fetch and the instruction execution. (Note that instruction execution may involve several operations and depends on the nature of the instruction.)
The figure below illustrates the instruction cycle:
At the beginning of each instruction cycle, the processor fetches an instruction from memory. In this respect, the program counter (PC) register is of utmost importance: the PC holds the address of the next instruction to be fetched. After the instruction has been fetched, the processor increments the value of the PC so that it will hold the address of the next instruction in the sequence of instructions (i.e. in the program which is currently being executed).
The fetched instruction is loaded into the instruction register (IR). An instruction is normally made up of a combination of an operation code and the specification of the operands that represent or refer to the data upon which the operation is to be performed. The operation code of the instruction contains bits that specify the action the processor is to take. The processor (or more specifically, the control unit of the processor) interprets the instruction and performs the required action. In general, these actions fall into four categories:
- Processor-memory: data may be transferred from processor to memory, or from memory to processor.
- Processor-I/O: Data may be transferred to or from a peripheral device by initiating a transfer between the processor and an I/O module.
- Data processing: The processor may perform some arithmetic or logic operation on data.
- Control: An instruction may specify that the sequence of execution should be altered.
The execution of an instruction may involve a certain combination of these actions.
An example of the operation of the fetch-execute cycle (cf. Stallings 2018: 33-35)
Let the memory of a virtual machine be organized into 16-bit (i.e. word-length) memory cells. Each instruction consists of a 4-bit operation code (opcode) and a 12-bit operand. Note that if the operand contains an address, a maximum of 2^12 = 4096 memory cells can be addressed directly.
We shall use four hexadecimal digits to represent the 16-bit (one-word) content of the registers, memory addresses and the content of memory cells. (Note that for the 12-bit long addresses three hexadecimal digits would be enough.) Similarly, we shall use one hexadecimal digit to represent the opcode of each instruction.
In the example we want to add two whole numbers represented by two's complement code. We will use one general-purpose register (the accumulator, AC) and three instructions as follows:
- opcode=1: move data from memory into AC:
AC←M(addr) or MOV M(addr), AC
the 12-bit address of the memory cell involved in the operation is specified by the operand 'addr'
- opcode=2: move data from AC into memory:
M(addr)←AC or MOV AC, M(addr)
the memory cell involved in the operation is located at the 12-bit address specified by the operand 'addr'
- opcode=5: add the content of the memory cell to the content of the accumulator:
AC←M(addr)+AC or ADD M(addr), AC
the data to be added to the content of AC is located at the address specified by the operand 'addr'; after the addition, the result will be stored in AC (i.e. the sum overwrites the previous content of AC)
We assume that the first instruction to be performed is located at the memory address 300 followed sequentially by the further instructions of the program (located at the addresses 301, 302 etc., respectively). Furthermore, we assume that the data that the program manipulates are stored at the memory addresses 940 and 941.
Memory content

| Address | Content | Kind |
|---|---|---|
| 0300 | 1940 | instruction |
| 0301 | 5941 | instruction |
| 0302 | 2941 | instruction |
| 0940 | 0003 | data |
| 0941 | 0002 | data |

Now let's see how the operation is performed in three fetch-execute cycles.
– In the example we analyze the operation of the fetch-execute cycle in detail. Since the initial value of the program counter or instruction pointer register (PC or IP) is set to location 300, in the first cycle the processor fetches the instruction at memory location 300 and then immediately increments the value of PC. In the succeeding instruction cycles, the CPU fetches instructions from locations 301, 302, and so on. (Note, however, that the sequential execution of instructions can be altered at any time by a control instruction.)
– In each cycle the fetched instruction is always loaded into the instruction register (IR). The operation code (opcode) of the instruction will specify the necessary action that the processor is to take. After separating the opcode and the operand, the processor (actually, the control unit) interprets the opcode of the instruction and sends control signals to the appropriate units to perform the required action.
1st cycle

| Stage | Storage unit | Value | Comment |
|---|---|---|---|
| Fetch stage | PC | 0300 | fetch the instruction from M(300) |
| | M(300) | 1940 | load the content of M(300) into IR |
| | IR | 1940 | interpret the instruction: opcode=1 (move memory data into AC), operand=940 (the data is located at M(940)) |
| | PC | 0301 | increment the value of PC by 1 |
| Execute stage: AC←M(940) or MOV M(0940),AC | M(940) | 0003 | load the content of M(940) into AC |
| | AC | 0003 | store the content of M(940) in AC |
2nd cycle

| Stage | Storage unit | Value | Comment |
|---|---|---|---|
| Fetch stage | PC | 0301 | fetch the instruction from M(301) |
| | M(301) | 5941 | load the content of M(301) into IR |
| | IR | 5941 | interpret the instruction: opcode=5 (add memory data to AC), operand=941 (the data to be added is located at M(941)) |
| | PC | 0302 | increment the value of PC by 1 |
| Execute stage: AC←AC+M(941) or ADD M(0941),AC | AC | 0003 | add the content of M(941) to AC |
| | M(941) | 0002 | |
| | AC | 0005 | store the result of the addition in AC |
3rd cycle

| Stage | Storage unit | Value | Comment |
|---|---|---|---|
| Fetch stage | PC | 0302 | fetch the instruction from M(302) |
| | M(302) | 2941 | load the content of M(302) into IR |
| | IR | 2941 | interpret the instruction: opcode=2 (move the content of AC into a memory cell), operand=941 (the memory cell is located at M(941)) |
| | PC | 0303 | increment the value of PC by 1 |
| Execute stage: M(941)←AC or MOV AC,M(0941) | M(941) | 0002 | move the content of AC into M(941) |
| | AC | 0005 | |
| | M(941) | 0005 | store the content of AC in M(941) |
In this example three instruction cycles were needed, each consisting of a fetch stage and an execute stage. As a result, we added the contents of the memory location 940 to the contents of the memory location 941, and then stored the sum at the memory location 941.
The following figure summarizes the process.
Implementation of the above example in Windows
II.1. Create and compile C files
Table of contents:
- Printing "Hello World!" (hello.c)
- Adding 3+2 (simple.c)
- Creating a batch file to display ERRORLEVEL (err.bat)
- Adding 3+2 with a function (simplef.c)
- Assembly version of 'simplef.c' with quadword-length operands (simplex.s)
- Adding and printing 3+2 (example.c)
- Assembly version of 'example.c' with quadword-length operands (examplex.s)
Printing "Hello World!" (hello.c)
Open a new 'cmd' window in the c:\temp\gcc directory and set the default path by running the 'setpath' command (only once). Using the notepad hello.c command, create a new file named 'hello.c' with the following content:

    #include <stdio.h>

    int main()
    {
        printf("Hello world!\n");
        return 0;
    }

Compile, link and run the C program as follows:
Adding 3+2 (simple.c)
Now let us create another simple C program which implements the former example adding two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simple.c command, and create a new file named 'simple.c' with the following content:

    #include <stdio.h>

    int main()
    {
        int a=3;
        int b=2;
        b=a+b;
        return 0;
    }

Compile, link and run the C program as follows:
Note that in the 'cmd' window, we can display the returned value of the 'simple.exe' program using the echo %ERRORLEVEL% command.
For the sake of simplicity, let us create a batch file named 'err.bat' using the notepad err.bat command. It should contain these two lines:
@echo off
echo %ERRORLEVEL%
With that we have created a new command called err which, when entered, displays the current value of the ERRORLEVEL environment variable in the 'cmd' window.
It will be instructive for later considerations that, using the GCC compiler, we can easily generate the assembly code of the 'simple.c' program (as well as of any other C program). For that purpose, we should enter the gcc simple.c -S -o simple.s command in the 'cmd' window.
The generated assembly program is as follows:

        .file "simple.c"
        .text
        .def __main; .scl 2; .type 32; .endef
        .globl main
        .def main; .scl 2; .type 32; .endef
        .seh_proc main
    main:
        pushq %rbp
        .seh_pushreg %rbp
        movq %rsp, %rbp
        .seh_setframe %rbp, 0
        subq $48, %rsp
        .seh_stackalloc 48
        .seh_endprologue
        call __main
        movl $3, -4(%rbp)
        movl $2, -8(%rbp)
        movl -4(%rbp), %eax
        addl %eax, -8(%rbp)
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret
        .seh_endproc
        .ident "GCC: (GNU) 13.2.0"

The explanation of some important parts of the assembly code:
- the pushq %rbp, movq %rsp, %rbp and subq $48, %rsp instructions at the beginning of the 'main' section create a stack frame of 48 bytes, which is enough to dynamically allocate memory space for 12 double-word (i.e. 12*4 bytes) local variables and parameters
- the push / pop instructions move data to and from a dedicated memory area called the stack
- the movl $3, -4(%rbp) instruction assigns ("moves") '3' as a 4-byte double-word ("long") value to the local variable at the address [%rbp-4] (which corresponds to the integer variable 'a' in the simple.c program)
- the movl $2, -8(%rbp) instruction assigns ("moves") '2' as a 4-byte double-word ("long") value to the local variable at the address [%rbp-8] (which corresponds to the integer variable 'b' in the simple.c program)
- the movl -4(%rbp), %eax instruction assigns ("moves") the content of the double-word local variable at the address [%rbp-4] to %eax, the lower half of the quadword (8-byte) accumulator register %rax (and fills the upper, "most significant" half of %rax with zeros)
- the addl %eax, -8(%rbp) instruction
- first adds the content of the lower half of the accumulator register (%eax) to the double-word value of the local variable at the address [%rbp-8],
- then stores the sum in the local variable at the address [%rbp-8] (overwriting its previous content)
- the last two lines of the 'main' section before the 'ret' instruction destroy the stack frame of 48 bytes and restore the previous value of the base pointer (%rbp) register
After such considerations, we can easily create the 'simplex.s' assembly program which contains quadword length operands, and returns the sum of the addition (as an ERRORLEVEL value):

    .globl main
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movq $3, -8(%rbp)
        movq $2, -16(%rbp)
        movq -8(%rbp), %rax
        addq %rax, -16(%rbp)    # the sum of a and b
        movq -16(%rbp), %rax    # return (ERRORLEVEL) value
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Adding 3+2 with a function (simplef.c)
The aim of the 'simple.c' program can also be implemented using a function named 'sum' which adds two integers together. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad simplef.c command, and create a new file named 'simplef.c' with the following content:

    #include <stdio.h>

    int sum(int x,int y)
    {
        int temp;
        temp=x+y;
        return temp;
    }

    int main()
    {
        int a=3;
        int b=2;
        int c;
        c=sum(a,b);
        return c;
    }

Set the default path by running the 'setpath' command (remember, only once). Compile, link and run the C program as follows:
In the 'cmd' window, we can display again the returned value of the 'simplef.exe' program using the echo %ERRORLEVEL% command (or running the 'err' batch file).
Adding and printing 3+2 (example.c)
It was not easy to check the 'simple.c' or 'simplef.c' programs because they produced no visible output. So open a new 'cmd' window again in the c:\temp\gcc directory. Using the notepad example.c command, create a new file named 'example.c' with the following content:

    #include <stdio.h>

    int main()
    {
        int a=3;
        int b=2;
        int c=a+b;
        printf("%d + %d = %d\n",a,b,c);
        return 0;
    }

Set the default path by running the 'setpath' command. Compile, link and run the C program as follows:
Like before, we can easily generate the assembly code of the 'example.c' program by entering the gcc example.c -S -o example.s command in the 'cmd' window.
The generated assembly program is as follows (the comments number the arguments according to the Windows x64 calling convention, in which the first four integer arguments are passed in %rcx, %rdx, %r8 and %r9):

        .file "example.c"
        .text
        .def printf; .scl 3; .type 32; .endef
        .seh_proc printf
    printf:
        pushq %rbp
        .seh_pushreg %rbp
        pushq %rbx
        .seh_pushreg %rbx
        subq $56, %rsp
        .seh_stackalloc 56
        leaq 48(%rsp), %rbp
        .seh_setframe %rbp, 48
        .seh_endprologue
        movq %rcx, 32(%rbp)     # 1st argument stored
        movq %rdx, 40(%rbp)     # 2nd argument stored
        movq %r8, 48(%rbp)      # 3rd argument stored
        movq %r9, 56(%rbp)      # 4th argument stored
        leaq 40(%rbp), %rax
        movq %rax, -16(%rbp)
        movq -16(%rbp), %rbx
        movl $1, %ecx
        movq __imp___acrt_iob_func(%rip), %rax
        call *%rax
        movq %rax, %rcx
        movq 32(%rbp), %rax
        movq %rbx, %r8
        movq %rax, %rdx
        call __mingw_vfprintf
        movl %eax, -4(%rbp)
        movl -4(%rbp), %eax
        addq $56, %rsp
        popq %rbx
        popq %rbp
        ret
        .seh_endproc
        .def __main; .scl 2; .type 32; .endef
        .section .rdata,"dr"
    .LC0:
        .ascii "%d + %d = %d\12\0"
        .text
        .globl main
        .def main; .scl 2; .type 32; .endef
        .seh_proc main
    main:
        pushq %rbp
        .seh_pushreg %rbp
        movq %rsp, %rbp
        .seh_setframe %rbp, 0
        subq $48, %rsp
        .seh_stackalloc 48
        .seh_endprologue
        call __main
        movl $3, -4(%rbp)
        movl $2, -8(%rbp)
        movl -4(%rbp), %edx
        movl -8(%rbp), %eax
        addl %edx, %eax
        movl %eax, -12(%rbp)
        movl -12(%rbp), %ecx
        movl -8(%rbp), %edx
        movl -4(%rbp), %eax
        movl %ecx, %r9d
        movl %edx, %r8d
        movl %eax, %edx
        leaq .LC0(%rip), %rax
        movq %rax, %rcx
        call printf
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret
        .seh_endproc
        .ident "GCC: (GNU) 13.2.0"
        .def __mingw_vfprintf; .scl 2; .type 32; .endef

Based on the compiled program, we can easily create the 'examplex.s' assembly program which contains only quadword length operands. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad examplex.s command, and create a new file named 'examplex.s' with the following content:

    .data
    .msg: .ascii "%d + %d = %d\12\0"
    .text
    .globl main
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movq $3, -8(%rbp)       # local variable a
        movq $2, -16(%rbp)      # local variable b
        movq -8(%rbp), %rdx
        movq -16(%rbp), %rax
        addq %rdx, %rax
        movq %rax, -24(%rbp)    # local variable c
        movq -24(%rbp), %rcx
        movq -16(%rbp), %rdx
        movq -8(%rbp), %rax
        movq %rcx, %r9          # 4th argument (var c)
        movq %rdx, %r8          # 3rd argument (var b)
        movq %rax, %rdx         # 2nd argument (var a)
        leaq .msg, %rax
        movq %rax, %rcx         # 1st argument (pattern .msg)
        call printf
        movq $0, %rax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Implementation of the above example in Windows
II.2. Create and compile assembly files
Printing "Hello World!" (asmh.s)
Now let us create a simple program in Intel x86/x64 assembly language which displays the well-known 'Hello world!' message. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad asmh.s command, and create a new file named 'asmh.s' with the following content:

    .globl main

    // definitions of constants and variables
    .data
    hello: .ascii "Hello world!\12\0"

    // program instructions (code)
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $32, %rsp
        leaq hello, %rax        /* setting the parameter for the function 'printf' */
        movq %rax, %rcx         # address of 'hello'
        call printf             /* displayed 'Hello world!' */
        movl $0, %eax           # set ERRORLEVEL value
        addq $32, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Note that the size of the stack frame is 32 bytes, even though there are no local variables in the program. The four quadwords allocated at the top of the stack frame can be used for the (possible) parameters of the 'printf' function.
Setting the ERRORLEVEL (abc.s)
After we have successfully created and compiled the 'asmh.s' program, let us create another simple program in Intel x86/x64 assembly language which does nothing except return the value 10 as an ERRORLEVEL value.
Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abc.s command, and create a new file named 'abc.s' with the following content:

    .globl main
    main:
        enter $0, $0
        movq $10, %rax
        leave
        ret

Note that the 'enter' instruction at the beginning of the 'main' function creates a stack frame which, among other things, can contain the values of the local variables (if there are any). The base pointer register (%rbp) is used as a reference to address those local variables (i.e. the local variables can be addressed relative to the value of the base pointer).
The 'enter $0, $0' assembly instruction corresponds to the

    pushq %rbp
    movq %rsp, %rbp

instructions. It creates the stack frame of the function.
Note that e.g. the 'enter $24, $0' instruction would, after the two instructions shown above, allocate 24 bytes of memory space on the stack by subtracting 24 from the current value of the stack pointer. Because 6*4=24, this would be enough for six double-word (i.e. 4-byte = 32-bit) local variables (or for three quadword local variables).
The 'leave' assembly instruction corresponds to the

    movq %rbp, %rsp
    popq %rbp

instructions. It frees (or destroys) the stack frame of the function.
The stack is a dedicated part of the memory which can store data according to the current needs of the programs. In this respect, the push and pop instructions are of utmost importance for adding data to, or removing (as well as retrieving) data from, the top of the stack. In order to implement and use a stack:
– the programs use a dedicated register called stack pointer (%rsp) that always points to the top of the stack by containing the address of the last data item that has been pushed;
– when a data item is pushed into the stack first the stack pointer is decreased by the size of the operand (e.g. by subtracting 8 from the actual value of the stack pointer for a quadword), and then the content of the operand is stored at that address;
– when a data item is popped from the stack first the data from the top of the stack is retrieved from the memory address (and stored in the operand of the 'pop' instruction), and then the stack pointer is increased by the size of the operand (e.g. by adding 8 to the actual value of the stack pointer for a quadword).
The diagram below illustrates the push and pop operations:
[Figure: the push and pop operations on the stack]
Using the push / pop instructions instead of the enter / leave instructions, we can create another version of the program 'abc.s' as follows:

    .globl main
    main:
        pushq %rbp
        movq %rsp, %rbp
        movq $10, %rax
        movq %rbp, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program:
Here, like in the case of the 'simple.exe' program or the 'simplex.exe' program, we can display the returned value of the 'abc.exe' program using the echo %ERRORLEVEL% command in the 'cmd' window (or we can enter the 'err' command if the 'err.bat' file exists).
Adding 3+2 (abcs.s)
Now let us create an equivalent of the 'simple.c' program in Intel x86/x64 assembly language which adds two numbers (3 and 2) together as long integer types, stores the sum in another longint variable, and returns the sum as an ERRORLEVEL value. Before that, the program will warn us to check the actual value of the ERRORLEVEL environment variable.
Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcs.s command, and create a new file named 'abcs.s' with the following content:
    .globl main
    .data
    hello: .ascii "\12See the ERRORLEVEL value!\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $56, %rsp              /* stack frame created */
        movq $3, -8(%rbp)
        movq $2, -16(%rbp)
        movq -8(%rbp), %rax
        addq -16(%rbp), %rax
        movq %rax, -24(%rbp)
        leaq hello, %rcx
        call printf
        movq -24(%rbp), %rax        # set ERRORLEVEL value
        /* stack frame to be destroyed */
        addq $56, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Here, like in the case of the 'simple.exe' and 'abc.exe' programs, in the 'cmd' window we can display the returned value of the 'abcs.exe' program using the echo %ERRORLEVEL% or simply the err command. But in this case it returns the sum of the addition 3+2 (i.e. 5).
The size and content of the stack frame need some explanation. The size of the stack frame is 56 bytes, which corresponds to 7 quadwords (i.e. 56=7*8). The structure and content of the stack frame are as follows:

| address (relative to %rsp) | address (relative to %rbp) | content |
|---|---|---|
| 0(%rsp) | -56(%rbp) | parameters for the function 'printf' |
| 8(%rsp) | -48(%rbp) | (parameters for 'printf', continued) |
| 16(%rsp) | -40(%rbp) | (parameters for 'printf', continued) |
| 24(%rsp) | -32(%rbp) | (parameters for 'printf', continued) |
| | -24(%rbp) | variable c |
| | -16(%rbp) | variable b |
| | -8(%rbp) | variable a |
| 56(%rsp) | 0(%rbp) | previous value of %rbp (pushed by the first instruction of main) |
| | 8(%rbp) | return address for the caller of 'main' (for 'ret' in main) |

The base pointer (%rbp) register has a special purpose: it points to the bottom of the stack frame of the current function, so local variables can be accessed relative to its value.

As for the last row of the table, which belongs to the address 8(%rbp) just below the bottom of the stack frame: when the program environment (i.e. cmd.exe in our case) runs the abcs.exe program, it calls the 'main' global function of the abcs.exe program, and the current value of the instruction pointer is automatically pushed onto the top of the stack. (Thus when the called 'main' function exits and returns, the CPU can continue the execution of the caller program by popping the address of the next instruction to be performed from the stack and loading it into the instruction pointer.)
In general, when a specific function of the program is called by another function (from the same or from another program), the return address of the next instruction to be executed after the 'call' instruction is automatically pushed onto the top of the stack.
Note that in the fetch-execute cycle the address of the next instruction is always stored in the %rip instruction pointer (or program counter) register. Thus the 'call' instruction, when executed, pushes the current value of the instruction pointer onto the top of the stack. After that the called function (the callee) pushes the value of the base pointer and creates its stack frame.
The ret instruction is normally the last instruction a function executes. It "pops" the stored address of the next instruction to be executed from the top of the stack and restores the value of the instruction pointer. Then the next fetch-execute cycle continues the execution of the program immediately after the 'call' instruction.
The called function is named the callee, and the function that calls the callee is named the caller.
The diagram below illustrates the mechanism of the 'call' and the 'ret' (i.e. return) instructions:
![]()
Adding 3+2 with a function (abcf.s)
Let us now create an equivalent of the 'simplef.c' program⇒ in Intel x86/x64 assembly language which adds two numbers (3 and 2) together with a function named 'sum' and returns the sum. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad abcf.s command, and create a new file named 'abcf.s' with the following content:
    .globl sum
    sum:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        movl %ecx, 16(%rbp)
        movl %edx, 24(%rbp)
        movl 16(%rbp), %edx
        movl 24(%rbp), %eax
        addl %edx, %eax
        movl %eax, -4(%rbp)
        movl -4(%rbp), %eax
        addq $16, %rsp
        popq %rbp
        ret

    .globl main
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $3, -4(%rbp)
        movl $2, -8(%rbp)
        movl -8(%rbp), %edx
        movl -4(%rbp), %eax
        movl %eax, %ecx
        call sum
        movl %eax, -12(%rbp)
        # movl $0, %eax
        movl -12(%rbp), %eax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Like in the case of the previous programs, in the 'cmd' window we can display the returned value of the 'abcf.exe' program using either the 'err' command or the echo %ERRORLEVEL% command. Now we can see again the sum of the addition 3+2 (i.e. 5).
So far, we have used many registers without introducing them. It is therefore high time to give an overview of the registers available to assembly programs in the Intel x86/x64 architecture. First note that, in the AT&T assembly syntax,
– the 32 bit wide register names are prefixed with the %e characters, and
– the 64 bit wide register names are prefixed with the %r characters.

Note that when we declare an int type variable in C, its length will be 32 bit (i.e. it is double-word wide).
In the Intel x86/x64 architecture, the list of some important registers is as follows (cf. X86-64 Architecture Guide, 2025-03-11; Assembly 1: Basics, 2025-03-30):
| Register | Purpose | Size | Saved across calls |
|---|---|---|---|
| **General-purpose registers** | | | |
| %rax | temp register for arithmetic or logical calculations etc. (called accumulator); return value of a function | 64 bit | No |
| %eax | the lower half of the 8 byte wide %rax register | 32 bit | |
| %ax | the lower half of the 4 byte wide %eax register | 16 bit | |
| %ah | the higher half of the 2 byte wide %ax register | 8 bit | |
| %al | the lower half of the 2 byte wide %ax register | 8 bit | |
| %rbx | callee-saved | 64 bit | Yes |
| %ebx | the lower half of the 8 byte wide %rbx register | 32 bit | |
| %bx | the lower half of the 4 byte wide %ebx register | 16 bit | |
| %bh | the higher half of the 2 byte wide %bx register | 8 bit | |
| %bl | the lower half of the 2 byte wide %bx register | 8 bit | |
| %rcx | used to pass 4th argument to functions | 64 bit | No |
| %ecx | the lower half of the 8 byte wide %rcx register | 32 bit | |
| %cx | the lower half of the 4 byte wide %ecx register | 16 bit | |
| %ch | the higher half of the 2 byte wide %cx register | 8 bit | |
| %cl | the lower half of the 2 byte wide %cx register | 8 bit | |
| %rdx | used to pass 3rd argument to functions | 64 bit | No |
| %edx | the lower half of the 8 byte wide %rdx register | 32 bit | |
| %dx | the lower half of the 4 byte wide %edx register | 16 bit | |
| %dh | the higher half of the 2 byte wide %dx register | 8 bit | |
| %dl | the lower half of the 2 byte wide %dx register | 8 bit | |
| %rsi | used to pass 2nd argument to functions | 64 bit | No |
| %esi | the lower half of the 8 byte wide %rsi register | 32 bit | |
| %si | the lower half of the 4 byte wide %esi register | 16 bit | |
| %sil | the lower half of the 2 byte wide %si register | 8 bit | |
| %rdi | used to pass 1st argument to functions | 64 bit | No |
| %edi | the lower half of the 8 byte wide %rdi register | 32 bit | |
| %di | the lower half of the 4 byte wide %edi register | 16 bit | |
| %dil | the lower half of the 2 byte wide %di register | 8 bit | |
| %r8 | used to pass 5th argument to functions | 64 bit | No |
| %r8d | the lower half of the 8 byte wide %r8 register | 32 bit | |
| %r8w | the lower half of the 4 byte wide %r8d register | 16 bit | |
| %r8b | the lower half of the 2 byte wide %r8w register | 8 bit | |
| %r9 | used to pass 6th argument to functions | 64 bit | No |
| %r9d | the lower half of the 8 byte wide %r9 register | 32 bit | |
| %r9w | the lower half of the 4 byte wide %r9d register | 16 bit | |
| %r9b | the lower half of the 2 byte wide %r9w register | 8 bit | |
| %r10 | temporary | 64 bit | No |
| %r11 | temporary | 64 bit | No |
| %r12 | callee-saved | 64 bit | Yes |
| %r13 | callee-saved | 64 bit | Yes |
| %r14 | callee-saved | 64 bit | Yes |
| %r15 | callee-saved | 64 bit | Yes |
| **Special-purpose registers** | | | |
| %rsp | stack pointer | 64 bit | Yes |
| %esp | the lower half of the 8 byte wide %rsp register | 32 bit | |
| %sp | the lower half of the 4 byte wide %esp register | 16 bit | |
| %spl | the lower half of the 2 byte wide %sp register | 8 bit | |
| %rbp | base pointer; callee-saved | 64 bit | Yes |
| %ebp | the lower half of the 8 byte wide %rbp register | 32 bit | |
| %bp | the lower half of the 4 byte wide %ebp register | 16 bit | |
| %bpl | the lower half of the 2 byte wide %bp register | 8 bit | |
| %rip | instruction pointer or program counter | 64 bit | (call↔ret) |
| %eip | the lower half of the 8 byte wide %rip register | 32 bit | |
| %ip | the lower half of the 4 byte wide %eip register | 16 bit | |
| %rflags | status or control flags | 64 bit | No |
| %eflags | the lower half of the 8 byte wide %rflags register | 32 bit | |
| %flags | the lower half of the 4 byte wide %eflags register | 16 bit | |

Note that the argument numbering and the "Saved across calls" column in this table follow the System V AMD64 calling convention used on Linux. The example programs in this section are built for Windows, where the Microsoft x64 calling convention passes the first four integer arguments in %rcx, %rdx, %r8 and %r9 (in this order) and treats %rsi and %rdi as callee-saved registers.

The status (or flags) register contains mostly one-bit storage units ("flags") that reflect the current state of an x86/x64 CPU. For example, some flags show important characteristics of the result of arithmetic or logical operations (including comparisons etc.). Some usual flags are illustrated below within a 64-bit %rflags register:
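| bit | 63 ... 12 | 11 | 10 ... 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| flag | | OF | | SF | ZF | | AF | | PF | | CF |

The flag names are abbreviated as follows: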
- CF: Carry Flag (CF=1 when an arithmetic carry has been generated)
- PF: Parity Flag (PF=1 indicates that the number of 1 bits in the least significant byte of the result of the last operation is even; otherwise, i.e. when the number of 1 bits is odd, PF=0)
- AF: Auxiliary Carry Flag (AF=1 when an arithmetic carry has been generated out of the lowest four bits of the result; it is used in binary-coded decimal (BCD) arithmetic)
- ZF: Zero Flag (the zero flag is a central feature on most conventional CPU architectures: it is used to check the result of an arithmetic operation, including comparisons; ZF=1 if the result of the operation is zero, otherwise ZF=0)
- SF: Sign Flag or Negative Flag (SF=1 indicates that the result of the last mathematical operation produced value 1 in the most significant bit (MSB) position, i.e. the leftmost or sign bit of the result was set)
- OF: Overflow Flag (OF=1 shows that a signed overflow has occurred in the last arithmetic operation)
- Note that in two's complement coding, the operation C=A+B produces an overflow if the logical expression $(S_A \wedge S_B \wedge \neg S_C) \vee (\neg S_A \wedge \neg S_B \wedge S_C)$ is true (where $S_A$ is the sign bit of the operand 'A' etc.).

Formerly (e.g. in the mainframe age) the program counter and the status register were collectively called the PSW (program status word) register. The term can still be used for modern computers, where it covers e.g. the content of the flags register. "The PSW contains status information about the currently running process, including memory usage information, condition codes, and other status information such as an interrupt enable/disable bit and a kernel/ user-mode bit." (Stallings 2018: 41)
GNU x86/x64 assembly: Summary and further examples
Calculating the first 10 elements of the Fibonacci sequence (fibonacci.s)
First, let us see a C program that prints the first 10 elements of the Fibonacci sequence. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.c command, and create a new file named 'fibonacci.c' with the following content:
    #include <stdio.h>

    int main()
    {
        int k1, k2, i;
        int n=10;
        k1=1;
        k2=1;
        printf("Fibonacci numbers\n");
        printf("%d\n",k1);
        i=2;
        while(i<=n)
        {
            printf("%d\n",k2);
            int x=k2;
            k2=k1+k2;
            k1=x;
            i++;
        }
        return 0;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'fibonacci.c' program in Intel x86/x64 assembly language which prints the first 10 elements of the Fibonacci sequence using quadword-length variables. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fibonacci.s command, and create a new file named 'fibonacci.s' with the following content:

    .globl main
    .data
    .P1: .ascii "Fibonacci numbers\12\0"
    .P2: .ascii "%d\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $72, %rsp
        movq $10, -32(%rbp)         # n
        movq $1, -8(%rbp)           # k1
        movq $1, -16(%rbp)          # k2
        leaq .P1, %rax
        movq %rax, %rcx
        call printf
        movq -8(%rbp), %rax
        movq %rax, %rdx
        leaq .P2, %rax
        movq %rax, %rcx
        call printf
        movq $2, -24(%rbp)          # i
        jmp .J2
        /* begin loop */
    .J1:
        movq -16(%rbp), %rax
        movq %rax, %rdx
        leaq .P2, %rax
        movq %rax, %rcx
        call printf
        movq -16(%rbp), %rax
        movq %rax, -40(%rbp)        # x
        movq -8(%rbp), %rax
        addq %rax, -16(%rbp)
        movq -40(%rbp), %rax
        movq %rax, -8(%rbp)
        addq $1, -24(%rbp)
    .J2:
        movq -24(%rbp), %rax
        cmpq -32(%rbp), %rax        # i≤n ?
        jle .J1                     # jump if %rax≤-32(%rbp)
        /* end of loop */
        movq $10, %rax
        addq $72, %rsp
        popq %rbp
        ret

To understand the operation of the loop it is essential to know the status register or flags register⇒ that indicates the current state of an x86/x64 CPU, and especially the status flags (e.g. carry, parity, zero, sign, overflow etc. flags) which usually characterize the result of arithmetic operations.
Note that the return value of the 'main' function was now set to 10 (instead of 0) to see the difference from the compiled fibonacci.c program.
Compile, link and run the assembly program as follows:
Finally, it is very instructive to see the machine code representation of the 'fibonacci.s' assembly program. For that purpose, first compile the program into an object file with debugging information by entering the
gcc -g -c fibonacci.s -o fibonacci.o
command, and then disassemble the object file by typing the
objdump -d -S fibonacci.o
command in the 'cmd' window. The result looks something like this:
    fibonacci.o:     file format pe-x86-64

    Disassembly of section .text:

    0000000000000000 <main>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 48             sub    $0x48,%rsp
       8:   48 c7 45 e0 0a 00 00    movq   $0xa,-0x20(%rbp)
       f:   00
      10:   48 c7 45 f8 01 00 00    movq   $0x1,-0x8(%rbp)
      17:   00
      18:   48 c7 45 f0 01 00 00    movq   $0x1,-0x10(%rbp)
      1f:   00
      20:   48 8d 04 25 00 00 00    lea    0x0,%rax
      27:   00
      28:   48 89 c1                mov    %rax,%rcx
      2b:   e8 00 00 00 00          callq  30 <main+0x30>
      30:   48 8b 45 f8             mov    -0x8(%rbp),%rax
      34:   48 89 c2                mov    %rax,%rdx
      37:   48 8d 04 25 13 00 00    lea    0x13,%rax
      3e:   00
      3f:   48 89 c1                mov    %rax,%rcx
      42:   e8 00 00 00 00          callq  47 <main+0x47>
      47:   48 c7 45 e8 02 00 00    movq   $0x2,-0x18(%rbp)
      4e:   00
      4f:   eb 34                   jmp    85 <.J2>

    0000000000000051 <.J1>:
      51:   48 8b 45 f0             mov    -0x10(%rbp),%rax
      55:   48 89 c2                mov    %rax,%rdx
      58:   48 8d 04 25 13 00 00    lea    0x13,%rax
      5f:   00
      60:   48 89 c1                mov    %rax,%rcx
      63:   e8 00 00 00 00          callq  68 <.J1+0x17>
      68:   48 8b 45 f0             mov    -0x10(%rbp),%rax
      6c:   48 89 45 d8             mov    %rax,-0x28(%rbp)
      70:   48 8b 45 f8             mov    -0x8(%rbp),%rax
      74:   48 01 45 f0             add    %rax,-0x10(%rbp)
      78:   48 8b 45 d8             mov    -0x28(%rbp),%rax
      7c:   48 89 45 f8             mov    %rax,-0x8(%rbp)
      80:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)

    0000000000000085 <.J2>:
      85:   48 8b 45 e8             mov    -0x18(%rbp),%rax
      89:   48 3b 45 e0             cmp    -0x20(%rbp),%rax
      8d:   7e c2                   jle    51 <.J1>
      8f:   48 c7 c0 0a 00 00 00    mov    $0xa,%rax
      96:   48 83 c4 48             add    $0x48,%rsp
      9a:   5d                      pop    %rbp
      9b:   c3                      retq
      9c:   90                      nop
      9d:   90                      nop
      9e:   90                      nop
      9f:   90                      nop

Note that the first column shows the relative memory address of each instruction, followed by the machine code of the corresponding instruction.
In order to organize a loop in an assembly program we need both comparison and control instructions. Let us review some of them via examples:
- cmp %rsi, %rax, cmp $10, %rax or cmp -24(%rbp), %rax (performs a comparison operation between the two operands subtracting the first op from the second; sets the ZF, SF, PF, CF and OF status flags)
- Example 1 (e.g. for pre-tested loops with a loop control variable named 'i'):
- cmpl -12(%rbp), %eax # i>n ?
- jg .L1
- (these instructions compare the data located in the memory address %rbp−12 ('n') with the content of the accumulator ('i'); if the content of the accumulator is greater than 'n', then a conditional jump occurs to the label .L1)
- Example 2 (e.g. for post-tested loops with a loop control variable named 'i'):
- cmpl -12(%rbp), %eax # i<=n ?
- jle .loop
- (these instructions compare the data located in the memory address %rbp−12 ('n') with the content of the accumulator ('i'); if the content of the accumulator is less than or equal to 'n', then a conditional jump occurs to the label .loop)
- test %rax, %rax (sets the ZF, SF and PF status flags; note that this 'test' instruction is equivalent to the cmp $0, %rax instruction)
- Example:
- andl $1,%eax # bitwise 'and' operation is performed
- testl %eax,%eax # %eax==0 ?
- jz .cont
- (these instructions test whether the content of the accumulator is even or not; if 'yes', then a conditional jump occurs to the label .cont)
- jmp .J2 (jumps directly to the instruction labelled by .J2 by setting the content of the %rip register)
- je .L1 or jz .L1 (jumps to the instruction labelled by .L1 if ZF equals 1)
- jne .L2 or jnz .L2 (jumps to the instruction labelled by .L2 if ZF is not equal to 1, i.e. ZF equals 0)
- jl .J2 (jumps to the instruction labelled by .J2 if SF is not equal to OF, i.e. the second op of the previous 'cmp' instruction is less than the first op)
- jle .J2 (jumps to the instruction labelled by .J2 if the second op of the previous 'cmp' instruction is less than or equal to the first op)
- jg .J2 (jumps to the instruction labelled by .J2 if the second operand of the previous 'cmp' instruction is greater than the first operand)
- jge .J2 (jumps to the instruction labelled by .J2 if the second op of the previous 'cmp' instruction is greater than or equal to the first op)
Note that after the operation codes (opcodes) of some instructions we can use certain suffixes to indicate the length of the operands.

| suffix | length of operands | example(s) |
|---|---|---|
| -b | byte | movb $5, -1(%rbp); movb -1(%rbp), %al |
| -w | word (2 bytes = 16 bits) | movw $5, -2(%rbp); movw -2(%rbp), %ax |
| -l | doubleword (4 bytes = 32 bits) | movl $5, -4(%rbp); movl -4(%rbp), %eax |
| -q | quadword (8 bytes = 64 bits) | movq $5, -8(%rbp); movq -8(%rbp), %rax; leaq pattern, %rcx |

Note that in the C programming language we can use the following format specifiers of the printf() function: %hi (for short integers), %d or %i (for doubleword integers), %ld (for long integers). The long type is a quadword on 64-bit Linux, but only a doubleword on 64-bit Windows, where %ld therefore prints just the lower 32 bits of a quadword register and a full quadword corresponds to the long long type (%lld).
Simple examples of using suffixes
(1) Using signed bytes (or characters):
    .globl main
    .data
    pattern: .ascii "a=%hi, b=%hi, a+b=%hi\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movb $5, -1(%rbp)           # a
        movb $-8, -2(%rbp)          # b
        movb -1(%rbp), %al
        addb -2(%rbp), %al
        cbtw
        movw %ax, %r9w              # a+b
        movb -1(%rbp), %al
        cbtw
        movw %ax, %dx               # a
        movb -2(%rbp), %al
        cbtw
        movw %ax, %r8w              # b
        leaq pattern, %rcx
        call printf
        movq $0, %rax
        addq $48, %rsp
        popq %rbp
        ret

Note that the assembly instruction cbtw sign-extends the signed 8-bit integer value in the 'al' register to the word-length 'ax' register.
(2) Using short integers:
    .globl main
    .data
    pattern: .ascii "a=%hi, b=%hi, a+b=%hi\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movw $5, -2(%rbp)           # a
        movw $-8, -4(%rbp)          # b
        movw -2(%rbp), %ax
        addw -4(%rbp), %ax
        movw -2(%rbp), %dx          # a
        movw -4(%rbp), %r8w         # b
        movw %ax, %r9w              # a+b
        leaq pattern, %rcx
        call printf
        movq $0, %rax
        addq $48, %rsp
        popq %rbp
        ret

(3) Using doubleword integers:
    .globl main
    .data
    pattern: .ascii "a=%d, b=%d, a+b=%d\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $5, -4(%rbp)           # a
        movl $-8, -8(%rbp)          # b
        movl -4(%rbp), %eax
        addl -8(%rbp), %eax
        movl -4(%rbp), %edx         # a
        movl -8(%rbp), %r8d         # b
        movl %eax, %r9d             # a+b
        leaq pattern, %rcx
        call printf
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret

(4) Using quadword integers:
    .globl main
    .data
    pattern: .ascii "a=%ld, b=%ld, a+b=%ld\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movq $5, -8(%rbp)           # a
        movq $-8, -16(%rbp)         # b
        movq -8(%rbp), %rax
        addq -16(%rbp), %rax
        movq -8(%rbp), %rdx         # a
        movq -16(%rbp), %r8         # b
        movq %rax, %r9              # a+b
        leaq pattern, %rcx
        call printf
        movq $0, %rax
        addq $48, %rsp
        popq %rbp
        ret

In the simple examples presented above, it is important to study very carefully the correct use of suffixes for the assembly instructions as well as the format specifiers used in the printf() function.
Simple examples for demonstrating comparisons and jumps
First, let us see a simple assembly program that subtracts two (doubleword) integers and prints the result of the subtraction using a function 'disp'. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad comp.s command, and create a new file named 'comp.s' with the following content:
    .globl main
    .globl disp
    .data
    pattern: .ascii "%d-%d=%d\12\0"
    .text
    disp:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl %edx, -4(%rbp)
        movl %r8d, -8(%rbp)
        movl -4(%rbp), %eax
        subl -8(%rbp), %eax
        movl %eax, -12(%rbp)
        movl -12(%rbp), %r9d
        leaq pattern, %rcx
        call printf
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $10, %edx
        movl $4, %r8d
        call disp
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the compiled program as follows:
Now let us create an assembly program that compares two (doubleword) integers and prints both the result of the subtraction and the values of the OF, SF and ZF flags which are set by the 'cmp' instruction. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad comp1.s command, and create a new file named 'comp1.s' with the following content:

    .globl main
    .globl disp
    .data
    p1: .ascii "%d-%d=%d\12\0"
    p2: .ascii "OF=%d, SF=%d, ZF=%d\12\0"
    .text
    disp:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl %edx, -4(%rbp)
        movl %r8d, -8(%rbp)
        movl -4(%rbp), %eax
        subl -8(%rbp), %eax
        movl %eax, -12(%rbp)
        movl -12(%rbp), %r9d
        leaq p1, %rcx
        call printf
        movl $1, %edx               # OF=1 by default
        movl $1, %r8d               # SF=1 by default
        movl $1, %r9d               # ZF=1 by default
        movl -4(%rbp), %eax
        cmpl -8(%rbp), %eax
        jz .c1
        movl $0, %r9d               # ZF=0
    .c1:
        js .c2
        movl $0, %r8d               # SF=0
    .c2:
        jo .c3
        movl $0, %edx               # OF=0
    .c3:
        leaq p2, %rcx
        call printf
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $-10, %edx
        movl $4, %r8d
        call disp
        movl $0, %eax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the compiled program as follows:
Note that a cmp op1, op2 instruction calculates the op2−op1 subtraction in the background, and sets the flags accordingly.
Trying different integer values to be subtracted in the 'main' section of the comp1.s assembly program, we can get the following results:
| Subtraction (b−a=c) | OF | SF | ZF | cmp a,b |
|---|---|---|---|---|
| 10−4=6 | 0 | 0 | 0 | OF=0 & SF=0 ⇒ c>0 ⇒ b>a |
| −10−4=−14 | 0 | 1 | 0 | OF=0 & SF=1 ⇒ c<0 ⇒ b<a |
| 4−4=0 | 0 | 0 | 1 | ZF=1 ⇒ c=0 ⇒ b=a |
| −2147483648 (1) −1=2147483647 (2) | 1 | 0 | 0 | OF=1 & SF=0 ⇒ b<a |
| 0−(−2147483648)=−2147483648 (1) | 1 | 1 | 0 | OF=1 & SF=1 ⇒ b>a |

Remarks:
(1) −2147483648=−2^31 is the smallest (most negative) integer that can be represented in the two's complement representation of integers in 32 bits. (When creating the assembly code it corresponds to the $0x80000000 hexadecimal value.)
(2) +2147483647=2^31−1 is the greatest positive integer that can be represented in the two's complement representation of integers in 32 bits.
Note that when overflow occurs, the computed result of the subtraction is not the arithmetically correct difference.

Summarizing all that:
- ZF=1 ⇔ a=b
- ZF=0 & OF=SF ⇔ b>a
- OF≠SF ⇔ b<a
Listing the first 10 natural numbers
First, let us see a C program that prints the first 10 natural numbers (starting with 1, then 2, 3, 4, ..., 10). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.c command, and create a new file named 'natural.c' with the following content:
    #include <stdio.h>

    int main()
    {
        int x=1;
        int i, n=10;
        for(i=1;i<=n;i++)
        {
            printf("%d\n",x);
            x++;
        }
        return i;
    }

Compile, link and run the compiled C program as follows:
Now it can be very instructive to see the compiled assembly version of the C program. Type and run in the 'cmd' window the gcc natural.c -S -o nat.s command. After making some changes (omitting some parts, commenting some of the instructions etc.), the resulting file will look something like this:

    .data
    .pattern: .ascii "%d\12\0"
    .text
    printf:
        pushq %rbp
        pushq %rbx                  # callee saved
        subq $56, %rsp
        leaq 48(%rsp), %rbp
        /* 56=48+8; pushing %rbx allocates +8 bytes at the stack */
        movq %rcx, 32(%rbp)         # 1st argument stored
        movq %rdx, 40(%rbp)         # 2nd argument stored
        movq %r8, 48(%rbp)          # 3rd argument stored
        movq %r9, 56(%rbp)          # 4th argument stored
        leaq 40(%rbp), %rax
        movq %rax, -16(%rbp)        # local variable
        movq -16(%rbp), %rbx
        movl $1, %ecx
        movq __imp___acrt_iob_func(%rip), %rax
        call *%rax
        movq %rax, %rcx
        movq 32(%rbp), %rax
        movq %rbx, %r8
        movq %rax, %rdx
        call __mingw_vfprintf
        movl %eax, -4(%rbp)
        movl -4(%rbp), %eax
        addq $56, %rsp
        popq %rbx
        popq %rbp
        ret
    .text
    .globl main
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp              # stack frame (48 byte)
        call __main
        movl $1, -4(%rbp)           # variable x
        movl $10, -12(%rbp)         # variable n
        movl $1, -8(%rbp)           # variable i
        jmp .L4
        /* begin of loop */
    .L5:
        movl -4(%rbp), %eax         # variable x to print
        movl %eax, %edx             # 2nd argument for printf
        leaq .pattern, %rax         # address of .pattern
        movq %rax, %rcx             # 1st argument for printf
        call printf
        addl $1, -4(%rbp)           # x++
        addl $1, -8(%rbp)           # i++
    .L4:
        movl -8(%rbp), %eax         # i → %eax
        cmpl -12(%rbp), %eax        # %eax≤-12(%rbp) ? (-12(%rbp) refers to variable n)
        jle .L5                     # jump if i≤n
        /* end of loop */
        movl -8(%rbp), %eax
        addq $48, %rsp
        popq %rbp
        ret

Before the 'printf' function is called for the first time, once the local variables have been initialized (i.e. just before the 'jmp .L4' instruction), the content of the stack frame created by the 'main' function is as follows:

| address (relative to %rsp) | address (relative to %rbp) | content |
|---|---|---|
| 0(%rsp) | -48(%rbp) | allocated space for the four parameters (or arguments) for the function 'printf' |
| 8(%rsp) | -40(%rbp) | (allocated space, continued) |
| 16(%rsp) | -32(%rbp) | (allocated space, continued) |
| 24(%rsp) | -24(%rbp) | (allocated space, continued) |
| | | (not used) |
| | -12(%rbp) | variable n (initially n=10) |
| | -8(%rbp) | variable i (initially i=1) |
| | -4(%rbp) | variable x (initially x=1) |
| 48(%rsp) | 0(%rbp) | previous value of %rbp (pushed by the first instruction of main) |
| | 8(%rbp) | return address for the caller of 'main' (for 'ret' in main) |

After the 'printf' function is called for the first time by the 'main' function, the content of the stack frame created by the 'printf' function is as follows:

| address (relative to %rsp) | address (relative to %rbp) | content |
|---|---|---|
| 0(%rsp) | -48(%rbp) | |
| ... | ... | ... |
| 48(%rsp) | 0(%rbp) | |
| 56(%rsp) | 8(%rbp) | previous value of %rbx (pushed by the second instruction of printf) |
| | 16(%rbp) | previous value of %rbp (pushed by the first instruction of printf) |
| | 24(%rbp) | return address for the caller of 'printf', i.e. the address of the next instruction of 'main' after the 'call printf' instruction |
| | 32(%rbp) | 1st argument of the function 'printf' (passed in %rcx) |
| | 40(%rbp) | 2nd argument of the function 'printf' (passed in %rdx) |
| | 48(%rbp) | 3rd argument of the function 'printf' (passed in %r8) |
| | 56(%rbp) | 4th argument of the function 'printf' (passed in %r9) |

The last four rows are the same space for the arguments (or parameters) of the 'printf' function that has been allocated by the 'main' function (see the top of its stack frame in the previous table).

Let us now create an equivalent of the 'natural.c' program⇒ in Intel x86/x64 assembly language which prints the first 10 natural numbers. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad natural.s command, and create a new file named 'natural.s' with the following content:

    .globl main
    .data
    msg: .ascii "The first %d natural numbers:\12\0"
    pattern: .ascii "%d\12\0"
    .text
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $1, -4(%rbp)           # x
        movl $10, -12(%rbp)         # n
        movl $1, -8(%rbp)           # i
        leaq msg, %rcx
        movl -12(%rbp), %edx
        call printf
    .L0:
        movl -8(%rbp), %eax
        cmpl -12(%rbp), %eax        # i>n ?
        jg .L1
        leaq pattern, %rcx
        movl -4(%rbp), %edx
        call printf
        incl -4(%rbp)
        incl -8(%rbp)
        jmp .L0
    .L1:
        movl -8(%rbp), %eax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Listing the first 10 powers of 2
First, let us see a C program that prints the first 10 powers of 2 (starting with 1, then 2, 4, 8 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.c command, and create a new file named 'powers.c' with the following content:
    #include <stdio.h>

    int nextpow(int x)
    {
        int p=x+x;
        return p;
    }

    int main()
    {
        int x=1;
        int i=1, n=10;
        do
        {
            printf("%d\n",x);
            x=nextpow(x);
            i++;
        } while(i<=n);
        return i;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'powers.c' program in Intel x86/x64 assembly language which prints the first 10 powers of 2. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad powers.s command, and create a new file named 'powers.s' with the following content:
    .globl main
    .data
    pattern: .ascii "%d\12\0"
    .text
    nextpow:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp              # stack frame size
        movl %ecx, 16(%rbp)         # parameter x
        movl 16(%rbp), %eax
        addl %eax, %eax
        movl %eax, -4(%rbp)         # local variable p
        movl -4(%rbp), %eax
        addq $16, %rsp
        popq %rbp
        ret
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $1, -4(%rbp)           # variable x
        movl $1, -8(%rbp)           # variable i
        movl $10, -12(%rbp)         # variable n
    .loop:
        movl -4(%rbp), %edx
        leaq pattern, %rcx
        call printf
        movl -4(%rbp), %ecx
        call nextpow
        movl %eax, -4(%rbp)
        addl $1, -8(%rbp)           # i++
        movl -8(%rbp), %eax
        cmpl -12(%rbp), %eax        # i<=n ?
        jle .loop
        movl -8(%rbp), %eax
        addq $48, %rsp
        popq %rbp
        ret

Compile, link and run the assembly program as follows:
Listing the first 10 factorials
First, let us see a C program that prints the first 10 factorials (starting with 1, then 2, 6, 24 etc.). Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.c command, and create a new file named 'fact.c' with the following content:
    #include <stdio.h>

    int f(int n)
    {
        int temp;
        temp=1;
        for(int i=2;i<=n;i++)
        {
            temp=temp*i;
        }
        return temp;
    }

    int main()
    {
        int n=10;
        printf("List of the first %d factorials:\n",n);
        int i=1;
        while(i<=n)
        {
            printf("%d\n",f(i));
            i=i+1;
        };
        return i;
    }

Compile, link and run the compiled C program as follows:
Let us now create an equivalent of the 'fact.c' program in Intel x86/x64 assembly language which prints the first 10 factorials. Open a new 'cmd' window in the c:\temp\gcc directory, run the notepad fact.s command, and create a new file named 'fact.s' with the following content:
    .globl main
    .globl f
    .data
    .LC0: .ascii "List of the first %d factorials:\12\0"
    .LC1: .ascii "%d\12\0"
    .text
    f:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        movl %ecx, 16(%rbp)
        movl $1, -4(%rbp)
        movl $2, -8(%rbp)
        jmp .L4
    .L5:
        movl -4(%rbp), %eax
        imull -8(%rbp), %eax
        movl %eax, -4(%rbp)
        addl $1, -8(%rbp)
    .L4:
        movl -8(%rbp), %eax
        cmpl 16(%rbp), %eax
        jle .L5
        movl -4(%rbp), %eax
        addq $16, %rsp
        popq %rbp
        ret
    main:
        pushq %rbp
        movq %rsp, %rbp
        subq $48, %rsp
        movl $10, -8(%rbp)
        movl -8(%rbp), %eax
        movl %eax, %edx
        leaq .LC0, %rax
        movq %rax, %rcx
        call printf
        movl $1, -4(%rbp)
        jmp .L8
    .L9:
        movl -4(%rbp), %eax
        movl %eax, %ecx
        call f
        movl %eax, %edx
        leaq .LC1, %rax
        movq %rax, %rcx
        call printf
        addl $1, -4(%rbp)
    .L8:
        movl -4(%rbp), %eax
        cmpl -8(%rbp), %eax
        jle .L9
        movl -4(%rbp), %eax
        addq $48, %rsp
        popq %rbp
        ret

Note that the imul assembly instruction performs a signed multiplication of the first operand (which can be either a register or a word-length or doubleword-length memory content) and the second operand (a register), and stores the resulting product in the register specified as the second operand.
Compile, link and run the assembly program as follows:
Review questions and exercises
Questions:
- What are the basic functions of the OS?
- What are the main hw elements of the Von Neumann architecture?
- What are the main components of the CPU, and what are their basic functions?
- What units does the main memory consist of and how are they identified?
- What is buffer memory and why is it used?
- Illustrate and explain briefly the operation of the instruction (or the fetch-execute) cycle!
- What is the main purpose of the PC (or IP) and the IR registers?
- List the 64-bit length general-purpose registers and their symbolic notations in the x86/x64 assembly language!
- List the 64-bit length special-purpose registers and their symbolic notations in the x86/x64 assembly language!
Create programs in x86/x64 assembly language which perform the following tasks:
- Add two integers (e.g. 5 and 8) together using a function named 'addint'. Print the result in the following form: 5 + 8 = 13. Set the return value of the main function to the sum of the addition.
- use 64-bit length registers and memory locations for the local variables
- use 32-bit length registers and memory locations for the local variables
- Subtract two integers (e.g. 8 from 5) using a function named 'subint'. Print the result in the following form: 5 − 8 = −3. Set the return value of the main function to the difference.
- use 64-bit length registers and memory locations for the local variables
- use 32-bit length registers and memory locations for the local variables
- List the first 'n' elements (e.g. n=10) of a given number sequence (e.g. 2, 4, 6, 8, ...) with their sequence number in the following form:
--------------
1. element = 2
2. element = 4
...
10. element = 20
--------------
The given number sequence can be as follows:

Each program should start with printing its function and end with the name of the programmer (as well as the current date).
Interrupts and the extension of the fetch-execute cycle (cf. Stallings 2018: 35-40)
Essentially all computers provide a mechanism by which other modules, processes or events may interrupt the normal sequencing of the processor. The table below lists the most common classes of interrupts.
Main classes of interrupts

| Class | Examples |
| --- | --- |
| Program | e.g. arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside the program's allowed memory space etc. |
| Timer | e.g. end of the time slice allowed for the program (task, process) to run |
| I/O | e.g. sending an I/O request; receiving a signal generated by an I/O controller to indicate the normal completion of an I/O operation or the occurrence of an I/O error |
| Hardware failure | e.g. memory parity error |

Why are interrupts important?
Interrupts are provided primarily as a way to improve processor utilization. For example, most I/O devices are much slower than the processor. Without interrupts, once an I/O operation is initiated, the processor must wait until the operation is completed before it can continue with the execution of the next instruction of a sequential program.

Suppose that the processor is transferring data to a printer using the simple fetch-execute instruction cycle scheme.⇒ After each write operation, the processor must pause and remain idle until the printer catches up. The length of this pause may be on the order of many thousands or even millions of instruction cycles. Clearly, this is a very wasteful use of the processor.
(1) Let us suppose that a user program calls an I/O routine that performs the requested I/O operation.
In the example above the sequential execution of the user program follows the stages 1 4 5 2 4 5 3, where stages 4 5 correspond to the called I/O routine. Without interrupts, the processor must wait during the I/O command, which presumably takes some time, until the operation is successfully completed.
Without an interrupt mechanism, the program that waits for the I/O device to perform the requested function has to check (or poll) the status of the I/O device periodically. In other words, the waiting program repeatedly performs a test operation to determine whether the I/O operation is done. When the I/O operation is completed, the I/O module sets a status flag indicating the success or failure of the operation. The change of the status flag tells the program that the I/O operation is completed, so the execution of the program can be continued.

In parallel processing, the synchronization of the execution of concurrent processes can normally be implemented using wait/signal semaphores to temporarily suspend and resume the execution of the concurrent processes.
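The polling scheme described above can be sketched in a few lines of Python. This is a simplified model, not a real driver: the class SlowDevice, its tick method and the function write_with_polling are hypothetical names, and one loop iteration stands for one test operation of the waiting program.

```python
class SlowDevice:
    """Hypothetical I/O device that needs a fixed number of ticks per operation."""

    def __init__(self, ticks_needed):
        self.remaining = ticks_needed
        self.done = ticks_needed == 0   # status flag read by the polling program

    def tick(self):
        """Advance the device by one time unit; set the flag when it finishes."""
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0:
            self.done = True


def write_with_polling(device):
    """Busy-wait until the device reports completion; return the wasted polls."""
    polls = 0
    while not device.done:      # the test operation, repeated until the flag changes
        polls += 1
        device.tick()           # time passes while the processor does nothing useful
    return polls


if __name__ == "__main__":
    print(write_with_polling(SlowDevice(1000)))
```

Every poll counted here is a step in which the processor could have executed an instruction of another program, which is the waste that interrupts are meant to avoid.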
(2) With interrupts, the processor can be engaged in executing instructions from another program (or process) while the requested I/O operation is in progress. Let us suppose that there are two user programs to be executed, and currently the first user program (denoted by program(1)) is running, and the second user program (denoted by program(2)) is waiting to be executed.
When the user program(1) reaches a point at which an I/O operation should be executed, it makes a system call. The I/O program (called an interrupt handler) that is invoked in this case consists only of some preparation code and the actual I/O command. After these few instructions have been executed, the control transfers to the second user program(2) while the first, interrupted program(1) is temporarily suspended waiting for the invoked I/O operation to be completed. Meanwhile the addressed external device is busy (e.g. accepting data from computer memory, processing it etc.), and the requested I/O operation is conducted concurrently with the execution of instructions from the second user program(2).
When the invoked I/O operation is complete, the I/O module for the external device sends an interrupt request signal to the processor. If the interruption of the currently running second program(2) is enabled, the processor responds to the interrupt request signal by suspending the operation of the second program(2), executing again some preparation code, and continuing the interrupted user program(1) immediately after the instruction which invoked the now successfully completed I/O operation.
For the user program, an interrupt suspends the normal sequence of execution. When the interrupt processing is completed, execution resumes. Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the OS are responsible for suspending the user program, then resuming it at the same point.

To accommodate interrupts, a new stage called an interrupt stage is added to the instruction cycle:⇒
In the interrupt stage, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor restarts the fetch-execute cycle and fetches the next instruction of the current program. If an interrupt is pending, the processor suspends execution of the current program and executes an interrupt-handler routine.
The interrupt-handler routine is generally part of the OS. Typically, this routine determines the nature of the interrupt and performs whatever actions are needed. In the above example, the handler determines which I/O module generated the interrupt, and which program is waiting for the answer of that I/O module. When the interrupt-handler routine is completed, the processor can resume execution of the interrupted user program at the point of interruption.
It is clear that there is some overhead involved in this process. For example, extra instructions must be executed (in the interrupt handler) to determine the nature of the interrupt and to decide on the appropriate action etc. Nevertheless, because of the relatively large amount of time that would be wasted by simply waiting on an I/O operation, the processor can be employed much more efficiently with the use of interrupts.
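The extended instruction cycle can be illustrated with a minimal Python simulation. It is a sketch under simplifying assumptions: instructions are plain strings, an interrupt is serviced in a single step, and the function name run and the parameter interrupt_arrivals (a mapping from cycle number to the name of the interrupting device) are hypothetical.

```python
def run(program, interrupt_arrivals):
    """Simulate fetch, execute and interrupt stages; return a trace of events."""
    trace = []
    pending = []            # interrupt requests waiting to be serviced
    pc = 0                  # program counter: index of the next instruction
    cycle = 0
    while pc < len(program):
        # An asynchronous request may arrive at any time during the cycle.
        if cycle in interrupt_arrivals:
            pending.append(interrupt_arrivals[cycle])
        instruction = program[pc]               # fetch stage
        pc += 1
        trace.append("exec " + instruction)     # execute stage
        if pending:                             # interrupt stage
            trace.append("handle " + pending.pop(0))
        cycle += 1
    return trace


if __name__ == "__main__":
    for event in run(["i0", "i1", "i2"], {1: "printer"}):
        print(event)
```

The request that arrives during the second cycle is serviced only after instruction i1 has completed, and the program then continues with i2 as if nothing had happened.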
Interrupt processing (cf. Stallings 2018: 41-45)
An interrupt triggers a number of events, both in the processor hardware and in software.
Let us examine in detail the interrupt processing mechanism after an I/O operation has been initiated by a user program. The addressed I/O device does what it has to do as a parallel background process, and when it completes the requested I/O operation, generally the following sequence of hardware events occurs:
1. The device issues an interrupt signal to the processor (indicating that the operation is completed, e.g. some data have been stored, retrieved, printed etc.). It will be pending as an interrupt request waiting for the processor to acknowledge it.
2. The processor finishes execution of the current instruction in the fetch-execute cycle⇒ before responding to the interrupt (i.e. before it proceeds to the third, interrupt stage of the instruction cycle).
3. If interrupts are enabled, the processor tests for a pending interrupt request. When there is one (e.g. because of event 1), the processor sends an acknowledgment signal to the device that issued the interrupt. The acknowledgment allows the device to remove its interrupt signal.
4. The processor next prepares to transfer control to the interrupt routine or handler. To begin, it saves information needed to resume the current program at the point of interrupt. The minimum information required is the program status word (PSW)⇒ and the location of the next instruction to be executed, which is contained in the program counter (PC). These data will be stored in a designated memory area of the operating system.
5. The processor then loads the program counter with the entry location of the interrupt-handling routine (i.e. the address of the first instruction of the handler to be executed). The purpose of the interrupt handler is to respond to the interrupt (indicated by the interrupt signal, see event 1).
In order to handle the possible interrupts, depending on the computer architecture and OS design, the implementation of the interrupt handling routines may be
– a single program that handles every interrupt,
– one program for each type of interrupt,
– or a number of separate programs, one for each device and each type of interrupt.
If there is more than one interrupt-handling routine, the processor must determine which one to invoke. This information may have been included in the original interrupt signal, or the processor may have to issue a request to the device that issued the interrupt to get a response that contains the needed information.

Note that more or less the same mechanism occurs when a user program initiates an I/O operation issuing a system call (often called 'trap' e.g. in the Intel x64 architecture). One of the differences is that hardware events occur asynchronously but system calls are part of the normal synchronous process of program execution (i.e. the I/O operation to be performed in a parallel, separate background process is initiated by the user program itself). Therefore in the case of system calls only event 4 should be implemented by the called software routine before interrupting the user program by running the appropriate interrupt routine (in event 5) which then transfers control (or switches) to a new process, e.g. to another user program (in event 9, see later).
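When several interrupt-handling routines exist, selecting one of them amounts to a table lookup. The Python sketch below shows the idea for the case of one routine per device and per type of interrupt; the table VECTOR_TABLE, the handler functions and the device names are all hypothetical and do not correspond to any real architecture.

```python
def handle_printer_done():
    return "printer: operation completed"


def handle_disk_error():
    return "disk: I/O error reported"


def handle_unknown():
    return "unknown interrupt ignored"


# Hypothetical lookup table: (device, interrupt type) -> handler routine.
VECTOR_TABLE = {
    ("printer", "done"): handle_printer_done,
    ("disk", "error"): handle_disk_error,
}


def dispatch(device, kind):
    """Select the handler for the given interrupt and run it."""
    handler = VECTOR_TABLE.get((device, kind), handle_unknown)
    return handler()


if __name__ == "__main__":
    print(dispatch("printer", "done"))
    print(dispatch("timer", "tick"))
```

With a single handler for all interrupts, the same decision would be made inside the handler itself instead of in a table.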
Once the program counter has been loaded, the processor proceeds to the next instruction cycle, which begins with an instruction fetch. Because the instruction fetch is determined by the contents of the program counter, control is transferred to the interrupt-handler program.
The execution of the interrupt handler results in the following operations:
6. At this point, the program counter and PSW relating to the interrupted program have been saved on the control stack. However, there is other information that is considered part of the state of the executing program. In particular, the contents of the processor registers need to be saved, because these registers may be used by the interrupt handler. So all of these values, plus any other state information, need to be saved. Typically, the interrupt handler will begin by saving the contents of all registers on the stack.
7. The interrupt handler may now proceed to process the interrupt. This includes an examination of status information relating to the I/O operation or other event that caused an interrupt. It may also involve sending additional commands or acknowledgments to the I/O device.
8. When interrupt processing is complete, the handler (or the dedicated routine of the OS) selects one of the previously interrupted programs and prepares to switch control to it. First the saved register values of the selected program are retrieved from the stack and restored to the registers.
9. The final act is to restore the PSW and program counter values of the selected program. As a result, the next instruction to be executed will be from the selected and previously interrupted program.
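The saving and restoring of the program state described above can be traced with a small Python model. It is a sketch, not a description of real hardware: the class CPU, the two-register set, the address HANDLER_ENTRY and the function names are assumptions made for illustration, and the model always resumes the program that was interrupted.

```python
from dataclasses import dataclass, field

HANDLER_ENTRY = 0x1000   # hypothetical entry address of the interrupt handler


@dataclass
class CPU:
    pc: int = 0
    psw: int = 0
    regs: dict = field(default_factory=lambda: {"rax": 0, "rbx": 0})
    stack: list = field(default_factory=list)   # stands for the control stack


def enter_interrupt(cpu):
    """Hardware part: save PSW and PC, then load PC with the handler entry."""
    cpu.stack.append(cpu.psw)
    cpu.stack.append(cpu.pc)
    cpu.pc = HANDLER_ENTRY


def run_handler(cpu):
    """Software part: save registers, process, restore registers, PC and PSW."""
    cpu.stack.append(dict(cpu.regs))   # save the remainder of the program state
    cpu.regs["rax"] = 0xDEAD           # the handler freely uses the registers
    cpu.pc += 4
    cpu.regs = cpu.stack.pop()         # restore the saved register values
    cpu.pc = cpu.stack.pop()           # restore the program counter
    cpu.psw = cpu.stack.pop()          # restore the PSW


if __name__ == "__main__":
    cpu = CPU(pc=0x40, psw=0b10, regs={"rax": 7, "rbx": 9})
    enter_interrupt(cpu)
    run_handler(cpu)
    print(hex(cpu.pc), cpu.psw, cpu.regs)
```

After run_handler returns, the processor state is identical to the state before the interrupt, so the interrupted program cannot tell that the handler has run.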
It is important to save all of the state information about the interrupted programs for later resumption. This is because the interrupt is not a routine called from the program. Rather, the interrupt can occur at any time, and therefore at any point in the execution of a user program. Its occurrence is unpredictable (i.e. it is an asynchronous event which occurs anytime during the synchronized process of the fetch-execute cycle).

Finally, let us discuss briefly the case of multiple interrupts. Suppose that one or more interrupts occur while an interrupt is being processed. Two approaches can be taken to dealing with multiple interrupts.
- The first is to disable interrupts while an interrupt is being processed. A disabled interrupt simply means that the processor ignores any new interrupt request signal during the processing of another interrupt. If an interrupt occurs during this time, it generally remains pending and will be processed only after the processor has reenabled interrupts. Thus,
– if an interrupt occurs at any time, then further interrupts are disabled immediately; and
– after the interrupt-handler routine completes, interrupts are reenabled.
For example, if an interrupt occurs when a user program is being executed, first the interrupt is processed, then the processor checks to see if additional interrupts have occurred, and it resumes the user program only when all interrupt requests have been processed. This approach is simple, as interrupts are handled in strict sequential order.
- The second approach is to define priorities for interrupts and to allow an interrupt of higher priority to cause a lower-priority interrupt handler to be interrupted. This approach is more complex, as interrupts are handled in nested order.
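The difference between the two approaches can be observed in a small time-step simulation written in Python. It is a sketch under stated assumptions: the function simulate, the request tuples (arrival time, priority, duration, name) and the sample values are hypothetical, pending requests are served in order of arrival in the sequential case, and the interrupted user program is not modelled.

```python
def simulate(requests, nested):
    """Return a list of (name, finish_time) in the order the handlers complete."""
    pending = sorted(requests)   # requests that have not arrived yet
    waiting = []                 # arrived requests whose handler has not started
    active = []                  # stack of running handlers; the top one executes
    finished = []
    t = 0
    while pending or waiting or active:
        while pending and pending[0][0] <= t:
            _, priority, duration, name = pending.pop(0)
            waiting.append([priority, duration, name])
        if waiting:
            if nested:
                # A waiting request preempts only a lower-priority handler.
                candidate = max(waiting, key=lambda r: r[0])
                if not active or candidate[0] > active[-1][0]:
                    waiting.remove(candidate)
                    active.append(candidate)
            elif not active:
                # Interrupts are disabled while a handler is running.
                active.append(waiting.pop(0))
        if active:
            active[-1][1] -= 1               # run the top handler for one tick
            if active[-1][1] == 0:
                finished.append((active.pop()[2], t + 1))
        t += 1
    return finished


if __name__ == "__main__":
    sample = [(10, 2, 10, "printer"), (15, 5, 10, "comm"), (20, 4, 10, "disk")]
    print(simulate(sample, nested=False))
    print(simulate(sample, nested=True))
```

With the sample values, the sequential approach completes the handlers in the order printer, comm, disk, whereas the nested approach completes them in the order comm, disk, printer, because the low-priority printer handler is interrupted twice.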
In the following, we shall have an overview of the Windows operating system⇒ which implements all the features we have discussed before.