Compiling a simple boot image for x86

A few days ago I created a tool called cowboot, a very simple boot image generator for x86 that produces a boot image with a cow saying a custom message:

compiling a boot image and launching via qemu

In this post I’m going to go through early boot concepts, and use cowboot as a (very) simple example for such concepts. In addition, I’ll go through cowboot’s code and introduce different tools I’ve used for compiling / debugging.

Early boot concepts

The BIOS scans bootable devices and searches for the byte sequence 0x55, 0xAA (called “boot signature”) in the boot sector at offsets 510 and 511.

When the BIOS finds such a boot sector, it loads it to address 0x7c00 and passes execution to it.
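The check described above can be sketched in a few lines of Python. This only illustrates the rule, it is not firmware code; the function name `has_boot_signature` and the `SECTOR_SIZE` constant are made up for the example.

```python
SECTOR_SIZE = 512  # assumed size of a boot sector, in bytes

def has_boot_signature(sector: bytes) -> bool:
    """Return True if the sector carries the 0x55, 0xAA boot signature."""
    if len(sector) < SECTOR_SIZE:
        return False
    # The signature lives at offsets 510 and 511 of the boot sector.
    return sector[510] == 0x55 and sector[511] == 0xAA

# A sector full of zeros with the signature patched in at the end.
bootable = bytes(510) + b"\x55\xaa"
print(has_boot_signature(bootable))    # True
print(has_boot_signature(bytes(512)))  # False
```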

For legacy reasons, the computer boots in real mode, a 16-bit mode of operation with a 20-bit address space. 32-bit systems run in protected mode and 64-bit systems run in long mode. A special procedure is required to switch from real mode to protected mode, and from protected mode to long mode; both transitions happen at early boot stages.

In real mode, two registers are required in order to store an address: a segment register and an offset register. The address described by segment:offset is calculated as follows:

Physical address = segment * 16 + offset

Meaning the address 0x7c00 can be described in multiple ways. For example:

0x0000:0x7c00
0x7c0:0x0000
0x420:0x3a00
Descriptions of address 0x7c00
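As a quick sanity check, here is a small Python sketch of the segment:offset formula. The helper name `physical_address` is hypothetical, and the 20-bit mask is an assumption that models the wrap-around of the original 20-bit address space (later CPUs with the A20 line enabled do not wrap).

```python
def physical_address(segment: int, offset: int) -> int:
    """Translate a real-mode segment:offset pair to a physical address."""
    # The mask models a 20-bit address space (assumption, see the text above).
    return (segment * 16 + offset) & 0xFFFFF

# The three descriptions of the boot address listed above.
for seg, off in [(0x0000, 0x7C00), (0x07C0, 0x0000), (0x0420, 0x3A00)]:
    print(f"{seg:#06x}:{off:#06x} -> {physical_address(seg, off):#x}")
```

All three pairs resolve to 0x7c00, which is why a bootloader is free to pick whichever segment value is most convenient.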

The BIOS exposes a set of software interrupts that serve DOS programs (which operate in real mode) and bootloaders. For example, INT10H is used for video services (e.g. writing a character or displaying pixels) and INT13H provides access to sector-based disks.

An example implementation of those interrupts can be found in SeaBIOS’s code. SeaBIOS is an open-source BIOS implementation for x86 and is the default BIOS for QEMU and KVM.

The data structure that is used for storing the said interrupts handlers is called an Interrupt Vector Table, or IVT, and usually resides at address 0 in memory. In x86 an interrupt is triggered by the instruction int %d where %d is the interrupt number. Execution is then passed to the handler that resides in entry %d in the IVT.

SeaBIOS defines the IVT as an array of pointers, which are defined as follows:

/* src/std/bda.h:13 */

struct rmode_IVT {
    struct segoff_s ivec[256];
};

/* src/types.h:24 */

// Definition for common 16bit segment/offset pointers.
struct segoff_s {
    union {
        struct {
            u16 offset;
            u16 seg;
        };
        u32 segoff;
    };
};

In SeaBIOS, the macro SET_IVT is used to set an IVT entry, and the function ivt_init initializes most IVT entries.

In addition, most IVT entries in SeaBIOS are named entry_%d and their implementations are named handle_%d. Here is the implementation for INT10H and INT13H.

Going through cowboot’s code

A linker script is used in order to create the boot image:

.text : {
        *(.text)

        /* write cow after code */
        _cow_start = .; 
        *(.cowdata)
        _cow_len = . - _cow_start;

        /* put boot signature at the end of the boot section */
        . = ORIGIN(ROM) + LENGTH(ROM) - 2;
        BYTE(0x55)
        BYTE(0xaa)
} > ROM

If you’re not familiar with LD scripts, you should read my previous post about them. Either way, here is a quick syntax run-down:

An LD script is used during the linkage of the program to create an output object file from several object files used as input.

One can invoke a custom linker script using gcc -T [ld_script] or ld -T [script].

The output binary’s sections are called output sections (for example, the block .text: {...} defines an output section named .text) which are composed from input sections – for example the statement *(.text) joins all the .text sections from the input object files.

The symbol . is called the location counter and holds the next address memory will be mapped to.

The linker script declares a single output section named .text composed from boot code present in cowboot.S and from a .cowdata section which contains the output from cowsay.

The symbols _cow_start and _cow_len are referenced by cowboot.S and are used to reference the message that will be displayed on screen.

In order to create an input file containing a .cowdata section, I’ve created the target boot_message.o in my Makefile, which uses objcopy in the following manner:

boot_message.o: boot_message.cow
	$(OBJCOPY) -I binary -O elf32-i386 --rename-section .data=.cowdata $< $@
  • -I binary tells objcopy that the input format of the input file boot_message.cow, which contains the string to be printed, is “binary”
  • -O elf32-i386 tells objcopy that the output format is 32-bit ELF. This is needed so that ld can link the message together with cowboot.o, which is also an ELF object.
  • --rename-section .data=.cowdata renames the output section to .cowdata instead of the default .data

As seen at the end of the linker script, the boot signature is placed at the last two bytes of the boot sector:

/* put boot signature at the end of the boot section */
. = ORIGIN(ROM) + LENGTH(ROM) - 2;
BYTE(0x55)
BYTE(0xaa)

The boot sector itself is declared to be 512 bytes long:

MEMORY {
    ROM : ORIGIN = 0, LENGTH = 512
}

There are two reasons why I’ve chosen to write the boot signature using the linker instead of the assembler:

a. The size of my boot image is limited to 512 bytes: linker scripts declare memory regions using the MEMORY command, and the attributes of such a region include its length. Linking fails if a memory region cannot contain all the output sections that are mapped to it:

$ python -c 'print("A"*1337)' > boot_message.cow
$ make
nasm -f elf -o cowboot.o cowboot.S
objcopy -I binary -O elf32-i386 --rename-section .data=.cowdata boot_message.cow boot_message.o
ld -T pack_boot_section.ld cowboot.o boot_message.o -o cowboot.elf
ld:pack_boot_section.ld:20 cannot move location counter backwards (from 0000000000000561 to 00000000000001fe)
make: *** [Makefile:21: cowboot.elf] Error 1
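The layout the linker script enforces can be mimicked with a short Python sketch. This is not how cowboot is built; `pack_boot_sector` is a hypothetical helper, and the zero padding between the payload and the signature is an assumption.

```python
SECTOR_SIZE = 512  # length of the ROM region in the linker script

def pack_boot_sector(code: bytes, message: bytes) -> bytes:
    """Lay out the code, then the message, then pad and append the signature."""
    payload = code + message
    # Two bytes are reserved for the signature at offsets 510 and 511.
    if len(payload) > SECTOR_SIZE - 2:
        raise ValueError(
            f"payload is {len(payload)} bytes, only {SECTOR_SIZE - 2} fit"
        )
    padding = bytes(SECTOR_SIZE - 2 - len(payload))  # assumed zero fill
    return payload + padding + b"\x55\xaa"

# Two placeholder code bytes followed by a tiny message.
image = pack_boot_sector(b"\xeb\xfe", b"moo")
print(len(image), image[-2:].hex())
```

Just like the linker error above, an oversized message is rejected instead of silently overwriting the signature.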

b. I think that the boot signature has nothing to do with the boot image’s code: placing the boot magic should happen during the build phase. cowboot.S should have a single role: holding the assembly code that runs at startup.

As for cowboot.S, it is assembled using nasm, an assembler with 16-bit support.

The first line declares that the file will be assembled for 16-bit operation:

; computer boots at real mode - declare 16 bits operation
bits 16

After that, cowboot uses a series of calls to INT10H in order to draw the cow:

It first clears the screen by setting the video mode using sub-function 0:

_clear_screen:
    mov ah, 0x00 ; set video mode subfunction
    mov al, 0x03 ; text mode, 16 colors
    
    int 0x10

This is needed because SeaBIOS prints a default message that we want to clear.

Let’s see it in action using gdb:

In order to debug our image, we will run qemu-system-x86_64 with the following arguments:

$ qemu-system-x86_64 -S -gdb tcp::9000 loop.img
  • -S will instruct QEMU to not start the CPU
  • -gdb tcp::9000 will launch a GDB server hosted at localhost:9000

Connecting to the server using GDB:

$ gdb 
GNU gdb (GDB) 9.2

(gdb) target remote localhost:9000
Remote debugging using localhost:9000

QEMU tells us that the screen has not been initialized yet (since SeaBIOS has not initialized the display yet):

We will place a breakpoint at address 0x7c00 – which is the entry point of the boot image:

(gdb) b *0x7c00
Breakpoint 1 at 0x7c00
(gdb) c
Continuing.

Breakpoint 1, 0x0000000000007c00 in ?? ()

SeaBIOS has just finished initialization, so the IVT should be set up, and the following message is displayed on screen:

As a side note, the address of the IVT can change due to a call to LIDT. By following SeaBIOS’s code and launching qemu in debug mode, I confirmed that the IVT base stayed at 0x0000.

Let’s calculate the address of INT10H from the IVT:

(gdb) set $entry_10_offset=(unsigned short)*(0x10 * 4)
(gdb) set $entry_10_segment=(unsigned short)*(0x10 * 4 + sizeof(short))
(gdb) p/x ($entry_10_segment * 16 + $entry_10_offset) & 0xfffff
$1 = 0xc5635
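The same lookup can be expressed in Python over a raw memory dump. This is only a sketch: `ivt_handler_address` is a hypothetical helper, the IVT is assumed to start at offset 0 of the dump, and the 0xc000:0x5635 split used to fill the fake table is an assumption (the GDB session only shows the final address).

```python
import struct

def ivt_handler_address(memory: bytes, vector: int) -> int:
    """Return the physical address of the handler for an interrupt vector."""
    # Each IVT entry is 4 bytes: a little-endian 16-bit offset, then a
    # 16-bit segment (the layout of struct segoff_s).
    offset, segment = struct.unpack_from("<HH", memory, vector * 4)
    return (segment * 16 + offset) & 0xFFFFF

# Fake 1 KiB IVT (256 entries) with only entry 0x10 filled in.
ivt = bytearray(256 * 4)
struct.pack_into("<HH", ivt, 0x10 * 4, 0x5635, 0xC000)  # assumed seg:off
print(hex(ivt_handler_address(bytes(ivt), 0x10)))
```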

Stepping after the first call, we can see that the screen is clear:

Later on, the background color is set to black using sub-function 0x0b, and finally a call to sub-function 0x13 prints the string pointed to by _cow_start:

_set_black_bg_color:
    mov ah, 0x0b ; INT_10H 0bh - set color
    mov bh, 0 ; set background color
    mov bl, black_palette

    int 0x10

_write_cow:
    mov ax, 0x7c0 ; segment of the boot sector (0x7c0 * 16 = 0x7c00)
    mov es, ax

    mov bp, _cow_start ; string addr
    mov cx, _cow_len ; string len

    xor bh, bh ; page number 0
    xor dx, dx ; row / col 0
    mov al, 1 ; write mode - move cursor
    mov bl, green_palette ; green character color for cyber effect
    
    mov ah, 0x13 ; INT_10H 13h - write string
    int 0x10

And as expected, the message is printed to the screen:

The program ends with an infinite loop:

_loop:
    jmp $

The token $ evaluates to the address of the beginning of the line, making jmp $ an infinite loop.
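To see why this loops forever, here is a rough Python sketch of how a short jump is encoded. It assumes the assembler picks the 2-byte short form (opcode 0xEB plus a signed 8-bit displacement); `encode_short_jmp` and the address 0x7c20 are made up for the illustration.

```python
def encode_short_jmp(instr_addr: int, target_addr: int) -> bytes:
    """Encode a short jump: opcode 0xEB followed by a signed rel8."""
    # The displacement is relative to the address of the NEXT instruction,
    # and a short jump is 2 bytes long.
    rel = target_addr - (instr_addr + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])

# `jmp $` targets its own address, so the displacement is -2 (0xFE).
print(encode_short_jmp(0x7C20, 0x7C20).hex())
```

The CPU finishes decoding the jump, then jumps two bytes back onto the very same instruction.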

Conclusion

Writing cowboot and this blog post made me understand the (very) basics of x86 booting, which was a lot of fun 🙂

Even though subjects like booting, real-mode programming, and even kernel-mode programming are not relevant to most developers, I do think that it is important to have some understanding of them. There is great code in bootloaders, kernels, and even BIOSes that attempts to create the best interface for its users, which can serve as inspiration for higher-level designs as well.

Notes concerning GDB

My GDB (and even a newly compiled one) failed to change the architecture of the remote target to i8086.

It turns out that there are open bugs on sourceware and launchpad describing the issue. The suggested workaround, setting the target’s description to a custom one, did not work for me.

I eventually gave up after debugging it for a while 🙁

(gdb) set architecture i8086 
warning: Selected architecture i8086 is not compatible with reported target architecture i386:x86-64
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

Architecture `i8086' not recognized.
The target architecture is set automatically (currently i386:x86-64)

As a last attempt, I tried using gdb-multiarch as well, but with no success.

From .rodata to .rwdata – introduction to memory mapping and LD scripts

A few days ago a colleague of mine, who had just started to learn C, was wondering about the following piece of code:

char *foo = "AAAAAA";
foo[0] = 'B';

According to the tutorial he followed, this is valid code, yet running it causes a segmentation fault:

guy@localhost ~/b/string_elf> gcc sample_1.c
guy@localhost ~/b/string_elf> ./a.out 
fish: “./a.out” terminated by signal SIGSEGV (Address boundary error)

Why is there a segmentation fault?

We can easily find the instruction that caused the segmentation fault by debugging the process under GDB:

guy@localhost ~/b/string_elf> gdb a.out 
...

(gdb) set disassemble-next-line on
(gdb) r
Starting program: /home/guy/blog/string_elf/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x00000000004004dd in main ()
=> 0x00000000004004dd <main+16>:	c6 00 42	movb   $0x42,(%rax)

It is easy to see that writing to the address stored in $rax caused the crash. The instruction

movb   $0x42,(%rax)

was the last instruction that was run before the segmentation fault.

We can see that $rax contains the initial value of the string by using the examine command:

(gdb) x/6c $rax
0x400580:	65 'A'	65 'A'	65 'A'	65 'A'	65 'A'	65 'A'

In order to display the memory mapping of the process, we can use the file /proc/PID/maps. The info proc gdb command returns the PID:

(gdb) info proc
process 8596
cmdline = '/home/guy/blog/string_elf/a.out'
cwd = '/home/guy/blog/string_elf'
exe = '/home/guy/blog/string_elf/a.out'

(gdb) shell cat /proc/8596/maps
00400000-00401000 r-xp 00000000 fd:02 134817229 ...
00600000-00601000 r--p 00000000 fd:02 134817229 ...
00601000-00602000 rw-p 00001000 fd:02 134817229 ...
....

As seen from the memory mapping, the address that is stored in $rax, 0x400580, which resides in range 0x00400000-0x00401000, is marked as non-writable.

00400000-00401000 r-xp

Writing to a page that is not mapped as writable makes the kernel deliver SIGSEGV, which is exactly the crash we saw.
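The lookup we just did by eye can be automated. Below is a small Python sketch that finds the mapping containing an address in text shaped like /proc/PID/maps; `find_mapping` is a hypothetical helper and the sample text is the abridged output from above.

```python
def find_mapping(maps_text: str, address: int):
    """Return (start, end, perms) of the mapping containing address, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "-" not in fields[0]:
            continue  # skip lines that are not mappings, such as "...."
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= address < end:
            return start, end, fields[1]
    return None

maps = """\
00400000-00401000 r-xp 00000000 fd:02 134817229 ...
00600000-00601000 r--p 00000000 fd:02 134817229 ...
00601000-00602000 rw-p 00001000 fd:02 134817229 ...
....
"""

start, end, perms = find_mapping(maps, 0x400580)
print(hex(start), hex(end), perms, "writable" if "w" in perms else "read-only")
```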

An interesting question popped up:

How can we make the string writable?

Before we continue any further, we need to make a quick detour and introduce the ELF file format.

A brief introduction to the ELF file format

The ELF file format, among other things, consists of sections that describe the logical memory layout of the binary.

Some typical sections one may find in an ELF file are:

  1. .text – which stores the instructions that make up the program itself. It is marked as executable and read-only (r-x).
  2. .data – which stores initialized static and global variables (non-static local variables are stored on the stack). It is marked as read-write and non-executable (rw-).
  3. .bss – which stores uninitialized (zero-initialized) static and global variables. It is marked as read-write and non-executable (rw-).
  4. .rodata – which stores constant data. One should expect string literals, and other constant values to reside there. It is marked as read-only (although usually resides in a read and executable segment).

The section header table stores information on the various sections – including their permissions, virtual address memory range, etc.

One can view the different sections using the readelf tool:

guy@localhost ~/b/string_elf> readelf -S a.out 
There are 31 section headers, starting at offset 0x1908:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
...
  [14] .text             PROGBITS         00000000004003e0  000003e0
...
  [16] .rodata           PROGBITS         0000000000400570  00000570
       000000000000003b  0000000000000000   A       0     0     8
...
  [25] .data             PROGBITS         0000000000601020  00001020
       0000000000000004  0000000000000000  WA       0     0     1
  [26] .bss              NOBITS           0000000000601024  00001024
       0000000000000004  0000000000000000  WA       0     0     1
...

objdump is another great tool. The VMA column stands for Virtual Memory Address and describes the base virtual address of the section.

guy@localhost ~/b/string_elf> objdump -h a.out 

a.out:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 13 .text         00000182  00000000004003e0  00000000004003e0  000003e0  2**4
...
 15 .rodata       0000003b  0000000000400570  0000000000400570  00000570  2**3
...
 24 .data         00000004  0000000000601020  0000000000601020  00001020  2**0
                  CONTENTS, ALLOC, LOAD, DATA
 25 .bss          00000004  0000000000601024  0000000000601024  00001024  2**0
...

In addition to sections, the ELF file format contains a Program Header table, which describes how different sections are grouped into segments in memory. The ELF loader creates the memory mapping of the process according to those segments.

When an ELF file is loaded into memory via a call to execve(), the kernel only examines the segments in order to set up the memory mapping of the process. The kernel does not care about individual sections.

Here is a snippet of the load_elf_binary function (kernel 3.18) – it can be seen that the kernel only considers the program headers (segments) and calls elf_map (which in turn calls vm_mmap) which mmaps each segment to its VMA with its given flags:

static int load_elf_binary(struct linux_binprm *bprm) {
    ...
    /* iterate over the program headers */
    for (i = 0, elf_ppnt = elf_phdata;
         i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
        int elf_prot = 0, elf_flags;
        unsigned long k, vaddr;
        ...
        /* store the virtual address of the segment */
        vaddr = elf_ppnt->p_vaddr;
        ...
        /* mmap the segment to its virtual address with the permissions
           that are specified in the program header table */
        error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                        elf_prot, elf_flags, 0);
        ...
    }
    ...
}
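The elided part of the loop translates the segment's flags into mmap protection bits. Here is a simplified Python sketch of that translation; the constants carry the standard ELF and Linux values, while `segment_prot` and `prot_string` are hypothetical helper names, not kernel functions.

```python
# ELF program header flag bits (p_flags).
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4
# mmap/mprotect protection bits on Linux.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def segment_prot(p_flags: int) -> int:
    """Translate ELF segment flags into mmap protection bits."""
    prot = 0
    if p_flags & PF_R:
        prot |= PROT_READ
    if p_flags & PF_W:
        prot |= PROT_WRITE
    if p_flags & PF_X:
        prot |= PROT_EXEC
    return prot

def prot_string(prot: int) -> str:
    """Render protection bits the way /proc/PID/maps does (without 'p')."""
    return (("r" if prot & PROT_READ else "-")
            + ("w" if prot & PROT_WRITE else "-")
            + ("x" if prot & PROT_EXEC else "-"))

print(prot_string(segment_prot(PF_R | PF_X)))  # text segment
print(prot_string(segment_prot(PF_R | PF_W)))  # data segment
```

Note that the bit values differ between the two encodings (PF_R is 4 while PROT_READ is 1), so the flags cannot simply be copied over.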

Let’s find the section in which the address in $rax resides via GDB (since we already covered readelf and objdump):

(gdb) maintenance info sections
Exec file:
    `/home/guy/blog/string_elf/a.out', file type elf64-x86-64.
    ...
   
    0x00400570->0x004005ab at 0x00000570: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
    ...

We can see that the string was placed in the “.rodata” section, which is marked as READONLY. Writing to it would obviously cause a segmentation fault.

A run-time patch: mprotect

mprotect is a syscall that sets protection on a region of memory:

#include <sys/mman.h>

int mprotect(void *addr, size_t len, int prot);

It sets the permissions of the memory region starting at addr and ending at addr+len (addr must be aligned to a page boundary) to a combination of the following flags, passed via prot:

PROT_NONE  The memory cannot be accessed at all.

PROT_READ  The memory can be read.

PROT_WRITE The memory can be modified.

PROT_EXEC  The memory can be executed.

This means that we can call mprotect with the address of the page containing the string and set it to:

PROT_WRITE | PROT_READ | PROT_EXEC

(which is equal to 7).

Why would we mark it as executable? Because “.text” maps to this page as well! We can see that by examining the section-to-segment mapping using readelf:

guy@localhost ~/b/string_elf> readelf -l a.out

Elf file type is EXEC (Executable file)
Entry point 0x4003e0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000006b4 0x00000000000006b4  R E    200000
...

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag ... .plt.got .text .fini .rodata ... 
   03     .init_array .fini_array ...
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     
   08     .init_array .fini_array .jcr .dynamic .got 

We can find the page address by rounding $rax down to a multiple of the page size:

(gdb) p/x $rax - $rax%4096
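The same rounding in Python, as a small sketch. `page_base` is a hypothetical helper, and the 4096-byte page size is an assumption that holds on typical x86-64 Linux systems.

```python
PAGE_SIZE = 4096  # assumed page size; query it at run time on a real system

def page_base(address: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of the page that contains it."""
    # For power-of-two page sizes this equals address - address % page_size.
    return address & ~(page_size - 1)

print(hex(page_base(0x400580)))  # 0x400000
```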

Here is the continuation of the GDB session, in which a breakpoint was set before the write.

A call to mprotect was then performed, making the page in which $rax resides writable:

(gdb) p mprotect($rax - $rax%4096, 4096, 7)

And success! The binary exited successfully!

(gdb) b
Breakpoint 1 at 0x4004dd
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/guy/blog/string_elf/a.out 

Breakpoint 1, 0x00000000004004dd in main ()
=> 0x00000000004004dd <main+16>:	c6 00 42	movb   $0x42,(%rax)
(gdb) p mprotect($rax - $rax%4096, 4096, 7)
$1 = 0
(gdb) si
0x00000000004004e0 in main ()
=> 0x00000000004004e0 <main+19>:	5d	pop    %rbp
(gdb) x/6c $rax
0x400580:	66 'B'	65 'A'	65 'A'	65 'A'	65 'A'	65 'A'

While this method works, it is not ideal and suffers from a lot of problems:

  1. This is a run-time solution, meaning that additional instructions are required to be run.
  2. We have to know the size of the buffer we want to change in advance: the call above covers a single page, so it will not make all of .rodata writable when .rodata spans more than one page.
  3. Since multiple sections can reside in a single page, a call to mprotect would possibly change permissions of other sections.
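The second problem can be worked around by computing a page-aligned region that covers the whole buffer before calling mprotect. A rough Python sketch of that arithmetic follows; `mprotect_range` is a hypothetical helper, the 4096-byte page size is an assumption, and the address 0x400ffc is made up to show a buffer that crosses a page boundary.

```python
PAGE_SIZE = 4096  # assumed page size

def mprotect_range(address: int, length: int, page_size: int = PAGE_SIZE):
    """Return (start, size) of the page-aligned region covering a buffer."""
    start = address - address % page_size
    end = address + length
    # Round the end up to the next page boundary.
    aligned_end = (end + page_size - 1) // page_size * page_size
    return start, aligned_end - start

# The 7-byte string from the GDB session fits in a single page ...
print(mprotect_range(0x400580, 7))
# ... while the same string near the end of a page would need two.
print(mprotect_range(0x400FFC, 7))
```

This does nothing about the third problem: everything else that shares those pages becomes writable too.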

We can do better: LD scripts

If we could move “.rodata” into a writable segment, it would be perfect! That is where LD scripts come into play.

The segment to section mapping, among other things, is determined during the linkage of the program.

GCC, the compiler, creates an object file, which already contains some sections inside – such as the “.text”, “.data”, and “.bss”. According to the GCC documentation, every output must contain, at least, a text section.

The linker, LD, takes a bunch of object files and combines them into an ELF file. Commands are passed to LD using an ld script.


The main purpose of the linker script is to describe how the sections in the input files should be mapped into the output file and to control the memory layout of the output file. The linker always uses a linker script.

Using the default linker script

The default linker script can be obtained by running

ld --verbose

The output is quite long so I put it into the following gist.

I’ve removed some parts in order to create a simplified view of the default linker script:

SECTIONS {
   /* Read-only sections, merged into text segment: */
   . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
   .text : {*(.text) }
   .rodata : {*(.rodata .rodata.*) }

   /* Read-write sections */
   . = DATA_SEGMENT_ALIGN (...);
   .data : {*(.data .data.*) }
}

The SECTIONS block describes how sections from the object files that are given as input to the linker map to sections in the output ELF file.

For example, the line:

.text : {*(.text) }

Defines an output section named “.text” (left-hand side), which contains the “.text” section from all the input files – using a wildcard:

{*(.text) }

The “.” symbol is used to describe the current memory location and is called the location counter. Output sections are mapped to the location counter and the location counter is incremented by the size of the output section.
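The behaviour of the location counter can be modelled with a toy Python sketch. It deliberately ignores alignment and every section other than the ones passed in, so the addresses it prints will not match a real link; `layout` is a hypothetical helper and the sizes are borrowed from the objdump output shown earlier.

```python
def layout(sections, start):
    """Assign an address to each (name, size) pair, ld-style."""
    location_counter = start
    placed = {}
    for name, size in sections:
        # The output section is mapped at the current location counter ...
        placed[name] = location_counter
        # ... and the counter then advances by the size of the section.
        location_counter += size
    return placed, location_counter

placed, end = layout([(".text", 0x182), (".rodata", 0x3B)], 0x4003E0)
for name, addr in placed.items():
    print(f"{name:8} {addr:#x}")
print(f"location counter is now {end:#x}")
```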

The line

. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

Sets the location counter to the start of the text segment, at VMA 0x400000. Since the text segment is marked as readable and executable, sections that are put there will be non-writable – which in this case are “.text” and “.rodata”.

It can later be seen that the location counter is set to the data segment, and the data section is put there.

An easy hack would be to place the .rodata contents after the location counter has moved to the data segment. Let’s create a section called “.rwdata” which would replace “.rodata”:

.rwdata : {*(.rodata .rodata.*) }

And we will change the linker script as follows: we will remove the “.rodata” section and insert our “.rwdata” section after “.data”, inside the data segment:

SECTIONS {
   /* Read-only sections, merged into text segment: */
   . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
   .text : {*(.text) }

   /* Read-write sections */
   . = DATA_SEGMENT_ALIGN (...);
   .data : {*(.data .data.*) }

   /* ---- our evil hack ---- */
   .rwdata : {*(.rodata .rodata.*) }
}

The full linker script can be found here.

GCC can take a non-default linker script using the -T option.

Let’s try and compile the following code using our modified linker script:

#include <stdio.h>

int main(void) {
    char *foo = "AAAAAA";

    printf("printing string foo %s\n", foo);
    foo[0] = 'B';
    printf("printing string foo %s\n", foo);
    
    return 0;
}

I’ve written a Makefile that invokes GCC using the modified linker script:

guy@localhost ~/b/ld_script_elf_blog_post> make
mkdir -p build
gcc -T rwdata.ld sample.c -o build/sample  	

And when running:

guy@localhost ~/b/ld_script_elf_blog_post> ./build/sample 
printing string foo AAAAAA
printing string foo BAAAAA

It works!

Listing the section-to-segment mapping of the ELF, we see a new “.rwdata” section and no “.rodata” section:

guy@localhost ~/b/ld_script_elf_blog_post> readelf -l build/sample 

...
             
 Section to Segment mapping:
  Segment Sections...
   ...
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .eh_frame_hdr .eh_frame 
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .rwdata .bss 
   ...
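Checking this mapping by eye gets tedious for larger binaries, so here is a small Python sketch that searches the "Section to Segment mapping" text for a section. `segments_of` is a hypothetical helper, and the sample text is an abridged copy of the output above.

```python
def segments_of(mapping_text: str, section: str):
    """Return the segment numbers whose section list contains `section`."""
    found = []
    for line in mapping_text.splitlines():
        fields = line.split()
        # Mapping lines start with a segment number; skip everything else.
        if not fields or not fields[0].isdigit():
            continue
        if section in fields[1:]:
            found.append(int(fields[0]))
    return found

mapping = """\
  Segment Sections...
   02     .interp .note.ABI-tag .text .fini .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .rwdata .bss
"""

print(segments_of(mapping, ".rwdata"))
print(segments_of(mapping, ".rodata"))
```

The first call reports segment 3, the writable data segment, and the second comes back empty because “.rodata” no longer exists in the output.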

Conclusion

LD scripts can be very useful when tight control over the memory mapping is needed, which is sometimes the case when programming for an embedded target (see the following example), or for other esoteric needs, such as making the “.text” section writable for a self-modifying binary.

I hope you found this post interesting. This is my first-ever post, so any comments would be much appreciated 🙂