Representing executable files

Three Standard Sections

Any executable file format will need to specify where the code and data are in the binary file.

One additional component we have not mentioned until now is storage space for uninitialised global variables. If we declare a variable and give it an initial value, that value obviously needs to be stored in the executable file so that it is correct when the program starts. However, many variables are uninitialised (or zero) when the program is first executed. Reserving space for these in the executable and then simply filling it with zero or NULL values is a waste of space, needlessly bloating the executable file. Thus each executable file can define a BSS section which simply records the size of the uninitialised data; on program load the extra memory can be allocated (and set to zero!).[23]
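The distinction between initialised and uninitialised globals can be sketched in C as follows. This is only an illustration: the names initialised_counter, scratch_buffer, count_nonzero and report_globals are invented here, and the placement of the buffer in the BSS is an assumption about a typical toolchain rather than something the C language itself promises. What C does promise is that an uninitialised global starts out as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Initialised global: the value 42 has to be stored in the executable
 * file so that it is correct when the program starts. */
int initialised_counter = 42;

/* Uninitialised global: the file only needs to record that 1 MiB is
 * required. Assumption: a typical toolchain accounts for this in the
 * BSS rather than storing a megabyte of zeros in the file. */
char scratch_buffer[1024 * 1024];

/* Count the non-zero bytes in a buffer. */
size_t count_nonzero(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            n++;
    }
    return n;
}

/* Report the start-up state of both globals. */
void report_globals(void)
{
    assert(initialised_counter == 42);
    printf("initialised_counter = %d\n", initialised_counter);
    printf("non-zero bytes in scratch_buffer: %zu\n",
           count_nonzero(scratch_buffer, sizeof scratch_buffer));
}
```

Calling report_globals() at the start of main should report no non-zero bytes in the buffer. On a system with the binutils size utility, running it on the resulting executable should show the buffer counted under bss, although the exact figures depend on the toolchain.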

Binary Format

The executable is created by the toolchain from the source code. This file needs to be in an explicitly defined format so that the compiler can create it and the operating system can identify it and load it into memory, turning it into a running process that the operating system can manage. This executable file format can be specific to the operating system, as we would not normally expect a program compiled for one system to execute on another (for example, you don't expect your Windows programs to run on Linux, or your Linux programs to run on OS X).

However, the common thread between all executable file formats is that they include a predefined, standardised header which describes how program code and data are stored in the rest of the file. Put into words, the header would say something like "the program code starts 20 bytes into this file, and is 50 kilobytes long. The program data follows it and is 20 kilobytes long".
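To make the idea concrete, here is a sketch of what such a header might look like as a C structure, filled in with the numbers from the description above. The structure toy_exec_header and its fields are entirely hypothetical; they do not correspond to a.out, COFF, ELF or any other real format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header for an imaginary executable format. */
struct toy_exec_header {
    uint32_t code_offset; /* bytes from the start of the file to the code */
    uint32_t code_size;   /* length of the code in bytes */
    uint32_t data_offset; /* bytes from the start of the file to the data */
    uint32_t data_size;   /* length of the data in bytes */
};

/* Build the header described in the text: code starts 20 bytes in and
 * is 50 kilobytes long; data follows directly and is 20 kilobytes long. */
struct toy_exec_header example_header(void)
{
    struct toy_exec_header h;
    h.code_offset = 20;
    h.code_size = 50 * 1024;
    h.data_offset = h.code_offset + h.code_size;
    h.data_size = 20 * 1024;
    return h;
}

/* Total file size implied by the header, assuming data is the last thing
 * in the file. */
uint32_t toy_file_size(const struct toy_exec_header *h)
{
    assert(h != NULL);
    return h->data_offset + h->data_size;
}

/* Print the header in roughly the same words the text uses. */
void describe(const struct toy_exec_header *h)
{
    assert(h != NULL);
    printf("code starts %lu bytes into the file and is %lu bytes long\n",
           (unsigned long)h->code_offset, (unsigned long)h->code_size);
    printf("data starts %lu bytes into the file and is %lu bytes long\n",
           (unsigned long)h->data_offset, (unsigned long)h->data_size);
}
```

A loader for this imaginary format would read the header first, then use the offsets and sizes to copy the code and data from the file into memory. Real formats carry considerably more information, but the principle is the same.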

In recent times one particular format has become the de facto standard for executable representation on modern UNIX-type systems. It is called the Executable and Linkable Format, or ELF for short; we'll be looking at it in more detail soon.
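As a small taste of how a format can be identified, ELF files begin with a four-byte magic number: the byte 0x7f followed by the ASCII letters E, L and F. The sketch below checks a buffer for those bytes; the function name looks_like_elf is invented for this example, and a real loader validates far more of the header than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four identification bytes at the start of an ELF file. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if buf, holding the first len bytes of a file, starts with
 * the ELF magic number, and 0 otherwise. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < sizeof ELF_MAGIC)
        return 0;
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

To try it, read the first few bytes of a file with fread and pass them to looks_like_elf.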

Binary Format History

a.out

ELF was not always the standard; original UNIX systems used a file format called a.out. We can still see the vestiges of this: if you compile a program without the -o option to specify an output file name, the executable will be created with a default name of a.out[24].

a.out has a very simple header format that allows only a single data, code and BSS section. As you will come to see, this is insufficient for modern systems with dynamic libraries.

COFF

The Common Object File Format, or COFF, was the precursor to ELF. Its header format was more flexible, allowing more (but still a limited number of) sections in the file.

COFF also had difficulty supporting shared libraries elegantly, and ELF was selected as an alternative implementation on Linux.

However, COFF lives on in Microsoft Windows as the Portable Executable or PE format. PE is to Windows as ELF is to Linux.



[23] BSS probably stands for Block Started by Symbol, an assembly command for an old IBM computer.

[24] In fact, a.out is the default output filename from the linker. The compiler generally uses randomly generated file names for its intermediate assembly and object code files.