A practical example

We can walk through the steps taken to build a simple application.

Note that typing gcc actually runs a driver program that hides most of the steps from you. Under normal circumstances this is exactly what you want, because the exact commands and options needed to produce a working executable on a real system can be quite complicated and architecture specific.

We will show the compilation process with the following two examples. Both are C source files: one defines the main() function, the initial program entry point, and the other defines a helper function. There is one global variable too, just for illustration.

Example 7.4. Hello World
    #include <stdio.h>
    #include <stdlib.h>

    /* We need a prototype so the compiler knows what types function() takes */
    int function(char *input);

    /* Since this is static, we can define it in both hello.c and function.c */
    static int i = 100;

    /* This is a global variable */
    int global = 10;

    int main(void)
    {
    	/* function() should return the value of global */
    	int ret = function("Hello, World!");
    	exit(ret);
    }

Example 7.5. Function Example
    #include <stdio.h>

    static int i = 100;

    /* Declared as extern since defined in hello.c */
    extern int global;

    int function(char *input)
    {
    	printf("%s\n", input);
    	return global;
    }

Compiling

Compilers generally have an option to run only the first step of compilation, turning the source into assembly. Usually this is something like -S, and the output is generally put into a file with the same name as the input file but with a .s extension.

Thus we can show the first step with gcc -S as illustrated in the example below.

Example 7.6. Compilation Example
    ianw@lime:~/programs/csbu/wk7/code$ gcc -S hello.c
    ianw@lime:~/programs/csbu/wk7/code$ gcc -S function.c
    ianw@lime:~/programs/csbu/wk7/code$ cat function.s
            .file   "function.c"
            .pred.safe_across_calls p1-p5,p16-p63
            .section        .sdata,"aw",@progbits
            .align 4
            .type   i#, @object
            .size   i#, 4
    i:
            data4   100
            .section        .rodata
            .align 8
    .LC0:
            stringz "%s\n"
            .text
            .align 16
            .global function#
            .proc function#
    function:
            .prologue 14, 33
            .save ar.pfs, r34
            alloc r34 = ar.pfs, 1, 4, 2, 0
            .vframe r35
            mov r35 = r12
            adds r12 = -16, r12
            mov r36 = r1
            .save rp, r33
            mov r33 = b0
            .body
            ;;
            st8 [r35] = r32
            addl r14 = @ltoffx(.LC0), r1
            ;;
            ld8.mov r37 = [r14], .LC0
            ld8 r38 = [r35]
            br.call.sptk.many b0 = printf#
            mov r1 = r36
            ;;
            addl r15 = @ltoffx(global#), r1
            ;;
            ld8.mov r14 = [r15], global#
            ;;
            ld4 r14 = [r14]
            ;;
            mov r8 = r14
            mov ar.pfs = r34
            mov b0 = r33
            .restore sp
            mov r12 = r35
            br.ret.sptk.many b0
            ;;
            .endp function#
            .ident  "GCC: (GNU) 3.3.5 (Debian 1:3.3.5-11)"

The assembly is a little too complex to describe fully, but you should be able to see where i is defined as a data4 (i.e. 4 bytes or 32 bits, the size of an int), where function is defined (function:) and where printf() is called.
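As a quick check of the "size of an int" remark, here is a minimal sketch that reports how large an int is on whatever machine compiles it. It is not part of the original example, and the helper names int_size_bytes and int_size_bits are made up for illustration. On the IA64 system used above an int is expected to be 4 bytes, matching the data4 directive, but the C standard does not guarantee that size everywhere.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a byte */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: size of an int in bytes on this platform. */
size_t int_size_bytes(void)
{
	return sizeof(int);
}

/* Hypothetical helper: size of an int in bits on this platform. */
size_t int_size_bits(void)
{
	return sizeof(int) * CHAR_BIT;
}
```

Calling these from a small main() and printing the results with the %zu format would be one way to compare your own platform against the data4 seen in the listing.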

We now have two assembly files ready to be assembled into machine code!

Assembly

Assembly is a fairly straightforward process. The assembler is usually called as and takes arguments in a similar fashion to gcc.

Example 7.7. Assembly Example
    ianw@lime:~/programs/csbu/wk7/code$ as -o function.o function.s
    ianw@lime:~/programs/csbu/wk7/code$ as -o hello.o hello.s
    ianw@lime:~/programs/csbu/wk7/code$ ls
    function.c  function.o  function.s  hello.c  hello.o  hello.s

After assembling we have object code, which is ready to be linked together into the final executable. You can usually avoid running the assembler by hand by calling the compiler with -c, which converts the input file directly to object code and puts it in a file with the same prefix but a .o extension.

We can't inspect the object code directly, as it is in a binary format (in future weeks we will learn about this binary format). However, we can use some tools to inspect the object files; for example, readelf --symbols will show us the symbols in an object file.

Example 7.8. Readelf Example
    ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./hello.o

    Symbol table '.symtab' contains 15 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
         3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
         4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
         5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
         6: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    5 i
         7: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
         8: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
         9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
        10: 0000000000000000     0 SECTION LOCAL  DEFAULT   10
        11: 0000000000000004     4 OBJECT  GLOBAL DEFAULT    5 global
        12: 0000000000000000    96 FUNC    GLOBAL DEFAULT    1 main
        13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND function
        14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND exit

    ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./function.o

    Symbol table '.symtab' contains 14 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS function.c
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
         3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
         4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
         5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
         6: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    5 i
         7: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
         8: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
         9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
        10: 0000000000000000     0 SECTION LOCAL  DEFAULT   10
        11: 0000000000000000   128 FUNC    GLOBAL DEFAULT    1 function
        12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
        13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND global

Although the output is quite complicated (again!) you should be able to understand much of it. For example:

  • In the output of hello.o have a look at the symbol with name i. Notice how it says it is LOCAL? That is because we declared it static and as such it has been flagged as being local to this object file.

  • In the same output, notice that the global variable is defined as GLOBAL, meaning that it is visible outside this file. Similarly, the main() function is externally visible.

  • Notice that the function symbol (for the call to function()) is left as UND, or undefined. This means that it has been left for the linker to find the address of the function.

  • Have a look at the symbols in the function.o output and see how they fit with the source in function.c.
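The three kinds of symbol discussed above can be gathered into one small file. The following is only a sketch: it reuses the names i, global and function from the examples, adds a hypothetical helper local_value() so that i is actually used, and the comments describe what readelf --symbols would be expected to report for an unoptimised build rather than output that was captured for this text.

```c
#include <stdio.h>

/* static gives internal linkage, so the symbol table is expected to
 * list this object with LOCAL binding. */
static int i = 100;

/* Defined here with external linkage: expected to appear as GLOBAL,
 * with a section index rather than UND. */
int global = 10;

/* printf() is declared by stdio.h but not defined in this file, so the
 * object file should carry an undefined (UND) symbol for it (or for a
 * substitute such as puts, if the compiler chooses one). */
int function(const char *input)
{
	printf("%s\n", input);
	return global;
}

/* Hypothetical helper, not in the original example: reads the static
 * variable so that the compiler has a reason to keep it. */
int local_value(void)
{
	return i;
}
```

Saving this under any name you like, compiling it with gcc -c and running readelf --symbols on the resulting object file lets you compare the Bind and Ndx columns against the comments.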

Linking

Actually invoking the linker, called ld, is a very complicated process on a real system (are you sick of hearing this yet?). This is why we leave the linking process up to gcc.

But of course we can spy on what gcc is doing under the hood with the -v (verbose) flag.

Example 7.9. Linking Example
    /usr/lib/gcc-lib/ia64-linux/3.3.5/collect2 -static
    /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crt1.o
    /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crti.o
    /usr/lib/gcc-lib/ia64-linux/3.3.5/crtbegin.o
    -L/usr/lib/gcc-lib/ia64-linux/3.3.5
    -L/usr/lib/gcc-lib/ia64-linux/3.3.5/../../..
    hello.o
    function.o
    --start-group
    -lgcc
    -lgcc_eh
    -lunwind
    -lc
    --end-group
    /usr/lib/gcc-lib/ia64-linux/3.3.5/crtend.o
    /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crtn.o

The first thing you notice is that a program called collect2 is being called. This is a simple wrapper around ld that is used internally by gcc.

The next thing you notice is object files starting with crt being specified to the linker. These object files are provided by gcc and the system libraries and contain code required to start the program. In fact, main() is not the first function called when a program runs; that is a function called _start, which lives in the crt object files. This function does some generic setup which application programmers do not need to worry about.
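That main() is not the first code to run can be observed from C itself. The sketch below relies on __attribute__((constructor)), a GCC and Clang extension that is not standard C and is not used in the original example; the names setup_ran, early_setup and setup_has_run are hypothetical. Functions marked this way are run by the startup code before main() is entered.

```c
/* Set by the constructor below; it would stay 0 if the constructor
 * never ran. */
static int setup_ran = 0;

/* GCC/Clang extension (an assumption about your compiler, not standard
 * C): the startup code calls this function before main(). */
__attribute__((constructor))
static void early_setup(void)
{
	setup_ran = 1;
}

/* Hypothetical helper: returns 1 if early_setup() has already run. */
int setup_has_run(void)
{
	return setup_ran;
}
```

If setup_has_run() is called as the very first statement of main() and returns 1, something must have executed before main() did, which is the job of the startup code described above.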

The path hierarchy is quite complicated, but in essence we can see that the final step is to link in some extra object files, namely:

  • crt1.o : provided by the system libraries (libc), this object file contains the _start function, which is actually the first thing called within the program.

  • crti.o : provided by the system libraries

  • crtbegin.o

  • crtsaveres.o

  • crtend.o

  • crtn.o

We discuss how these are used to start the program a little later.

Next you can see that we link in our two object files, hello.o and function.o. After that we specify some extra libraries with -l flags. These libraries are system specific and required for every program. The major one is -lc which brings in the C library, which has all common functions like printf().

After that we again link in some more system object files which do some cleanup after programs exit.

Although the details are complicated, the concept is straightforward. All the object files will be linked together into a single executable file, ready to run!

The Executable

We will go into more detail about the executable shortly, but we can do some inspection in a similar fashion to the object files to see what has happened.

Example 7.10. Executable Example
    ianw@lime:~/programs/csbu/wk7/code$ gcc -o program hello.c function.c
    ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./program

    Symbol table '.dynsym' contains 11 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 6000000000000de0     0 OBJECT  GLOBAL DEFAULT  ABS _DYNAMIC
         2: 0000000000000000   176 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2 (2)
         3: 600000000000109c     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
         4: 0000000000000000   704 FUNC    GLOBAL DEFAULT  UND exit@GLIBC_2.2 (2)
         5: 600000000000109c     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
         6: 6000000000000fe8     0 OBJECT  GLOBAL DEFAULT  ABS _GLOBAL_OFFSET_TABLE_
         7: 60000000000010b0     0 NOTYPE  GLOBAL DEFAULT  ABS _end
         8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
         9: 0000000000000000   544 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2 (2)
        10: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    
    Symbol table '.symtab' contains 127 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 40000000000001c8     0 SECTION LOCAL  DEFAULT    1
         2: 40000000000001e0     0 SECTION LOCAL  DEFAULT    2
         3: 4000000000000200     0 SECTION LOCAL  DEFAULT    3
         4: 4000000000000240     0 SECTION LOCAL  DEFAULT    4
         5: 4000000000000348     0 SECTION LOCAL  DEFAULT    5
         6: 40000000000003d8     0 SECTION LOCAL  DEFAULT    6
         7: 40000000000003f0     0 SECTION LOCAL  DEFAULT    7
         8: 4000000000000410     0 SECTION LOCAL  DEFAULT    8
         9: 4000000000000440     0 SECTION LOCAL  DEFAULT    9
        10: 40000000000004a0     0 SECTION LOCAL  DEFAULT   10
        11: 40000000000004e0     0 SECTION LOCAL  DEFAULT   11
        12: 40000000000005e0     0 SECTION LOCAL  DEFAULT   12
        13: 4000000000000b00     0 SECTION LOCAL  DEFAULT   13
        14: 4000000000000b40     0 SECTION LOCAL  DEFAULT   14
        15: 4000000000000b60     0 SECTION LOCAL  DEFAULT   15
        16: 4000000000000bd0     0 SECTION LOCAL  DEFAULT   16
        17: 4000000000000ce0     0 SECTION LOCAL  DEFAULT   17
        18: 6000000000000db8     0 SECTION LOCAL  DEFAULT   18
        19: 6000000000000dd0     0 SECTION LOCAL  DEFAULT   19
        20: 6000000000000dd8     0 SECTION LOCAL  DEFAULT   20
        21: 6000000000000de0     0 SECTION LOCAL  DEFAULT   21
        22: 6000000000000fc0     0 SECTION LOCAL  DEFAULT   22
        23: 6000000000000fd0     0 SECTION LOCAL  DEFAULT   23
        24: 6000000000000fe0     0 SECTION LOCAL  DEFAULT   24
        25: 6000000000000fe8     0 SECTION LOCAL  DEFAULT   25
        26: 6000000000001040     0 SECTION LOCAL  DEFAULT   26
        27: 6000000000001080     0 SECTION LOCAL  DEFAULT   27
        28: 60000000000010a0     0 SECTION LOCAL  DEFAULT   28
        29: 60000000000010a8     0 SECTION LOCAL  DEFAULT   29
        30: 0000000000000000     0 SECTION LOCAL  DEFAULT   30
        31: 0000000000000000     0 SECTION LOCAL  DEFAULT   31
        32: 0000000000000000     0 SECTION LOCAL  DEFAULT   32
        33: 0000000000000000     0 SECTION LOCAL  DEFAULT   33
        34: 0000000000000000     0 SECTION LOCAL  DEFAULT   34
        35: 0000000000000000     0 SECTION LOCAL  DEFAULT   35
        36: 0000000000000000     0 SECTION LOCAL  DEFAULT   36
        37: 0000000000000000     0 SECTION LOCAL  DEFAULT   37
        38: 0000000000000000     0 SECTION LOCAL  DEFAULT   38
        39: 0000000000000000     0 SECTION LOCAL  DEFAULT   39
        40: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        41: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        43: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        44: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        45: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        46: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        47: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        48: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        49: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        50: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS abi-note.S
        51: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        52: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS abi-note.S
        53: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        54: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS abi-note.S
        55: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        56: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        57: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        58: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        59: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS abi-note.S
        60: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS init.c
        61: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        62: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        63: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS initfini.c
        64: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        65: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        66: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        67: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        68: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        69: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        70: 4000000000000670   128 FUNC    LOCAL  DEFAULT   12 gmon_initializer
        71: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        72: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        73: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS initfini.c
        74: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        75: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        76: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        77: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        78: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        79: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /build/buildd/glibc-2.3.2
        80: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS auto-host.h
        81: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        82: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        83: 6000000000000fc0     0 NOTYPE  LOCAL  DEFAULT   22 __CTOR_LIST__
        84: 6000000000000fd0     0 NOTYPE  LOCAL  DEFAULT   23 __DTOR_LIST__
        85: 6000000000000fe0     0 NOTYPE  LOCAL  DEFAULT   24 __JCR_LIST__
        86: 6000000000001088     8 OBJECT  LOCAL  DEFAULT   27 dtor_ptr
        87: 40000000000006f0   128 FUNC    LOCAL  DEFAULT   12 __do_global_dtors_aux
        88: 4000000000000770   128 FUNC    LOCAL  DEFAULT   12 __do_jv_register_classes
        89: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
        90: 6000000000001090     4 OBJECT  LOCAL  DEFAULT   27 i
        91: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS function.c
        92: 6000000000001098     4 OBJECT  LOCAL  DEFAULT   27 i
        93: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS auto-host.h
        94: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <command line>
        95: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS <built-in>
        96: 6000000000000fc8     0 NOTYPE  LOCAL  DEFAULT   22 __CTOR_END__
        97: 6000000000000fd8     0 NOTYPE  LOCAL  DEFAULT   23 __DTOR_END__
        98: 6000000000000fe0     0 NOTYPE  LOCAL  DEFAULT   24 __JCR_END__
        99: 6000000000000de0     0 OBJECT  GLOBAL DEFAULT  ABS _DYNAMIC
       100: 4000000000000a70   144 FUNC    GLOBAL HIDDEN   12 __do_global_ctors_aux
       101: 6000000000000dd8     0 NOTYPE  GLOBAL DEFAULT  ABS __fini_array_end
       102: 60000000000010a8     8 OBJECT  GLOBAL HIDDEN   29 __dso_handle
       103: 40000000000009a0   208 FUNC    GLOBAL DEFAULT   12 __libc_csu_fini
       104: 0000000000000000   176 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2
       105: 40000000000004a0    32 FUNC    GLOBAL DEFAULT   10 _init
       106: 4000000000000850   128 FUNC    GLOBAL DEFAULT   12 function
       107: 40000000000005e0   144 FUNC    GLOBAL DEFAULT   12 _start
       108: 6000000000001094     4 OBJECT  GLOBAL DEFAULT   27 global
       109: 6000000000000dd0     0 NOTYPE  GLOBAL DEFAULT  ABS __fini_array_start
       110: 40000000000008d0   208 FUNC    GLOBAL DEFAULT   12 __libc_csu_init
       111: 600000000000109c     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
       112: 40000000000007f0    96 FUNC    GLOBAL DEFAULT   12 main
       113: 6000000000000dd0     0 NOTYPE  GLOBAL DEFAULT  ABS __init_array_end
       114: 6000000000000dd8     0 NOTYPE  WEAK   DEFAULT   20 data_start
       115: 4000000000000b00    32 FUNC    GLOBAL DEFAULT   13 _fini
       116: 0000000000000000   704 FUNC    GLOBAL DEFAULT  UND exit@@GLIBC_2.2
       117: 600000000000109c     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
       118: 6000000000000fe8     0 OBJECT  GLOBAL DEFAULT  ABS _GLOBAL_OFFSET_TABLE_
       119: 60000000000010b0     0 NOTYPE  GLOBAL DEFAULT  ABS _end
       120: 6000000000000db8     0 NOTYPE  GLOBAL DEFAULT  ABS __init_array_start
       121: 6000000000001080     4 OBJECT  GLOBAL DEFAULT   27 _IO_stdin_used
       122: 60000000000010a0     8 OBJECT  GLOBAL DEFAULT   28 __libc_ia64_register_back
       123: 6000000000000dd8     0 NOTYPE  GLOBAL DEFAULT   20 __data_start
       124: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
       125: 0000000000000000   544 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
       126: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__

Some things to note:

  • Note I built the executable the "easy" way!

  • Notice there are two symbol tables, .dynsym and .symtab. We explain how the dynsym symbols work soon, but notice that some of them are versioned with an @ symbol.

  • Note the many symbols that have been included from the extra object files. Many of them start with __ to avoid clashing with any names the programmer might choose. Read through and pick out the symbols we mentioned before from the object files and see if they have changed in any way.
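One change worth looking for is in the Value column: symbols that sat at small offsets in the object files now carry full addresses. A running program can observe its own addresses too, as in the minimal sketch below. The names global and i are reused from the examples, while show_addresses() is a hypothetical helper. The printed values are not expected to match the listing above, and on systems that load programs at a randomised base they can differ from run to run.

```c
#include <stdio.h>

/* Same names as in hello.c, reused here purely for illustration. */
int global = 10;
static int i = 100;

/* Hypothetical helper: prints where the two objects live at run time
 * and returns how many lines it printed. */
int show_addresses(void)
{
	printf("global is at %p\n", (void *)&global);
	printf("i      is at %p\n", (void *)&i);
	return 2;
}
```

Comparing what this prints with the Value that readelf --symbols reports for the same names in your own executable is a small exercise in seeing what the linker decided.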