Linker

Often in a large program, you will separate code into multiple files to keep related functions together. Each of these files can be compiled into object code, but your final goal is to create a single executable! There needs to be some way of combining these object files into a single executable. We call this linking.

Note that even if your program does fit in one file, it still needs to be linked against certain system libraries to operate correctly. For example, the code for printf is kept in a library which must be combined with your executable for it to work. So although you do not explicitly have to worry about linking in this case, there is most certainly still a linking process happening to create your executable.

In the following sections we explain some terms essential to understanding linking.

Symbols

Variables and functions all have names in source code by which we refer to them. One way of thinking of a statement declaring a variable int a is that you are telling the compiler "set aside some memory of sizeof(int) and from now on when I use a it will refer to this allocated memory". Similarly, a function says "store this code in memory, and when I call function() jump to and execute this code".

In this case, we call a and function symbols since they are a symbolic representation of an area of memory.

Symbols help humans to understand programming. You could say that the primary job of the compilation process is to remove symbols -- the processor doesn't know what a represents; all it knows is that it has some data at a particular memory address. The compilation process needs to convert a += 2 to something like "increment the value in memory at 0xABCDE by 2".
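As a rough illustration of the idea above, the following sketch uses the names a and function from the text, plus a hypothetical helper add_two_at that is not part of the original discussion. It performs the same update twice: once through the symbol a, and once through nothing but a memory address, which is closer to how the processor sees it.

```c
/* "a" is a symbol for sizeof(int) bytes of storage. */
int a = 40;

/* "function" is a symbol for a block of code stored in memory. */
int function(void)
{
    a += 2;      /* in the compiled code: add 2 to the int stored at the address of a */
    return a;
}

/* Hypothetical helper: the same update with no variable name involved,
 * only an address handed in by the caller. */
int add_two_at(int *address)
{
    *address += 2;
    return *address;
}
```

Calling function() once returns 42; calling add_two_at(&a) afterwards returns 44, because both operate on the same memory location.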

Symbol Visibility

In some C programs, you may have seen the terms static and extern used with variables. These modifiers can affect what we call the visibility of symbols.

Imagine you have split up your program into two files, but some functions need to share a variable. You only want one definition (i.e. memory location) of the shared variable (otherwise it wouldn't be shared!), but both files need to reference it.

To enable this, we define the variable in one file, and then in the other file declare a variable of the same name but with the prefix extern. extern stands for external and to a human means that this variable is defined somewhere else.

What extern says to a compiler is that it should not allocate any space in memory for this variable, and should leave this symbol in the object code, where it will be fixed up later. The compiler cannot possibly know where the symbol is actually defined, but the linker does, since it is its job to look at all object files together and combine them into a single executable. So the linker will see the symbol left over in the second file, and say "I've seen that symbol before in file 1, and I know that it refers to memory location 0x12345". Thus it can modify the symbol value to be the memory address of the variable in the first file.
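The sharing arrangement described above might look something like the following sketch. The names shared_counter, bump and read_counter are hypothetical, and the two halves are shown in one listing only so that it compiles on its own; in practice each half would live in its own source file and the linker would connect them.

```c
/* ---- would live in file1.c ---- */

/* The one definition: storage for the variable is allocated here. */
int shared_counter = 0;

void bump(void)
{
    shared_counter++;
}

/* ---- would live in file2.c ---- */

/* Declaration only: no storage is allocated. When this is a separate
 * file, the symbol is left unresolved in its object code for the
 * linker to fix up. */
extern int shared_counter;

int read_counter(void)
{
    return shared_counter;
}
```

Because both functions refer to the same memory location, a call to read_counter() sees every change made by bump().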

static is almost the opposite of extern. It places restrictions on the visibility of the symbol it modifies. If you declare a variable with static, that says to the compiler "don't leave any symbols for this in the object code that other files can link against". This means that when the linker is linking together object files, it will never match that symbol against a reference in another file (and so can't make that "I've seen this before!" connection). static is good for separation and reducing conflicts -- by declaring a variable static you can reuse the variable name in other files and not end up with symbol clashes. We say we are restricting the visibility of the symbol, because we are not allowing the linker to see it from other files. Contrast this with a more visible symbol (one not declared with static), which can be seen by the linker.
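A minimal sketch of restricted visibility follows, assuming the hypothetical names counter, next_id and allocate_id. Only allocate_id is meant to be used from other files; the static variable and the static helper function stay private to this one.

```c
/* Private to this file: another file may define its own "counter"
 * without causing a symbol clash at link time. */
static int counter = 0;

/* static works the same way for functions: this helper cannot be
 * called by name from another file. */
static int next_id(void)
{
    return ++counter;
}

/* Not static, so this symbol is visible to the linker and other
 * files can call it. */
int allocate_id(void)
{
    return next_id();
}
```

Other files can reach the counter only indirectly, through allocate_id(), which returns 1 on the first call and 2 on the second.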

The linking process

Thus the linking process is really two steps: combining all object files into one executable file, and then going through each object file to resolve any symbols. This usually requires two passes: one to read all the symbol definitions and take note of unresolved symbols, and a second to fix up all those unresolved symbols to the right place.

The final executable should end up with no unresolved symbols; the linker will fail with an error if there are any.[22]
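The two passes can be sketched with a toy model. Everything here is an assumption made for illustration: the struct symbol and struct object_file layouts, the link_objects function and the MAX_SYMBOLS limit are hypothetical and bear no relation to a real object file format or a real linker's internals. The sketch also ignores duplicate definitions, which a real linker would have to handle.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16   /* arbitrary limit for this toy example */

/* Hypothetical, heavily simplified symbol table entry. */
struct symbol {
    const char *name;
    int defined;             /* 1: this file defines it, 0: it only refers to it */
    unsigned long address;   /* valid once defined or resolved */
};

struct object_file {
    struct symbol *symbols;
    size_t count;
};

/* Returns 0 on success, or -1 if a symbol cannot be resolved
 * (or if the toy definition table overflows). */
int link_objects(struct object_file *objects, size_t n_objects)
{
    const struct symbol *defs[MAX_SYMBOLS];
    size_t n_defs = 0;

    /* Pass 1: read all the symbol definitions. */
    for (size_t i = 0; i < n_objects; i++) {
        for (size_t j = 0; j < objects[i].count; j++) {
            const struct symbol *s = &objects[i].symbols[j];
            if (!s->defined)
                continue;
            if (n_defs == MAX_SYMBOLS)
                return -1;
            defs[n_defs++] = s;
        }
    }

    /* Pass 2: fix up every unresolved symbol to point at its definition. */
    for (size_t i = 0; i < n_objects; i++) {
        for (size_t j = 0; j < objects[i].count; j++) {
            struct symbol *s = &objects[i].symbols[j];
            const struct symbol *found = NULL;
            if (s->defined)
                continue;
            for (size_t k = 0; k < n_defs; k++) {
                if (strcmp(defs[k]->name, s->name) == 0) {
                    found = defs[k];
                    break;
                }
            }
            if (found == NULL)
                return -1;   /* unresolved symbol: the link fails */
            s->address = found->address;
        }
    }
    return 0;
}
```

If one file defines a symbol at 0x12345 and a second file only refers to it, link_objects copies that address into the second file's entry and returns 0. If any file refers to a symbol that nothing defines, it returns -1, mirroring the linker error described above.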



[22] We call this static linking. Dynamic linking is a similar concept carried out when the executable is run, and is described a little later on.