As we already know, the compilation process involves four steps: pre-processing, compilation, assembly, and linking.
Done by hand, the process looks like this:
$ gcc -E prog.c > prog.i # pre-process
$ gcc -S prog.i # compile (produces prog.s)
$ as -o prog.o prog.s # assemble
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
/usr/lib64/crt1.o \
/usr/lib64/crti.o \
/usr/lib64/crtn.o -lc prog.o # link
We will cover each of these stages in detail in later sections, but the linker stage is our focus right now.
The options to ld specify that our object code should be dynamically linked and that the dynamic linker to be used is ld-linux-x86-64.so.2. The rest of the object files are C run-time code that gets linked into the executable. This is where _start comes into our program. Every C program has start-up code linked in unless you explicitly tell the compiler not to link it. Details about the C run-time routines will be given later in this article.
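To make the role of the start-up code a little more concrete, here is a rough model of it in plain C. The real _start is assembly supplied by crt1.o and it hands control to the C library, which eventually calls main() and passes its return value to exit(). The names model_start, sample_main and demo_startup_status are made up for this sketch, and the model returns the status instead of exiting so that the flow is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of a C entry point as the start-up code sees it. */
typedef int (*entry_fn)(int argc, char **argv, char **envp);

/*
 * Hypothetical model of the start-up path. The real code would set up
 * the stack, run initialisers, call main() and then call exit() with
 * its return value. Here we simply return that value.
 */
int model_start(entry_fn entry, int argc, char **argv, char **envp)
{
    assert(entry != NULL);
    /* Real start-up code would run initialisers before this point. */
    int status = entry(argc, argv, envp);
    /* Real start-up code would now call exit(status). */
    return status;
}

/* Stand-in for a program's main(): reports how many arguments it got. */
int sample_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    return argc;
}

/* Drives the model with a fixed argument vector. */
int demo_startup_status(void)
{
    char *argv[] = { "prog", "arg1", NULL };
    char *envp[] = { NULL };
    return model_start(sample_main, 2, argv, envp);
}
```

Calling demo_startup_status() runs sample_main through the modelled start-up path and gives back the value that the real start-up code would have handed to exit().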
The job of the linker is to resolve the external references present in the program and map them to the libraries and other object files that define them. With static linking, the object code in the libraries referenced by the program is copied directly into the program executable. This makes the program load faster, but it increases the size of the binary. When loading the program, the kernel only needs to relocate the image to its load address by adding the load offset to the addresses present in the binary.
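The "add the offset" step described above can be sketched as a toy model in C. This is only an illustration of the arithmetic, not how any particular kernel or loader is written; relocate_image and relocated_address are hypothetical names, and the image is reduced to a flat array of absolute addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of load-time relocation for a statically linked image:
 * every absolute address recorded in the binary is shifted by the
 * difference between the address the image was linked for and the
 * address it was actually loaded at.
 */
void relocate_image(uint64_t *addrs, size_t count,
                    uint64_t link_base, uint64_t load_base)
{
    assert(addrs != NULL || count == 0);
    /* Unsigned wrap-around also covers load_base < link_base. */
    uint64_t offset = load_base - link_base;
    for (size_t i = 0; i < count; i++)
        addrs[i] += offset;
}

/* Relocates a single address and returns it; handy for quick checks. */
uint64_t relocated_address(uint64_t addr,
                           uint64_t link_base, uint64_t load_base)
{
    relocate_image(&addr, 1, link_base, load_base);
    return addr;
}
```

For instance, an address 0x1000 bytes past a link base of 0x400000 ends up 0x1000 bytes past whatever base the image is loaded at.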
Static linking becomes a bigger problem when many programs on the system use the same library code. The duplicated code can take up a considerable amount of memory. Also, when code in a library is updated, every program that uses it has to be re-linked. This is a problem for vendors, as the entire binary has to be packaged and shipped again.
Dynamic linking was conceived to overcome this. Here, at link time, the undefined references found in the program are looked up in the library files, and only a reference to each symbol is placed in the binary. This means that none of the code from the external libraries ends up in the binary. Since the referenced code is not copied into the binary, relocation cannot be performed until load time.
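The idea of storing only a symbol reference and binding it at load time can be sketched with a small model in C. Everything here is hypothetical: toy_library stands in for a shared library's symbol table, import_ref for an undefined reference left in the program, and resolve_imports for the dynamic linker. The real mechanism works on ELF symbol and relocation tables, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A symbol exported by a (hypothetical) shared library. */
struct export_sym {
    const char *name;
    int (*fn)(int);
};

/* An undefined reference left in the program at link time: only the
 * name is stored, and the address slot is filled in at load time. */
struct import_ref {
    const char *name;
    int (*resolved)(int);
};

static int lib_double(int x) { return 2 * x; }
static int lib_square(int x) { return x * x; }

/* Stand-in for the library's dynamic symbol table. */
static const struct export_sym toy_library[] = {
    { "double_it", lib_double },
    { "square_it", lib_square },
};

/* Plays the role of the dynamic linker: binds each reference by name.
 * Returns the number of references that could not be resolved. */
size_t resolve_imports(struct import_ref *refs, size_t nrefs)
{
    assert(refs != NULL || nrefs == 0);
    size_t nsyms = sizeof toy_library / sizeof toy_library[0];
    size_t unresolved = 0;
    for (size_t i = 0; i < nrefs; i++) {
        refs[i].resolved = NULL;
        for (size_t j = 0; j < nsyms; j++) {
            if (strcmp(refs[i].name, toy_library[j].name) == 0) {
                refs[i].resolved = toy_library[j].fn;
                break;
            }
        }
        if (refs[i].resolved == NULL)
            unresolved++;
    }
    return unresolved;
}

/* "Loads" a program with one import and calls through the bound slot.
 * Returns -1 if the symbol is missing (a toy convention only). */
int call_import(const char *name, int arg)
{
    struct import_ref ref = { name, NULL };
    if (resolve_imports(&ref, 1) != 0)
        return -1;
    return ref.resolved(arg);
}
```

The program side only ever holds the name and an empty slot; the code behind "square_it" lives in the library table and is reached through the slot once it has been bound.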
Continue reading other posts in the series, "Life of a process".