01 The C Compilation Pipeline
In C, "compiling" is actually a pipeline process known as Translation. The compiler driver (e.g., GCC) orchestrates four distinct stages to transform human-readable source code into machine-executable binaries.
0. Overview
- Toolchain: We use GCC (GNU Compiler Collection) as the primary example.
- The Pipeline: Source (`.c`) \(\to\) Preprocessed (`.i`) \(\to\) Assembly (`.s`) \(\to\) Object (`.o`) \(\to\) Executable.
Stage 1: Preprocessing
Goal: Pure text manipulation and substitution.
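Tool: The Preprocessor (`cpp`). In modern GCC the preprocessor is built into the compiler proper: `gcc -E` runs `cc1 -E`, and a normal build preprocesses and compiles in a single `cc1` run.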
Key Actions
- Macro Expansion: Replaces each use of a defined macro with its replacement text (e.g., after `#define NUM 100`, every `NUM` becomes `100`).
- Header Inclusion: Copies the entire content of header files (e.g., `#include <stdio.h>`) directly into the source file, recursively. This causes the file size to expand significantly.
- Conditional Compilation: Evaluates directives like `#ifdef` and `#if`, keeping or removing code blocks.
- Cleanup: Strips out all comments (`//`, `/* ... */`) and adds line markers so that later stages can report errors and debug information against the original file and line numbers.
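A small hypothetical file to run through `gcc -E` makes these actions visible. The file name `prep_demo.c` and the names `SQUARE`, `MODE_NAME`, `scaled`, and `mode_name` are made up for illustration. The file has no `main`, so feed it to `-E`, `-S`, or `-c` rather than linking it on its own.

```c
/* prep_demo.c: hypothetical file for exploring `gcc -E prep_demo.c` */
#define NUM 100                 /* object-like macro */
#define SQUARE(x) ((x) * (x))   /* function-like macro; parentheses protect precedence */

/* Conditional compilation: only one MODE_NAME definition survives -E.
   Pass -DVERBOSE on the command line to select the other branch. */
#ifdef VERBOSE
#define MODE_NAME "verbose"
#else
#define MODE_NAME "quiet"
#endif

int scaled(int x) {
    // In the .i output the next line reads roughly: return ((x) * (x)) + 100;
    return SQUARE(x) + NUM;
}

const char *mode_name(void) {
    return MODE_NAME; /* becomes the string literal "quiet" unless -DVERBOSE is given */
}
```

Comparing `gcc -E prep_demo.c` with `gcc -E -DVERBOSE prep_demo.c` shows conditional compilation at work: only the string literal in `mode_name` changes, and neither output contains the comments or the `#define` lines.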
Command Line
```bash
gcc -E hello.c -o hello.i
```
- Flag: `-E` (Stop after preprocessing).
- Output: `.i` file (still plain C source code, readable in any text editor).
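The line markers in a `.i` file look like `# 1 "hello.c"`; they tell later stages which original file and line the following code came from, much like the standard `#line` directive. The hypothetical sketch below uses `#line` directly to show the effect: code after the directive reports the remapped file name and line number through `__FILE__` and `__LINE__`. This is how the compiler can point diagnostics at your original source even though it only ever reads the preprocessed text.

```c
/* line_demo.c: hypothetical; compare with the line markers in a .i file */
#line 500 "original.c"
int reported_line(void) { return __LINE__; }         /* this line is now "line 500" */
const char *reported_file(void) { return __FILE__; } /* expands to "original.c" */
```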
Stage 2: Compilation (The "Proper" Compilation)
Goal: Translate C code into Assembly Language.
Tool: The Compiler (cc1).
Key Actions
- Analysis: Performs lexical, syntactic, and semantic analysis (checks for syntax errors and type mismatches).
- Optimization: If flags like `-O2` are used, the compiler optimizes the code (e.g., dead code elimination, loop unrolling).
- Code Generation: Translates the program into the specific Instruction Set Architecture (ISA) of the target machine (e.g., x86_64 or ARM64).
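To see optimization in the `.s` output, a hypothetical function such as `sum_to` below (the file and function names are made up) can be compiled with `gcc -S -O0` and with `gcc -S -O2`, and the two files diffed. Because `DEBUG_CHECKS` expands to `0`, the guarded branch can never run, which makes it a textbook candidate for dead code elimination. The exact assembly depends on the GCC version and target, so treat this as something to inspect rather than a guaranteed result.

```c
/* opt_demo.c: hypothetical; compare `gcc -S -O0 opt_demo.c` with `gcc -S -O2 opt_demo.c` */
#define DEBUG_CHECKS 0

/* Returns 1 + 2 + ... + n (0 when n < 1). */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    /* After preprocessing this is `if (0)`: a branch that can never run,
       so the optimizer is free to delete it (dead code elimination). */
    if (DEBUG_CHECKS) {
        total = -1;
    }
    return total;
}
```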
Command Line
```bash
gcc -S hello.i -o hello.s
```
- Flag: `-S` (Compile only, do not assemble).
- Output: `.s` file (text file containing assembly instructions).
- Example content (x86_64, AT&T syntax, excerpt):

```assembly
movl $100, %esi
call printf
```
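The excerpt above is consistent with source along the lines of the hypothetical `print_num` below. Under the x86_64 System V calling convention used on Linux, the first integer or pointer argument is passed in `%rdi` and the second in `%rsi` (`%esi` for a 32-bit `int`), so the format string's address goes into `%rdi` and the constant `100` (the expanded `NUM`) into `%esi` before the call. The exact instruction sequence varies with GCC version and optimization level; at `-O2`, for instance, GCC may emit a tail jump to `printf` instead of a `call`.

```c
/* print_demo.c: hypothetical; inspect with `gcc -S print_demo.c` on x86_64 Linux */
#include <stdio.h>

#define NUM 100

/* printf(format, NUM): format is the 1st argument (%rdi), NUM the 2nd (%esi) */
int print_num(void) {
    return printf("%d\n", NUM); /* printf returns the number of characters written */
}
```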
Stage 3: Assembly
Goal: Translate Assembly instructions into Machine Code.
Tool: The Assembler (as).
Key Actions
- Translation: Converts human-readable mnemonics (e.g., `mov`, `push`) into the binary opcodes defined by the ISA.
- Generation of a "Relocatable Object File":
    - Machine Code: The file contains real CPU instructions, but it cannot run yet.
    - Placeholder Addresses: Addresses of functions and variables whose final location is not yet known are set to `0x0` or to relative offsets, because the final memory layout is unknown. Relocation entries record each spot the linker must patch later.
    - Symbol Table: A list of symbols defined (exported) and symbols needed (imported/undefined) by this file. [[06 Intermediate Product — Relocatable Object File (.o)#Symbol Table | Symbol Table (brief)]]
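To see a symbol table in practice, the hypothetical file below mixes exported, file-local, and undefined symbols (all names are made up). After `gcc -c symbols_demo.c`, running `nm symbols_demo.o` lists each symbol with a type letter: uppercase means global and lowercase means local; `T`/`t` is code in `.text`, `D`/`d` initialized data, `B`/`b` zero-initialized data in `.bss`, and `U` an undefined symbol the linker must resolve. The letters in the comments are what one would typically expect at `-O0`; with optimization, a small `static` function may be inlined and disappear from the table.

```c
/* symbols_demo.c: hypothetical; try `gcc -c symbols_demo.c && nm symbols_demo.o` */
#include <stdio.h>

int call_limit = 10;        /* initialized global: defined and exported (expect D) */
static int hidden_total;    /* zero-initialized static: local, in .bss (expect b) */

static void bump(int by) {  /* file-local function (expect t) */
    hidden_total += by;
}

int record(int value) {     /* global function: defined and exported (expect T) */
    bump(value);
    printf("recorded %d\n", value); /* printf is referenced, not defined here (expect U) */
    return hidden_total;
}
```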
Command Line
```bash
gcc -c hello.s -o hello.o
```
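- Flag: `-c` (Compile and assemble, but do not link).
- Output: `.o` (Linux) or `.obj` (MSVC on Windows). This is a binary file (ELF format on Linux), not text.
- Verification: Use `objdump -d hello.o` to view the machine code; adding `-r` (`objdump -dr hello.o`) also prints the relocation entries next to the instructions they patch.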
Stage 4: Linking
Goal: Combine object files and libraries to create the Executable.
Tool: The Linker (ld).
Key Actions
- Section Merging: Combines the `.text` (code) and `.data` (initialized global and static variables) sections from multiple `.o` files into one unified layout.
- Symbol Resolution:
    - Scans the symbol tables of all inputs.
    - Matches "Undefined References" (e.g., a call to `printf` in `main.o`) with their "Definitions" (e.g., `printf` in `libc.a` or `libc.so`). With a static library (`libc.a`), the object code of `printf` is copied into the executable; with a shared library (`libc.so`), the linker only records the dependency, and the dynamic loader binds `printf` at load or run time.
- Relocation:
    - Calculates the final virtual memory addresses.
    - Patches the machine code: replaces the placeholder addresses from Stage 3 with real addresses (absolute or PC-relative).
- Entry Point: Links the startup code (CRT, the C runtime, whose entry symbol is typically `_start`), which initializes the environment and eventually calls `main`.
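The startup code's work before `main` can be observed with GCC's `constructor` function attribute, a GCC extension rather than standard C. In the hypothetical sketch below (all names are made up), `init_before_main` runs during program startup, before `main` is entered, so code in `main` already sees `startup_done` set. The precise mechanism (on Linux, typically an entry in the `.init_array` section processed by the C runtime) is an implementation detail. The file defines no `main` of its own, so link it together with a file that does.

```c
/* startup_demo.c: hypothetical; link together with any file that defines main() */
static int startup_done = 0;

/* GCC extension: the runtime calls this during startup, before main() runs. */
__attribute__((constructor))
static void init_before_main(void) {
    startup_done = 1;
}

/* Returns 1 if the constructor has already run (always true once main() is running). */
int startup_ran(void) {
    return startup_done;
}
```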
Command Line
```bash
gcc hello.o -o hello
```
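- Flag: None (linking is the default), plus optional linker flags such as `-l` (link against a library) and `-L` (add a library search directory).
- Output: Final executable (e.g., `a.out` when `-o` is omitted, `hello`, `hello.exe`).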
GCC Flexibility (The "Train Station" Concept)
GCC acts as a driver program. It looks at each input file's extension and runs only the stages that file still needs. You can "board the train" at any stage:
| Input File Extension | Content | GCC Action |
|---|---|---|
| `.c` | C Source | Preprocess \(\to\) Compile \(\to\) Assemble \(\to\) Link |
| `.i` | Preprocessed C | Compile \(\to\) Assemble \(\to\) Link |
| `.s` | Assembly Source | Assemble \(\to\) Link |
| `.o` | Machine Code | Link |
Example of Flexibility
You can mix source files and object files in a single command:
```bash
# GCC will compile main.c to object code, then link it with the existing utils.o
gcc main.c utils.o -o my_app
```
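The command above assumes `utils.o` already exists. A hypothetical `utils.c` that could produce it is sketched below (the function names are made up for illustration). Build it once with `gcc -c utils.c -o utils.o`; `main.c` then only needs matching declarations such as `int utils_add(int a, int b);`, usually placed in a `utils.h` header, and the linker resolves those calls against the definitions in `utils.o`.

```c
/* utils.c: hypothetical companion for `gcc main.c utils.o -o my_app` */

/* Both functions have external linkage, so they appear as defined (T)
   symbols in utils.o and can satisfy undefined references from main.c's object code. */
int utils_add(int a, int b) {
    return a + b;
}

int utils_max(int a, int b) {
    return a > b ? a : b;
}
```

Because `utils.o` is reused as-is, editing only `main.c` means only `main.c` is recompiled; this is the same idea that build tools like `make` rely on.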
Pro Tip: The All-in-One Inspector (--save-temps)
If you want to view all intermediate files (`.i`, `.s`, `.o`) without running GCC three separate times, you can use the `--save-temps` flag (documented in the GCC manual as `-save-temps`).
By default, GCC treats these intermediates as temporary files and deletes them once the command finishes, to keep your directory clean. This flag tells GCC to keep them.
Command Line
```bash
gcc --save-temps hello.c -o hello
```
Result
After running this single command, your directory will contain the following (the exact names of the intermediate files can vary between GCC versions):
1. hello.i (Preprocessed source)
2. hello.s (Assembly code)
3. hello.o (Object code)
4. hello (Executable)
Why use it? It is a quick way to debug the compilation process or to verify what the preprocessor and compiler actually did to your code.
Summary Table
| Stage | Input | Action | Output | GCC Flag |
|---|---|---|---|---|
| Preprocessing | Source Code (`.c`) | Macro expansion, text substitution | Pure C Code (`.i`) | `-E` |
| Compilation | Pure C Code (`.i`) | Syntax check, optimize, target ISA mapping | Assembly (`.s`) | `-S` |
| Assembly | Assembly (`.s`) | Translate to binary, create symbol table | Relocatable Object (`.o`) | `-c` |
| Linking | Object (`.o`) + Libs | Merge sections, resolve symbols, relocate addresses | Executable | (None) |