05 The Foundation — Instruction Set Architecture (ISA)
1. What is ISA?
The Instruction Set Architecture (ISA) serves as the fundamental "contract" or interface between the software (Compiler/Programmer) and the hardware (CPU).
- Key Components:
- Instructions: The specific operations the CPU can perform (e.g.,
ADD,LOAD,JUMP). - Registers: The internal storage units of the CPU (e.g.,
rax,x0). - Data Types: Definitions of integers, floats, and Endianness (Byte order).
- Addressing Modes: How the CPU calculates memory addresses.
- Instructions: The specific operations the CPU can perform (e.g.,
2. Architecture vs. Micro-architecture
It is critical to distinguish between the design specification and the physical implementation.
- ISA (Architecture): The Abstract Specification.
- Example: x86-64.
- It defines what the CPU does (e.g., "The opcode
0x01adds two numbers"). Intel and AMD both follow this standard so software is compatible.
- Micro-architecture: The Concrete Implementation.
- Example: Intel Skylake, AMD Zen 4.
- It defines how the CPU does it (e.g., pipeline depth, cache size, number of transistors).
- Analogy: All cars share the same interface (Steering wheel, pedals = ISA), but a Ferrari engine differs vastly from a Toyota engine (Micro-architecture).
3. CISC vs. RISC
The two dominant philosophies in processor design.
CISC (Complex Instruction Set Computer)
- Representative: x86 (Intel/AMD).
- Philosophy: "One instruction does many things."
- Characteristics:
- Variable-length instructions.
- Arithmetic operations can directly access memory (e.g.,
ADD EAX, [PTR]). - Complex hardware, simpler compiler.
RISC (Reduced Instruction Set Computer)
- Representative: ARM (Mobile/Apple Silicon), MIPS, RISC-V.
- Philosophy: "Keep instructions simple and fast."
- Characteristics:
- Fixed-length instructions (usually 4 bytes).
- Load-Store Architecture: Arithmetic can only happen between registers. Data must be explicitly loaded from memory to registers first.
- Simpler hardware, complex compiler (needs more instructions to do the same task).
4. How ISA Determines Compilation and Assembly
The ISA dictates the output of the entire translation pipeline. If you change the Target ISA, the output changes completely.
Impact on Compilation (Source -> Assembly)
The compiler acts as a decision-maker limited by the ISA's vocabulary.
* Instruction Selection:
* If the ISA has a specialized instruction (e.g., IMUL on x86), the compiler uses it.
* If not, the compiler must synthesize the operation using a sequence of simpler instructions.
* Register Allocation:
* x86 (32-bit): Has few registers (~8). The compiler must frequently "spill" variables to the stack (Memory).
* ARM/x86-64: Has many registers (16-32). The compiler can keep most variables in registers for speed.
Impact on Assembly (Assembly -> Machine Code)
The assembler acts as a translator using the ISA's encoding table.
* Binary Encoding: Even if two ISAs both have an instruction named MOV, their binary representation (Opcode) will be totally different.
* Endianness: The ISA defines whether the byte 0x1234 is stored in memory as 12 34 (Big Endian) or 34 12 (Little Endian).
5. Compiler Backend Mechanics
How does the compiler know which ISA to target?
The Target Triple
The compiler uses a formatted string called the Target Triple to identify the destination environment.
* Format: Architecture - Vendor - OS - Environment(ABI)
* Examples:
* x86_64-pc-linux-gnu
* aarch64-apple-darwin (Apple Silicon)
* arm-none-eabi (Bare-metal Embedded)
Backend Stages
- Intermediate Representation (IR): The frontend translates C code into a generic, CPU-independent language (e.g., LLVM IR).
- Instruction Selection: The backend maps generic IR operations to specific target ISA instructions.
- Register Allocation: Assigns infinite virtual registers from the IR to the limited physical registers of the CPU.
- Instruction Scheduling: Reorders instructions to optimize for the specific Micro-architecture's pipeline (e.g., hiding memory latency).