Skip to content

Undefined Behavior in C (UB) 未定义行为

1. Definition and Classification

The ISO C Standard categorizes program behavior into three distinct classes. Understanding the difference is crucial for systems programming:

  • Implementation-defined Behavior: The behavior varies between implementations (compilers/architectures) but must be documented.
    • Example: The size of an int (usually 4 bytes, but can be 2), or the result of right-shifting a negative signed integer.
  • Unspecified Behavior: The standard provides multiple valid options; the compiler picks one, but need not document it.
    • Example: The order in which function arguments are evaluated (e.g., func(a(), b())).
  • Undefined Behavior (UB): The standard imposes no requirements on the program's behavior.
    • Consequence: The program is considered "erroneous." The compiler is allowed to do anything: crash, produce incorrect results, corrupt data silently, or delete entire blocks of code.

2. The Philosophy: Why UB Exists

UB is not a design flaw; it is a deliberate choice to prioritize performance and optimization potential. 1. Performance: C assumes the programmer knows what they are doing. Forcing the compiler to insert runtime checks (e.g., array bounds checking) for every operation would severely degrade performance. 2. Optimization (The "As-If" Rule): Compilers optimize code based on the assumption that UB never happens. If a code path implies UB, the compiler assumes that path is unreachable and may remove it (Dead Code Elimination) or reorder instructions around it.

3. Common Sources of UB

A. Signed Integer Overflow

Unlike unsigned integers (which define wrapping modulo \(2^N\)), signed integer overflow is UB. * Code: int a = INT_MAX + 1; * Risk: The compiler may assume a + 1 > a is always true. If you write if (a + 1 < a) handle_overflow();, the compiler may optimize this check away entirely.

B. Memory Access Violations

  • Buffer Overflow: Accessing arr[5] when the size is 5.
    • Misconception: "It will cause a Segfault."
    • Reality: It might crash. It might also overwrite adjacent stack variables (silent data corruption) or alter return addresses (security vulnerability).
  • Null/Wild Pointer Dereference: Reading/writing *p when p is NULL or uninitialized.
  • Use-After-Free: Accessing memory after calling free().

C. Strict Aliasing Violations

The standard assumes pointers of different types (except char*) do not point to the same memory location. * Code: Casting float* to int* and dereferencing it. * Risk: The compiler may reorder reads and writes, assuming the two pointers are unrelated, leading to stale data being read.


The "Nasal Demons" Concept A famous humorous term in the C community. Since the standard imposes no limits on UB, a compiler could theoretically "make demons fly out of your nose." In reality, this manifests as "Time Travel" optimizations, where code logically preceding the UB is deleted because the compiler reasons backward that the UB could not have been reached.