Undefined Behavior (UB) in C
1. Definition and Classification
The ISO C Standard categorizes program behavior into three distinct classes. Understanding the difference is crucial for systems programming:
- Implementation-defined Behavior: The behavior varies between implementations (compilers/architectures) but must be documented.
- Example: The size of an int (usually 4 bytes, but can be 2), or the result of right-shifting a negative signed integer.
- Unspecified Behavior: The standard provides multiple valid options; the compiler picks one, but need not document it.
- Example: The order in which function arguments are evaluated (e.g., func(a(), b())).
- Undefined Behavior (UB): The standard imposes no requirements on the program's behavior.
- Consequence: The program is considered "erroneous." The compiler is allowed to do anything: crash, produce incorrect results, corrupt data silently, or delete entire blocks of code.
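The first two classes can be observed directly without invoking UB. The sketch below is only an illustration: a() and b() are hypothetical helpers that log the order in which they run, and func is a placeholder that adds its arguments. Either recorded order is conforming, and the printed values depend on the platform.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers: each appends its own tag to a shared log when called. */
char call_log[3];
size_t call_count;

int a(void) { call_log[call_count++] = 'a'; return 1; }
int b(void) { call_log[call_count++] = 'b'; return 2; }

/* Placeholder for the func(a(), b()) call mentioned in the text. */
int func(int x, int y) { return x + y; }

/* Calls func(a(), b()) and returns the order in which a() and b() ran.
   Both "ab" and "ba" are conforming results (unspecified behavior). */
const char *argument_order(void) {
    call_count = 0;
    call_log[0] = call_log[1] = call_log[2] = '\0';
    (void)func(a(), b());
    return call_log;
}

/* Prints two implementation-defined values for the current platform. */
void print_implementation_defined(void) {
    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("-8 >> 1     = %d\n", -8 >> 1);  /* result is implementation-defined */
}
```

Calling argument_order() on two different compilers (or optimization levels) may give different strings, and neither compiler is obliged to document which one it picks.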
2. The Philosophy: Why UB Exists
UB is not a design flaw; it is a deliberate choice to prioritize performance and optimization potential.
- Performance: C assumes the programmer knows what they are doing. Forcing the compiler to insert runtime checks (e.g., array bounds checking) for every operation would severely degrade performance.
- Optimization (building on the "As-If" Rule): The as-if rule only obliges the compiler to preserve the observable behavior of programs that are free of UB, so compilers optimize on the assumption that UB never happens. If a code path implies UB, the compiler may assume that path is unreachable and remove it (Dead Code Elimination) or reorder instructions around it.
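A common illustration of this reasoning is a null check placed after the dereference it was meant to guard. The function names below are hypothetical, and whether a given compiler actually deletes the check depends on the compiler and optimization level; the sketch only shows the pattern and a safer ordering.

```c
#include <stddef.h>

/* Risky: p is dereferenced before it is checked. Dereferencing a null
   pointer is UB, so the compiler may assume p != NULL from the first
   statement onward and remove the check as dead code. */
int read_value_unchecked(int *p) {
    int value = *p;        /* UB if p == NULL */
    if (p == NULL) {       /* may be optimized away */
        return -1;
    }
    return value;
}

/* Safer: check first, so no path dereferences a null pointer. */
int read_value_checked(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Both functions behave identically for valid pointers. Only read_value_checked has defined behavior when passed NULL.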
3. Common Sources of UB
A. Signed Integer Overflow
Unlike unsigned integers (which define wrapping modulo \(2^N\)), signed integer overflow is UB.
* Code: int a = INT_MAX + 1;
* Risk: The compiler may assume a + 1 > a is always true. If you write if (a + 1 < a) handle_overflow();, the compiler may optimize this check away entirely.
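One way to avoid relying on the overflowed result is to test the operands before adding, so the overflowing addition is never evaluated. This is a minimal sketch; add_would_overflow and checked_add are names chosen for illustration. GCC and Clang also provide __builtin_add_overflow, and C23 adds ckd_add in <stdckdint.h>, which may be preferable where available.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int, without ever computing a + b. */
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  /* INT_MAX - b cannot overflow here */
    if (b < 0) return a < INT_MIN - b;  /* INT_MIN - b cannot overflow here */
    return false;
}

/* Stores a + b in *result and returns true, or returns false on overflow
   and leaves *result untouched. */
bool checked_add(int a, int b, int *result) {
    if (add_would_overflow(a, b)) return false;
    *result = a + b;
    return true;
}
```

Because the comparison uses only in-range arithmetic, the compiler has no UB-based justification for removing it.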
B. Memory Access Violations
- Buffer Overflow: Accessing arr[5] when the size is 5.
  - Misconception: "It will cause a Segfault."
  - Reality: It might crash. It might also overwrite adjacent stack variables (silent data corruption) or alter return addresses (security vulnerability).
- Null/Wild Pointer Dereference: Reading/writing *p when p is NULL or uninitialized.
- Use-After-Free: Accessing memory after calling free().
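Since C performs none of these checks itself, defensive code has to carry them explicitly. The helpers below are hypothetical and only sketch two common habits: validating an index against a known length, and clearing a pointer after freeing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Copies arr[index] into *out only when index is within [0, len). */
bool get_checked(const int *arr, size_t len, size_t index, int *out) {
    if (arr == NULL || out == NULL || index >= len) {
        return false;
    }
    *out = arr[index];
    return true;
}

/* Frees *pp and sets it to NULL, so a later use through this pointer
   can be caught by a NULL check instead of touching freed memory. */
void free_and_clear(int **pp) {
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a defined no-op */
        *pp = NULL;
    }
}
```

Clearing the pointer protects only that one variable; any other copies of the address still dangle. Tools such as AddressSanitizer (-fsanitize=address in GCC and Clang) can help detect the cases that manual checks miss.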
C. Strict Aliasing Violations
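The standard lets the compiler assume that pointers to incompatible types do not refer to the same object. The main exception is the character types (char, signed char, unsigned char), which may be used to access any object.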
* Code: Casting float* to int* and dereferencing it.
* Risk: The compiler may reorder reads and writes, assuming the two pointers are unrelated, leading to stale data being read.
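A well-defined way to reinterpret the bytes of a float is to copy them with memcpy rather than cast the pointer. The sketch below uses uint32_t instead of int so the width is explicit, and it assumes a 32-bit float; the function names are illustrative. Optimizing compilers typically turn such a fixed-size memcpy into a plain register move.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* The aliasing violation from the text would look like:
       uint32_t bits = *(uint32_t *)&f;    // UB: float read as uint32_t
   The versions below copy the object representation instead. */

/* Returns the object representation of f as a 32-bit integer. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse of float_bits: rebuilds a float from its 32-bit representation. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On a platform using IEEE 754 single precision, float_bits(1.0f) yields 0x3F800000.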
The "Nasal Demons" Concept
A famous humorous term in the C community. Since the standard imposes no limits on UB, a compiler could theoretically "make demons fly out of your nose." In reality, this manifests as "Time Travel" optimizations, where code logically preceding the UB is deleted because the compiler reasons backward that the UB could not have been reached.