Character Constants 字符常量
1. Definition and Syntax
A character constant is a type of literal used to represent a single character. It is enclosed in single quotation marks (e.g., 'A', '7', or '$').
In C, the value of a character constant is the numerical value of the character in the machine’s character set (typically ASCII).
2. Internal Representation and Data Type
One of the most distinctive features of C is how it treats characters:
* Integer Nature: Characters are stored as integers (ASCII codes). For example, the constant 'A' is stored as the integer 65.
* The int Paradox: In C, the type of a character constant (like 'a') is actually int, not char.
* sizeof('a') is equivalent to sizeof(int), which is typically 4 bytes on modern systems.
* However, when assigned to a char variable, it is truncated to 1 byte.
3. Classification of Character Constants
A. Simple Character Constants
Represented by a single graphic character: 'a', 'B', '9', etc.
B. Escape Sequences
Used to represent non-printable or special characters.
| Sequence | Description | ASCII (Dec) |
|---|---|---|
\n |
Newline | 10 |
\t |
Horizontal Tab | 9 |
\\ |
Backslash | 92 |
\' |
Single Quote | 39 |
\" |
Double Quote | 34 |
\0 |
Null Character | 0 |
C. Numeric Escape Sequences
You can represent any character by its numeric code:
* Octal: '\ddd' (e.g., '\101' is 'A'). Max 3 digits.
* Hexadecimal: '\xhh' (e.g., '\x41' is 'A'). NO Max digits (still can't overflow).
e.g. \x00004 is legal.
Overflow Warning for \ddd and \xhh
While C allows numeric escapes, their values must typically fit within 8 bits (1 byte) to be valid char constants.
- Octal (
\ddd): Limited to 3 digits. The maximum valid value is'\377'(Decimal 255).'\777'(511) overflows. - Hexadecimal (
\xhh): Can have unlimited digits in syntax, but the value is truncated to fit the target type. For a standardchar, the maximum is'\xff'(Decimal 255). - Result of Overflow: Exceeding 255 triggers compiler warnings and bitwise truncation. For example, both
'\777'and'\x1ff'(Decimal 511) are truncated to255or-1(binary11111111).
4. Range and Storage
The value range of a character is determined by its storage size (1 byte / 8 bits) and the execution environment.
Range Specifications:
- Standard ASCII: 0 to 127 (7 bits used).
- Extended ASCII: 0 to 255 (8 bits used).
- Signed vs. Unsigned
char:- If
charis signed: -128 to 127. - If
charis unsigned: 0 to 255.
- If
- Implementation Dependency: The C standard does not specify whether
charis signed or unsigned by default. This depends on the compiler and architecture (e.g., x86 is usually signed, ARM is often unsigned).
5. Character Constants vs. String Literals
Distinguishing between 'A' and "A" is critical for memory management and pointer logic.
| Feature | Character Constant ('A') |
String Literal ("A") |
|---|---|---|
| Enclosure | Single quotes | Double quotes |
| Type | int |
char[] (Array of chars) |
| Size | 1 byte (in storage) | 2 bytes (includes \0) |
| Termination | None | Automatic Null Terminator (\0) |
6. Character Arithmetic
Because characters are integers, they support mathematical operations:
* Case Conversion: 'a' - 32 results in 'A'.
* Digit Conversion: To convert the character '5' to the integer 5, subtract the character '0':
c
int val = '5' - '0'; // val = 5
7. Common Pitfalls and Best Practices
- Multiple Characters:
'abc'is a "multi-character constant." Its value is implementation-defined and generally avoided as it leads to non-portable code. - Numeric Confusion: Do not confuse the character
'0'(ASCII 48) with the null terminator'\0'(ASCII 0) or the integer0. - Overflow: Assigning a hex value like
'\xff'to a signedcharmay result in a value of-1due to sign extension, which can cause logic errors in comparisons (e.g.,if (c == 0xff)might fail).
Note: This guide follows the C11/C17 standards. Behavioral differences may exist in C++.