Skip to content

Character Constants 字符常量

1. Definition and Syntax

A character constant is a type of literal used to represent a single character. It is enclosed in single quotation marks (e.g., 'A', '7', or '$').

In C, the value of a character constant is the numerical value of the character in the machine’s character set (typically ASCII).


2. Internal Representation and Data Type

One of the most distinctive features of C is how it treats characters: * Integer Nature: Characters are stored as integers (ASCII codes). For example, the constant 'A' is stored as the integer 65. * The int Paradox: In C, the type of a character constant (like 'a') is actually int, not char. * sizeof('a') is equivalent to sizeof(int), which is typically 4 bytes on modern systems. * However, when assigned to a char variable, it is truncated to 1 byte.


3. Classification of Character Constants

A. Simple Character Constants

Represented by a single graphic character: 'a', 'B', '9', etc.

B. Escape Sequences

Used to represent non-printable or special characters.

Sequence Description ASCII (Dec)
\n Newline 10
\t Horizontal Tab 9
\\ Backslash 92
\' Single Quote 39
\" Double Quote 34
\0 Null Character 0

C. Numeric Escape Sequences

You can represent any character by its numeric code: * Octal: '\ddd' (e.g., '\101' is 'A'). Max 3 digits. * Hexadecimal: '\xhh' (e.g., '\x41' is 'A'). NO Max digits (still can't overflow). e.g. \x00004 is legal.

Overflow Warning for \ddd and \xhh While C allows numeric escapes, their values must typically fit within 8 bits (1 byte) to be valid char constants.

  • Octal (\ddd): Limited to 3 digits. The maximum valid value is '\377' (Decimal 255). '\777' (511) overflows.
  • Hexadecimal (\xhh): Can have unlimited digits in syntax, but the value is truncated to fit the target type. For a standard char, the maximum is '\xff' (Decimal 255).
  • Result of Overflow: Exceeding 255 triggers compiler warnings and bitwise truncation. For example, both '\777' and '\x1ff' (Decimal 511) are truncated to 255 or -1 (binary 11111111).

4. Range and Storage

The value range of a character is determined by its storage size (1 byte / 8 bits) and the execution environment.

Range Specifications:

  • Standard ASCII: 0 to 127 (7 bits used).
  • Extended ASCII: 0 to 255 (8 bits used).
  • Signed vs. Unsigned char:
    • If char is signed: -128 to 127.
    • If char is unsigned: 0 to 255.
  • Implementation Dependency: The C standard does not specify whether char is signed or unsigned by default. This depends on the compiler and architecture (e.g., x86 is usually signed, ARM is often unsigned).

5. Character Constants vs. String Literals

Distinguishing between 'A' and "A" is critical for memory management and pointer logic.

Feature Character Constant ('A') String Literal ("A")
Enclosure Single quotes Double quotes
Type int char[] (Array of chars)
Size 1 byte (in storage) 2 bytes (includes \0)
Termination None Automatic Null Terminator (\0)

6. Character Arithmetic

Because characters are integers, they support mathematical operations: * Case Conversion: 'a' - 32 results in 'A'. * Digit Conversion: To convert the character '5' to the integer 5, subtract the character '0': c int val = '5' - '0'; // val = 5


7. Common Pitfalls and Best Practices

  1. Multiple Characters: 'abc' is a "multi-character constant." Its value is implementation-defined and generally avoided as it leads to non-portable code.
  2. Numeric Confusion: Do not confuse the character '0' (ASCII 48) with the null terminator '\0' (ASCII 0) or the integer 0.
  3. Overflow: Assigning a hex value like '\xff' to a signed char may result in a value of -1 due to sign extension, which can cause logic errors in comparisons (e.g., if (c == 0xff) might fail).

Note: This guide follows the C11/C17 standards. Behavioral differences may exist in C++.