05 Read & Write Operations for Binary Files
1. Characteristics of Binary Files
Unlike text files, which are meant for human consumption, binary files are designed for machine efficiency. They represent a direct "snapshot" of memory contents stored on disk.
Raw Data Storage
- Byte Range: Binary files utilize the full range of a byte (0–255). There are no "forbidden" or special control characters. A null byte (
0x00) or a newline (0x0A) is treated simply as data (e.g., part of an integer or a color value). - No Conversion: Data is written exactly as it is represented in RAM. An integer
123456is stored as a 4-byte sequence (e.g.,40 E2 01 00), whereas a text file would convert it to the string "123456" (6 bytes).
Advantages & Disadvantages
- Speed: Extremely fast. The CPU does not need to perform expensive formatting (like
printfdoes) or parsing (likescanfdoes). It is essentially a memory copy operation. - Storage Efficiency: Typically more compact for numeric data.
- Portability Issues (The Major Drawback): Binary files are hardware-dependent.
- Endianness: A file written on a Little-Endian machine (Intel) might be read incorrectly on a Big-Endian machine.
- Type Sizes: A
longmight be 4 bytes on a 32-bit system but 8 bytes on a 64-bit system, causing alignment errors if the file is moved between systems.
2. The Binary I/O API
The C Standard Library provides two block-based functions for binary operations: fread and fwrite. They act as "bulk movers" of data.
Function Prototypes
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
Parameter Anatomy
ptr(Pointer):- For
fwrite: The source of data (RAM). - For
fread: The destination buffer (RAM). - Note: It is a
void *, meaning it can handle any data type (int,struct, arrays).
- For
size(Item Size): The size of a single element in bytes. Always use thesizeofoperator (e.g.,sizeof(int)orsizeof(Student)).nmemb(Count): The number of elements to read or write.stream(File Pointer): The target file. Note: Unlikefprintf, the file pointer is the last argument here.
Understanding the Return Value
Both functions return the number of items successfully read or written, not the number of bytes.
- Logic: If you request to write 10 integers (
nmemb = 10), and the function returns10, the operation was fully successful. - Error Checking: If the return value is less than
nmemb, it indicates either an End-of-File (EOF) occurred or a write error (e.g., disk full).
3. Serialization (Saving Structures)
One of the most powerful use cases for binary I/O is saving complex data structures directly to disk without parsing fields individually.
Example Logic:
struct Player {
int id;
float health;
char name[20];
};
struct Player p1 = {1, 100.0f, "Hero"};
// Writing (Serialization)
// "Take the memory at address &p1, size of struct Player, 1 item, write to fp"
fwrite(&p1, sizeof(struct Player), 1, fp);
// Reading (Deserialization)
// "Read from fp, size of struct Player, 1 item, put it into memory at &p1"
fread(&p1, sizeof(struct Player), 1, fp);
- The
restrictKeyword: In modern C standards (C99+), you may seevoid *restrict ptrin prototypes. This is a hint to the compiler for optimization, indicating that the memory regions do not overlap.
4. Mode Selection: The Importance of "b"
When opening files for binary access using fopen, the mode string must include the character 'b' (e.g., "rb", "wb", "ab").
Windows vs. Linux/Unix
- Linux/Unix: The kernel treats text and binary files identically. The
'b'flag is ignored but allowed for compatibility. - Windows: The file system distinguishes between the two.
- Text Mode (Default): Implicitly translates
\n(LF) in memory to\r\n(CRLF) on disk (and vice versa). - Binary Mode (
"b"): Disables this translation.
- Text Mode (Default): Implicitly translates
The Corruption Risk
If you open a binary file (like a .jpg image) in text mode on Windows:
1. The byte 0x0A (decimal 10) occurs naturally in the image data.
2. The system interprets this as a "Newline."
3. It might "correct" it to 0x0D 0x0A, inserting an extra byte.
4. Result: The file size changes, the data offsets shift, and the image becomes corrupted.
Best Practice: Always use the "b" flag for any non-text file to ensure the data is read/written exactly as is (What You See Is What You Get).