Skip to content

05 Read & Write Operations for Binary Files

1. Characteristics of Binary Files

Unlike text files, which are meant for human consumption, binary files are designed for machine efficiency. They represent a direct "snapshot" of memory contents stored on disk.

Raw Data Storage

  • Byte Range: Binary files utilize the full range of a byte (0–255). There are no "forbidden" or special control characters. A null byte (0x00) or a newline (0x0A) is treated simply as data (e.g., part of an integer or a color value).
  • No Conversion: Data is written exactly as it is represented in RAM. An integer 123456 is stored as a 4-byte sequence (e.g., 40 E2 01 00), whereas a text file would convert it to the string "123456" (6 bytes).

Advantages & Disadvantages

  • Speed: Extremely fast. The CPU does not need to perform expensive formatting (like printf does) or parsing (like scanf does). It is essentially a memory copy operation.
  • Storage Efficiency: Typically more compact for numeric data.
  • Portability Issues (The Major Drawback): Binary files are hardware-dependent.
    • Endianness: A file written on a Little-Endian machine (Intel) might be read incorrectly on a Big-Endian machine.
    • Type Sizes: A long might be 4 bytes on a 32-bit system but 8 bytes on a 64-bit system, causing alignment errors if the file is moved between systems.

2. The Binary I/O API

The C Standard Library provides two block-based functions for binary operations: fread and fwrite. They act as "bulk movers" of data.

Function Prototypes

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

Parameter Anatomy

  1. ptr (Pointer):
    • For fwrite: The source of data (RAM).
    • For fread: The destination buffer (RAM).
    • Note: It is a void *, meaning it can handle any data type (int, struct, arrays).
  2. size (Item Size): The size of a single element in bytes. Always use the sizeof operator (e.g., sizeof(int) or sizeof(Student)).
  3. nmemb (Count): The number of elements to read or write.
  4. stream (File Pointer): The target file. Note: Unlike fprintf, the file pointer is the last argument here.

Understanding the Return Value

Both functions return the number of items successfully read or written, not the number of bytes.

  • Logic: If you request to write 10 integers (nmemb = 10), and the function returns 10, the operation was fully successful.
  • Error Checking: If the return value is less than nmemb, it indicates either an End-of-File (EOF) occurred or a write error (e.g., disk full).

3. Serialization (Saving Structures)

One of the most powerful use cases for binary I/O is saving complex data structures directly to disk without parsing fields individually.

Example Logic:

struct Player {
    int id;
    float health;
    char name[20];
};

struct Player p1 = {1, 100.0f, "Hero"};

// Writing (Serialization)
// "Take the memory at address &p1, size of struct Player, 1 item, write to fp"
fwrite(&p1, sizeof(struct Player), 1, fp);

// Reading (Deserialization)
// "Read from fp, size of struct Player, 1 item, put it into memory at &p1"
fread(&p1, sizeof(struct Player), 1, fp);
  • The restrict Keyword: In modern C standards (C99+), you may see void *restrict ptr in prototypes. This is a hint to the compiler for optimization, indicating that the memory regions do not overlap.

4. Mode Selection: The Importance of "b"

When opening files for binary access using fopen, the mode string must include the character 'b' (e.g., "rb", "wb", "ab").

Windows vs. Linux/Unix

  • Linux/Unix: The kernel treats text and binary files identically. The 'b' flag is ignored but allowed for compatibility.
  • Windows: The file system distinguishes between the two.
    • Text Mode (Default): Implicitly translates \n (LF) in memory to \r\n (CRLF) on disk (and vice versa).
    • Binary Mode ("b"): Disables this translation.

The Corruption Risk

If you open a binary file (like a .jpg image) in text mode on Windows: 1. The byte 0x0A (decimal 10) occurs naturally in the image data. 2. The system interprets this as a "Newline." 3. It might "correct" it to 0x0D 0x0A, inserting an extra byte. 4. Result: The file size changes, the data offsets shift, and the image becomes corrupted.

Best Practice: Always use the "b" flag for any non-text file to ensure the data is read/written exactly as is (What You See Is What You Get).