07 File Descriptors & The Kernel
1. The File Descriptor (FD)
While C programmers interact with the high-level FILE * (stream), the Operating System Kernel manages files using a much simpler mechanism known as the File Descriptor.
Definition
A File Descriptor is a non-negative integer (0, 1, 2, ...) that acts as a handle for an input/output resource (files, sockets, pipes).
The File Descriptor Table
The kernel maintains a table for every running process (inside the Process Control Block - PCB). The FD is simply the array index into this table.
- Process Level: The FD Table maps the integer ID (e.g.,
3) to a pointer. - System Level: This pointer directs to a system-wide "Open File Table," which tracks the file offset, access mode, and reference count.
- Inode Level: Finally, this points to the actual data on the physical disk (Inode/Vnode).
Standard Descriptors
When a new process is created (forked), the kernel automatically populates the first three slots of the FD table:
* 0: Standard Input (stdin)
* 1: Standard Output (stdout)
* 2: Standard Error (stderr)
* 3+: Any subsequently opened files (via fopen or open) are assigned the lowest available integer.
2. The Relationship: FILE * vs. FD
It is crucial to understand the hierarchy between the C Standard Library and the OS Kernel.
The Hierarchy
- User Space (High Level): The
FILEstructure.- It is a wrapper object.
- It contains the buffering mechanism.
- It holds the File Descriptor as a member variable.
- Kernel Space (Low Level): The File Descriptor.
- It represents the actual connection to the hardware/driver.
- It is unbuffered (raw).
The Bridge: fileno()
The C library provides a function to extract the underlying FD from a stream pointer. This is often used when mixing standard C functions with POSIX system calls (like file locking or network operations).
#include <stdio.h>
#include <unistd.h> // POSIX standard header
FILE *fp = fopen("data.txt", "r");
int fd = fileno(fp); // Extracts the integer ID (e.g., 3)
3. System Calls vs. Library Functions
File operations exist at two distinct layers. Understanding the difference is key to performance tuning and system programming.
| Feature | C Standard Library (stdio.h) |
System Calls (POSIX/WinAPI) |
|---|---|---|
| Functions | fopen, fread, fwrite, fclose |
open, read, write, close |
| Buffering | Buffered (User-space buffer) | Unbuffered (Direct kernel access) |
| Portability | High (Works on any C compiler) | Low (OS specific) |
| Overhead | Low (Aggregates small writes) | High (Context switch per call) |
| Use Case | General application logic | Drivers, devices, special I/O |
- Performance Insight: Calling
fwrite100 times for 1 byte each is fast because it only triggers one system call (write) when the buffer fills. Callingwrite100 times for 1 byte each is slow because it forces 100 context switches between User Mode and Kernel Mode.
4. The Critical Role of fclose
The function fclose(fp) performs a complex cleanup sequence that bridges the gap between the two layers. It does much more than just freeing memory.
The Cleanup Sequence
- Flush User Buffer: Any data sitting in the
FILEstructure's buffer is pushed to the kernel via thewrite()system call. - Close File Descriptor: The library calls
close(fd). This tells the kernel to remove the entry from the process's FD table. - Free Memory: The
FILEstructure itself (allocated on the heap) is freed.
Consequences of Leaking FDs
If a programmer forgets to call fclose (or close), simply freeing the FILE pointer is insufficient. The underlying FD remains "Open" in the kernel.
- FD Exhaustion (Resource Leak):
- Operating systems enforce a limit on open files per process (e.g.,
ulimit -noften defaults to 1024). - If a server leaks FDs, it will eventually crash with a "Too many open files" error, refusing new connections or files.
- Operating systems enforce a limit on open files per process (e.g.,
- File Locking:
- File locks (
flock) are typically associated with the file description. Failing to close the FD may leave a file locked, preventing other processes from accessing it.
- File locks (
- Disk Space Reclamation (Unix/Linux):
- In Linux, if a file is deleted (unlinked) but an open FD still references it, the disk space is not freed. The file becomes "invisible" but still consumes storage until the process terminates or closes the FD.