Why does C have the best file API?
Source: Hacker News
2026-02-28 (Programming) (Rants)
Memory Mapping in C
There are a lot of nice programming languages, but files always seem like an afterthought. You usually only get read(), write() and some kind of serialization library.
In C, you can access files exactly the same as data in memory:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Create/open a file containing 1000 unsigned integers,
    // initialized to all zeros.
    size_t len = 1000 * sizeof(uint32_t);
    int file = open("numbers.u32", O_RDWR | O_CREAT, 0600);
    if (file < 0 || ftruncate(file, len) != 0)
        return 1;

    // Map it into memory.
    uint32_t *numbers = mmap(NULL, len,
                             PROT_READ | PROT_WRITE, MAP_SHARED,
                             file, 0);
    if (numbers == MAP_FAILED)
        return 1;

    // Do something:
    printf("%u\n", numbers[42]);
    numbers[42] = numbers[42] + 1;

    // Clean up.
    munmap(numbers, len);
    close(file);
    return 0;
}
How Memory Mapping Differs from Loading
Memory mapping isn’t the same as loading a file into memory: it still works if the file doesn’t fit in RAM. Data is loaded as needed, so it won’t take all day to open a terabyte‑size file. It works with all data types and is automatically cached; the cache is cleared if the system needs memory for something else.
Limitations in Other Languages
However, in most other languages you have to read() in tiny chunks, parse, process, serialize, and finally write() back to disk. This is verbose and needlessly limited to sequential access—computers haven’t used tape for decades.
If you’re lucky enough to have memory mapping, it will often be limited to byte arrays, which still require explicit parsing/serialization. It ends up being just a nicer way to call read() and write().
Performance and Overhead
C’s implementation isn’t perfect: memory mapping incurs overhead (page faults, TLB flushes) and C does nothing to handle endianness—but it still beats having no direct access at all.
Sure, you might want to do some parsing and validation, but that shouldn’t be required every time data leaves the disk. Data sets that don’t fit in RAM are common, and then parsing everything into memory is simply impossible. Being able to leave data on disk without complicating the code is very useful.
Security and Serialization
Just look at Python’s pickle: it’s a completely insecure serialization format. Loading a file can execute arbitrary code even if all you wanted was some numbers, yet it’s still widely used because it fits Python’s mix‑code‑and‑data model. And plenty of files aren’t untrusted at all.
Filesystem as a Database
File manipulation is similarly neglected. The filesystem is the original NoSQL database, but you seldom get more than a wrapper around C’s readdir(). This usually results in people running another database, such as SQLite, on top of the filesystem, but relational databases never quite fit your program.
…and SQL integrates even worse than files: on top of having to serialize all your data, you have to write code in a whole separate language just to access it! Most programmers end up using it as a key‑value store and implement their own indexing, creating a bizarre triple‑nested database.
Conclusion
So to answer the title, I think it’s a result of a bad assumption: that data read from a file is coming from somewhere else and needs to be parsed, and that data written to disk is being sent somewhere and needs to be serialized into a standard format. This simply isn’t true on memory‑constrained systems—and with 100 GB files, every system is memory constrained.