Stop decompressing entire archives to get one file — introducing ARCX
Source: Dev.to
## Overview
Most archive formats make a simple task unnecessarily expensive: you need one file, so you download and decompress everything.
I built ARCX, a compressed archive format designed to fix that.
ARCX combines cross‑file compression (like tar+zstd) with indexed random access (like ZIP), so you can retrieve a single file from a large archive in milliseconds without decompressing the rest.
GitHub:
## Install

```shell
cargo install arcx
```
## Benchmarks (across 5 real‑world datasets)

| Dataset | ARCX Bytes Read | tar+zstd Bytes Read | Reduction |
|---|---|---|---|
| Python ML | 326 KB | 63.1 MB | 198× less |
| Build Artifacts | 714 KB | 140.4 MB | 202× less |

Across the other three datasets, retrieving a single file from a ~200 MB archive takes ≈200 ms and reads up to 200× less data than tar+zstd. Compression overhead stays within ~3% of tar+zstd.
## Use Cases
- CI/CD pipelines (artifact retrieval)
- Cloud storage with partial reads
- Large codebases
- Package registries
Modern systems often need one file, immediately, rather than the entire archive.
## How ARCX Works

- Block‑based compression – the archive is split into independently compressed blocks.
- Binary manifest index – stored at the end of the archive, mapping each file to its block offset.
- Direct offset reads – a client can:
  1. Look up the file in the index.
  2. Seek to the relevant block.
  3. Decompress only that block.
This replaces scanning or decompressing the full archive with a simple manifest lookup and a single block read.
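The layout described above can be sketched in plain Rust. This is a toy illustration, not the real ARCX format: each "block" is stored raw where ARCX would run a real compressor (e.g. zstd) per block, the manifest here is tab-separated text rather than ARCX's binary manifest, and all names and field layouts are invented for the example. The trailing `u64` plays the role of the pointer that lets a reader find the manifest at the end of the archive.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

// Build a toy archive: one block per file, then a manifest mapping
// name -> (offset, length), then the manifest's own offset as a trailing u64.
fn build_archive(files: &[(&str, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut manifest = String::new();
    for (name, data) in files {
        let offset = out.len();
        out.extend_from_slice(data); // stand-in for an independently compressed block
        manifest.push_str(&format!("{}\t{}\t{}\n", name, offset, data.len()));
    }
    let manifest_offset = out.len() as u64;
    out.extend_from_slice(manifest.as_bytes());
    out.extend_from_slice(&manifest_offset.to_le_bytes());
    out
}

// Retrieve one file: read the trailing manifest offset, parse the manifest,
// then slice out only that file's block -- no other blocks are touched.
fn read_one(archive: &[u8], want: &str) -> Option<Vec<u8>> {
    let n = archive.len();
    let manifest_offset = u64::from_le_bytes(archive[n - 8..].try_into().unwrap()) as usize;
    let manifest = std::str::from_utf8(&archive[manifest_offset..n - 8]).ok()?;
    let index: HashMap<&str, (usize, usize)> = manifest
        .lines()
        .filter_map(|line| {
            let mut parts = line.split('\t');
            let name = parts.next()?;
            let off = parts.next()?.parse().ok()?;
            let len = parts.next()?.parse().ok()?;
            Some((name, (off, len)))
        })
        .collect();
    let &(off, len) = index.get(want)?;
    Some(archive[off..off + len].to_vec()) // real ARCX would decompress just this block
}

fn main() {
    let archive = build_archive(&[("a.txt", b"hello"), ("b.txt", b"world")]);
    let got = read_one(&archive, "b.txt").unwrap();
    println!("b.txt = {:?}", String::from_utf8(got).unwrap());
}
```

Writing the manifest last is what makes single-pass archive creation possible, and it is also why the archive must be complete before it can be read (see Limitations below).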
## Format Comparison
| Format | Compression Strength | Access Speed |
|---|---|---|
| ZIP | weaker | fast |
| tar+zstd | strong | slow |
| ARCX | strong | fast |
## Limitations & Future Work

- ARCX is not designed for streaming the way tar is: because the manifest is written at the end, the archive must be complete before it can be read.
- Remote/S3 range‑read workflows have not been fully benchmarked yet.
- Metadata/index overhead is still being optimized for very large file counts.
- Full extraction benchmarks in Rust are still in progress.
Still early – feedback welcome.