maph
Source: Dev.to
Overview
A high‑performance, memory‑mapped database and generalized framework for space‑efficient approximate data structures with O(1) lookups. Built on perfect‑hash functions, maph provides sub‑microsecond access to arbitrary mappings (f : X → Y) with configurable storage and custom decoders.
Key Features
| Feature | Description |
|---|---|
| O(1) Lookups | Guaranteed constant‑time operations with perfect‑hash optimisation |
| Memory‑Mapped Storage | Zero‑copy access via mmap for minimal memory overhead |
| Lock‑Free Operations | Atomic operations for thread‑safe concurrent access |
| Generalised Framework | Supports arbitrary mappings X → Y, not just membership |
| JSON Database | Built‑in daemon for JSON key‑value storage with a REST API |
| Configurable Storage | 8/16/32/64‑bit storage options for space/accuracy trade‑offs |
| Custom Decoders | Extensible decoder system for specialised applications |
| SIMD Optimised | AVX2 acceleration for batch operations |
Performance Summary
| Operation | Throughput | Latency | Notes |
|---|---|---|---|
| GET | 10 M ops/s |
int main() {
// Create a new maph store
auto store = maph::Maph::create("mystore.maph", 1'000'000);
// Store JSON key‑value pairs
store->set(R"({"user":"alice"})", R"({"score":95})");
// Retrieve with O(1) lookup
auto value = store->get(R"({"user":"alice"})");
if (value) {
std::cout << *value << std::endl;
}
// Batch insert
std::unordered_map<std::string, std::string> batch = {
{R"({"id":1})", R"({"name":"item1"})"},
{R"({"id":2})", R"({"name":"item2"})"}
};
store->mset(batch);
}
Daemon & CLI
# Start the maph daemon
./maph_daemon --port 8080 --threads 4
# CLI operations
./maph_cli set mystore '{"key":"user:1"}' '{"name":"Alice","age":30}'
./maph_cli get mystore '{"key":"user:1"}'
./maph_cli scan mystore --limit 100
REST API
# Create a store
curl -X POST http://localhost:8080/stores \
-d '{"name":"products","slots":1000000}'
# Insert a key
curl -X PUT "http://localhost:8080/stores/products/keys/sku:12345" \
-d '{"name":"Laptop","price":999}'
# Retrieve the key
curl http://localhost:8080/stores/products/keys/sku:12345
# Optimise for perfect hash (O(1) guaranteed)
curl -X POST http://localhost:8080/stores/products/optimize
Architecture Overview
Core Library (include/maph.hpp) → Memory‑mapped storage + atomic ops
Approximate Maps (include/maph/approximate_map.hpp) → Generalised X→Y mappings
Decoders (include/maph/decoders.hpp) → Configurable output transformations
Applications:
• CLI tool (maph_cli)
• Daemon (maph_daemon) – JSON DB server
• REST API (integrations/rest_api) – HTTP interface + web UI
File Layout (on‑disk)
┌─────────────────────┐
│ Header (512 B) │ Magic, version, slot counts, generation
├─────────────────────┤
│ Slot 0 (512 B) │ Hash (32 b) + Version (32 b) + Data (504 B)
├─────────────────────┤
│ Slot 1 (512 B) │ …
├─────────────────────┤
│ … │
└─────────────────────┘
Building from Source
# Clone the repository
git clone https://github.com/yourusername/maph.git
cd maph
# Build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DBUILD_TESTS=ON \
-DBUILD_REST_API=ON \
-DBUILD_EXAMPLES=ON
make -j$(nproc)
# Run tests
ctest --verbose
# Install
sudo make install
CMake Options
| Option | Default | Description |
|---|---|---|
BUILD_TESTS | OFF | Build test suite |
BUILD_EXAMPLES | OFF | Build example programs |
BUILD_REST_API | OFF | Build REST API server |
BUILD_DOCS | OFF | Generate documentation |
Custom Decoders
// Threshold filter – maps values to bool based on a threshold
template <typename StorageType>
struct ThresholdDecoder {
using output_type = bool;
float threshold;
bool operator()(StorageType value, StorageType max_val) const {
return static_cast<float>(value) / max_val > threshold;
}
};
// Quantisation decoder – maps to discrete levels
template <typename StorageType>
struct QuantizeDecoder {
using output_type = int;
int levels;
int operator()(StorageType value, StorageType max_val) const {
return (value * levels) / (max_val + 1);
}
};
Parallel Operations
// Parallel batch insert with custom thread count
store->parallel_mset(large_batch, 8); // 8 threads
// Parallel scan using visitor pattern
std::atomic<size_t> count{0};
store->parallel_scan(
[&count](uint64_t idx, uint32_t hash, JsonView value) {
if (value.size() > 100) ++count;
},
4 // 4 threads
);
Durability
// Background sync manager (e.g., every 5 s or 1000 ops)
auto manager = store->start_durability_manager(
std::chrono::seconds(5), // interval
1000 // max ops before sync
);
// Manual sync
store->sync();
Storage Types & Characteristics
| Storage Type | Bytes/Element | False‑Positive Rate | Typical Use‑Case |
|---|---|---|---|
uint8_t | 1 | 0.39 % | Large datasets, error‑tolerant |
uint16_t | 2 | 0.0015 % | Balanced accuracy / space |
uint32_t | 4 | ≈ 0 % | High accuracy required |
uint64_t | 8 | 0 % | Perfect accuracy, cryptographic |
Comparative Overview
| System | Lookup | Space | Persistence | Concurrency | Typical Use‑Case |
|---|---|---|---|---|---|
| maph | O(1) | Optimal | mmap | Lock‑free | General K/V, high‑perf |
| Redis | O(1) avg | ~10× overhead | AOF / RDB | Single‑thread | Cache, sessions |
| RocksDB | O(log n) | Compressed | SST files | Multi‑thread | Large persistent data |
| Memcached | O(1) avg | In‑memory | None | Multi‑thread | Simple cache |
| Bloom filter | O(1) | Optimal | Custom | Read‑only | Membership only |
Example Use‑Cases
- High‑Performance Cache – Sub‑microsecond lookups for hot data.
- Feature Stores – ML feature serving with guaranteed latency.
- Session Storage – Web session data with instant access.
- Approximate Filters – Bloom‑filter alternative with configurable accuracy.
- Rate Limiting – Token‑bucket implementation with O(1) checks.
- Deduplication – Fast duplicate detection in data streams.
- Graph Databases – Adjacency‑list storage with perfect hashing.
- Time‑Series Index – Fast timestamp‑to‑data mapping.
API Snippet – Create / Open
// Create a new store
auto store = maph::Maph::create("path/to/store.maph", total_slots);
// Open an existing store (read‑only)
auto store = maph::Maph::open("path/to/store.maph", /*read_only=*/true);
API Reference
c_slots = 0);
static std::unique_ptr open(path, readonly = false);
// Basic operations
std::optional get(JsonView key);
bool set(JsonView key, JsonView value);
bool remove(JsonView key);
bool exists(JsonView key);
// Batch operations
void mget(keys, callback);
size_t mset(key_value_pairs);
// Parallel operations
void parallel_mget(keys, callback, threads = 0);
size_t parallel_mset(pairs, threads = 0);
void parallel_scan(visitor, threads = 0);
// Utilities
Stats stats();
void sync();
void close();
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Key areas for contribution
- Perfect hash function implementations
- Additional decoders for specialized use cases
- Language bindings (Python, Rust, Go, Java)
- Performance optimizations
- Documentation and examples
References
The theoretical foundation for this work:
@inproceedings{rdphfilter2024,
title = {Rate-Distorted Perfect Hash Filters},
author = {...},
booktitle = {...},
year = {2024}
}
License
MIT License – see LICENSE for details.
Built on research in perfect hash functions, approximate data structures, and space‑efficient algorithms. Special thanks to the authors of CHD, BBHash, and other minimal perfect hash algorithms that inspired this work.