maph

Published: (December 16, 2025 at 09:33 PM EST)
4 min read
Source: Dev.to

Source: Dev.to

Overview

A high‑performance, memory‑mapped database and generalized framework for space‑efficient approximate data structures with O(1) lookups. Built on perfect‑hash functions, maph provides sub‑microsecond access to arbitrary mappings (f : X → Y) with configurable storage and custom decoders.

Key Features

FeatureDescription
O(1) LookupsGuaranteed constant‑time operations with perfect‑hash optimisation
Memory‑Mapped StorageZero‑copy access via mmap for minimal memory overhead
Lock‑Free OperationsAtomic operations for thread‑safe concurrent access
Generalised FrameworkSupports arbitrary mappings X → Y, not just membership
JSON DatabaseBuilt‑in daemon for JSON key‑value storage with a REST API
Configurable Storage8/16/32/64‑bit storage options for space/accuracy trade‑offs
Custom DecodersExtensible decoder system for specialised applications
SIMD OptimisedAVX2 acceleration for batch operations

Performance Summary

OperationThroughputLatencyNotes
GET10 M ops/s
int main() {
    // Create a new maph store
    auto store = maph::Maph::create("mystore.maph", 1'000'000);

    // Store JSON key‑value pairs
    store->set(R"({"user":"alice"})", R"({"score":95})");

    // Retrieve with O(1) lookup
    auto value = store->get(R"({"user":"alice"})");
    if (value) {
        std::cout << *value << std::endl;
    }

    // Batch insert
    std::unordered_map<std::string, std::string> batch = {
        {R"({"id":1})", R"({"name":"item1"})"},
        {R"({"id":2})", R"({"name":"item2"})"}
    };
    store->mset(batch);
}

Daemon & CLI

# Start the maph daemon
./maph_daemon --port 8080 --threads 4
# CLI operations
./maph_cli set mystore '{"key":"user:1"}' '{"name":"Alice","age":30}'
./maph_cli get mystore '{"key":"user:1"}'
./maph_cli scan mystore --limit 100

REST API

# Create a store
curl -X POST http://localhost:8080/stores \
     -d '{"name":"products","slots":1000000}'

# Insert a key
curl -X PUT "http://localhost:8080/stores/products/keys/sku:12345" \
     -d '{"name":"Laptop","price":999}'

# Retrieve the key
curl http://localhost:8080/stores/products/keys/sku:12345

# Optimise for perfect hash (O(1) guaranteed)
curl -X POST http://localhost:8080/stores/products/optimize

Architecture Overview

Core Library (include/maph.hpp)          → Memory‑mapped storage + atomic ops
Approximate Maps (include/maph/approximate_map.hpp) → Generalised X→Y mappings
Decoders (include/maph/decoders.hpp)    → Configurable output transformations
Applications:
  • CLI tool (maph_cli)
  • Daemon (maph_daemon) – JSON DB server
  • REST API (integrations/rest_api) – HTTP interface + web UI

File Layout (on‑disk)

┌─────────────────────┐
│ Header (512 B)       │  Magic, version, slot counts, generation
├─────────────────────┤
│ Slot 0 (512 B)      │  Hash (32 b) + Version (32 b) + Data (504 B)
├─────────────────────┤
│ Slot 1 (512 B)      │  …
├─────────────────────┤
│      …              │
└─────────────────────┘

Building from Source

# Clone the repository
git clone https://github.com/yourusername/maph.git
cd maph

# Build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DBUILD_TESTS=ON \
         -DBUILD_REST_API=ON \
         -DBUILD_EXAMPLES=ON
make -j$(nproc)

# Run tests
ctest --verbose

# Install
sudo make install

CMake Options

OptionDefaultDescription
BUILD_TESTSOFFBuild test suite
BUILD_EXAMPLESOFFBuild example programs
BUILD_REST_APIOFFBuild REST API server
BUILD_DOCSOFFGenerate documentation

Custom Decoders

// Threshold filter – maps values to bool based on a threshold
template <typename StorageType>
struct ThresholdDecoder {
    using output_type = bool;
    float threshold;

    bool operator()(StorageType value, StorageType max_val) const {
        return static_cast<float>(value) / max_val > threshold;
    }
};

// Quantisation decoder – maps to discrete levels
template <typename StorageType>
struct QuantizeDecoder {
    using output_type = int;
    int levels;

    int operator()(StorageType value, StorageType max_val) const {
        return (value * levels) / (max_val + 1);
    }
};

Parallel Operations

// Parallel batch insert with custom thread count
store->parallel_mset(large_batch, 8);   // 8 threads

// Parallel scan using visitor pattern
std::atomic<size_t> count{0};
store->parallel_scan(
    [&count](uint64_t idx, uint32_t hash, JsonView value) {
        if (value.size() > 100) ++count;
    },
    4   // 4 threads
);

Durability

// Background sync manager (e.g., every 5 s or 1000 ops)
auto manager = store->start_durability_manager(
    std::chrono::seconds(5),   // interval
    1000                       // max ops before sync
);

// Manual sync
store->sync();

Storage Types & Characteristics

Storage TypeBytes/ElementFalse‑Positive RateTypical Use‑Case
uint8_t10.39 %Large datasets, error‑tolerant
uint16_t20.0015 %Balanced accuracy / space
uint32_t4≈ 0 %High accuracy required
uint64_t80 %Perfect accuracy, cryptographic

Comparative Overview

SystemLookupSpacePersistenceConcurrencyTypical Use‑Case
maphO(1)OptimalmmapLock‑freeGeneral K/V, high‑perf
RedisO(1) avg~10× overheadAOF / RDBSingle‑threadCache, sessions
RocksDBO(log n)CompressedSST filesMulti‑threadLarge persistent data
MemcachedO(1) avgIn‑memoryNoneMulti‑threadSimple cache
Bloom filterO(1)OptimalCustomRead‑onlyMembership only

Example Use‑Cases

  • High‑Performance Cache – Sub‑microsecond lookups for hot data.
  • Feature Stores – ML feature serving with guaranteed latency.
  • Session Storage – Web session data with instant access.
  • Approximate Filters – Bloom‑filter alternative with configurable accuracy.
  • Rate Limiting – Token‑bucket implementation with O(1) checks.
  • Deduplication – Fast duplicate detection in data streams.
  • Graph Databases – Adjacency‑list storage with perfect hashing.
  • Time‑Series Index – Fast timestamp‑to‑data mapping.

API Snippet – Create / Open

// Create a new store
auto store = maph::Maph::create("path/to/store.maph", total_slots);

// Open an existing store (read‑only)
auto store = maph::Maph::open("path/to/store.maph", /*read_only=*/true);

API Reference

c_slots = 0);
static std::unique_ptr open(path, readonly = false);

// Basic operations
std::optional get(JsonView key);
bool set(JsonView key, JsonView value);
bool remove(JsonView key);
bool exists(JsonView key);

// Batch operations
void mget(keys, callback);
size_t mset(key_value_pairs);

// Parallel operations
void parallel_mget(keys, callback, threads = 0);
size_t parallel_mset(pairs, threads = 0);
void parallel_scan(visitor, threads = 0);

// Utilities
Stats stats();
void sync();
void close();

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Key areas for contribution

  • Perfect hash function implementations
  • Additional decoders for specialized use cases
  • Language bindings (Python, Rust, Go, Java)
  • Performance optimizations
  • Documentation and examples

References

The theoretical foundation for this work:

@inproceedings{rdphfilter2024,
  title   = {Rate-Distorted Perfect Hash Filters},
  author  = {...},
  booktitle = {...},
  year    = {2024}
}

License

MIT License – see LICENSE for details.

Built on research in perfect hash functions, approximate data structures, and space‑efficient algorithms. Special thanks to the authors of CHD, BBHash, and other minimal perfect hash algorithms that inspired this work.

Back to Blog

Related posts

Read more »