맵

발행: 2일 전 (2025년 12월 17일 오전 11:33 GMT+9)

7 min read

Source: Dev.to

번역할 텍스트를 제공해 주시겠어요? 텍스트를 주시면 한국어로 번역해 드리겠습니다.

개요

높은 성능을 제공하는 메모리 매핑 데이터베이스이자 공간 효율적인 근사 데이터 구조를 위한 일반화된 프레임워크로, O(1) 조회를 지원합니다. 완전 해시 함수에 기반한 maph는 구성 가능한 저장소와 사용자 정의 디코더를 사용하여 임의의 매핑 *(f : X → Y)*에 대해 서브 마이크로초 수준의 접근을 제공합니다.

주요 기능

기능	설명
O(1) Lookups	완벽한 해시 최적화를 통한 보장된 상수‑시간 연산
Memory‑Mapped Storage	최소 메모리 오버헤드를 위한 `mmap`을 통한 제로‑카피 접근
Lock‑Free Operations	스레드‑안전한 동시 접근을 위한 원자 연산
Generalised Framework	멤버십뿐 아니라 임의 매핑 X → Y를 지원
JSON Database	REST API를 제공하는 내장 데몬을 통한 JSON 키‑값 저장
Configurable Storage	공간/정밀도 절충을 위한 8/16/32/64‑비트 저장 옵션
Custom Decoders	특수 애플리케이션을 위한 확장 가능한 디코더 시스템
SIMD Optimised	배치 연산을 위한 AVX2 가속

성능 요약

작업	처리량	지연시간	비고
GET	10 M ops/s

int main() {
    // Create a new maph store
    auto store = maph::Maph::create("mystore.maph", 1'000'000);

    // Store JSON key‑value pairs
    store->set(R"({"user":"alice"})", R"({"score":95})");

    // Retrieve with O(1) lookup
    auto value = store->get(R"({"user":"alice"})");
    if (value) {
        std::cout << *value << std::endl;
    }

    // Batch insert
    std::unordered_map<std::string, std::string> batch = {
        {R"({"id":1})", R"({"name":"item1"})"},
        {R"({"id":2})", R"({"name":"item2"})"}
    };
    store->mset(batch);
}

데몬 및 CLI

# Start the maph daemon
./maph_daemon --port 8080 --threads 4

# CLI operations
./maph_cli set mystore '{"key":"user:1"}' '{"name":"Alice","age":30}'
./maph_cli get mystore '{"key":"user:1"}'
./maph_cli scan mystore --limit 100

REST API

# 스토어 생성
curl -X POST http://localhost:8080/stores \
     -d '{"name":"products","slots":1000000}'

# 키 삽입
curl -X PUT "http://localhost:8080/stores/products/keys/sku:12345" \
     -d '{"name":"Laptop","price":999}'

# 키 조회
curl http://localhost:8080/stores/products/keys/sku:12345

# 완벽 해시 최적화 (O(1) 보장)
curl -X POST http://localhost:8080/stores/products/optimize

아키텍처 개요

Core Library (include/maph.hpp)          → Memory‑mapped storage + atomic ops
Approximate Maps (include/maph/approximate_map.hpp) → Generalised X→Y mappings
Decoders (include/maph/decoders.hpp)    → Configurable output transformations
Applications:
  • CLI tool (maph_cli)
  • Daemon (maph_daemon) – JSON DB server
  • REST API (integrations/rest_api) – HTTP interface + web UI

파일 레이아웃 (디스크 상)

┌─────────────────────┐
│ Header (512 B)       │  Magic, version, slot counts, generation
├─────────────────────┤
│ Slot 0 (512 B)      │  Hash (32 b) + Version (32 b) + Data (504 B)
├─────────────────────┤
│ Slot 1 (512 B)      │  …
├─────────────────────┤
│      …              │
└─────────────────────┘

소스에서 빌드하기

# Clone the repository
git clone https://github.com/yourusername/maph.git
cd maph

# Build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DBUILD_TESTS=ON \
         -DBUILD_REST_API=ON \
         -DBUILD_EXAMPLES=ON
make -j$(nproc)

# Run tests
ctest --verbose

# Install
sudo make install

CMake 옵션

옵션	기본값	설명
`BUILD_TESTS`	OFF	테스트 스위트 빌드
`BUILD_EXAMPLES`	OFF	예제 프로그램 빌드
`BUILD_REST_API`	OFF	REST API 서버 빌드
`BUILD_DOCS`	OFF	문서 생성

사용자 정의 디코더

// Threshold filter – maps values to bool based on a threshold
template <typename StorageType>
struct ThresholdDecoder {
    using output_type = bool;
    float threshold;

    bool operator()(StorageType value, StorageType max_val) const {
        return static_cast<float>(value) / max_val > threshold;
    }
};

// Quantisation decoder – maps to discrete levels
template <typename StorageType>
struct QuantizeDecoder {
    using output_type = int;
    int levels;

    int operator()(StorageType value, StorageType max_val) const {
        return (value * levels) / (max_val + 1);
    }
};

병렬 연산

// Parallel batch insert with custom thread count
store->parallel_mset(large_batch, 8);   // 8 threads

// Parallel scan using visitor pattern
std::atomic<size_t> count{0};
store->parallel_scan(
    [&count](uint64_t idx, uint32_t hash, JsonView value) {
        if (value.size() > 100) ++count;
    },
    4   // 4 threads
);

내구성

// Background sync manager (e.g., every 5 s or 1000 ops)
auto manager = store->start_durability_manager(
    std::chrono::seconds(5),   // interval
    1000                       // max ops before sync
);

// Manual sync
store->sync();

저장 유형 및 특성

저장 유형	바이트/요소	오탐률	일반적인 사용 사례
`uint8_t`	1	0.39 %	대규모 데이터셋, 오류 허용
`uint16_t`	2	0.0015 %	정확도와 공간의 균형
`uint32_t`	4	≈ 0 %	높은 정확도 필요
`uint64_t`	8	0 %	완벽한 정확도, 암호학

Comparative Overview

System	Lookup	Space	Persistence	Concurrency	Typical Use‑Case
maph	O(1)	최적	`mmap`	락프리	일반 K/V, 고성능
Redis	O(1) avg	약 10× 오버헤드	AOF / RDB	단일 스레드	캐시, 세션
RocksDB	O(log n)	압축	SST files	멀티 스레드	대용량 영구 데이터
Memcached	O(1) avg	인‑메모리	없음	멀티 스레드	단순 캐시
Bloom filter	O(1)	최적	Custom	읽기 전용	멤버십 전용

Example Use‑Cases

High‑Performance Cache – 핫 데이터에 대한 서브마이크로초 조회.
Feature Stores – 보장된 지연 시간을 갖는 머신러닝 피처 서빙.
Session Storage – 즉시 접근 가능한 웹 세션 데이터.
Approximate Filters – 구성 가능한 정확도를 가진 블룸‑필터 대안.
Rate Limiting – O(1) 검사로 구현된 토큰‑버킷.
Deduplication – 데이터 스트림에서 빠른 중복 감지.
Graph Databases – 완벽 해싱을 이용한 인접‑리스트 저장.
Time‑Series Index – 빠른 타임스탬프‑데이터 매핑.

API 스니펫 – 생성 / 열기

// Create a new store
auto store = maph::Maph::create("path/to/store.maph", total_slots);

// Open an existing store (read‑only)
auto store = maph::Maph::open("path/to/store.maph", /*read_only=*/true);

API 레퍼런스

c_slots = 0);
static std::unique_ptr open(path, readonly = false);

// Basic operations
std::optional get(JsonView key);
bool set(JsonView key, JsonView value);
bool remove(JsonView key);
bool exists(JsonView key);

// Batch operations
void mget(keys, callback);
size_t mset(key_value_pairs);

// Parallel operations
void parallel_mget(keys, callback, threads = 0);
size_t parallel_mset(pairs, threads = 0);
void parallel_scan(visitor, threads = 0);

// Utilities
Stats stats();
void sync();
void close();

기여

우리는 기여를 환영합니다! 가이드라인은 **CONTRIBUTING.md**를 참고하세요.

기여할 수 있는 주요 분야

완벽한 해시 함수 구현
특수 사용 사례를 위한 추가 디코더
언어 바인딩 (Python, Rust, Go, Java)
성능 최적화
문서 및 예제

참고 문헌

이 작업의 이론적 기반:

@inproceedings{rdphfilter2024,
  title   = {Rate-Distorted Perfect Hash Filters},
  author  = {...},
  booktitle = {...},
  year    = {2024}
}

라이선스

MIT 라이선스 – 자세한 내용은 LICENSE 를 참조하십시오.

완벽 해시 함수, 근사 데이터 구조 및 공간 효율적인 알고리즘에 대한 연구를 기반으로 합니다. 이 작업에 영감을 준 CHD, BBHash, 및 기타 최소 완벽 해시 알고리즘의 저자들에게 특별히 감사드립니다.

맵

개요