Building Python Packages with C++ Extensions: A Complete Guide
Source: Dev.to
URL: https://isaacfei.com/posts/cmsketch-py-cpp
Date: 2025-09-13
Tags: Python, C++, pybind11, CMake, Data Structures
Description: Learn how to develop Python packages with C++ extensions using pybind11, CMake, and modern build tools. Complete project structure and setup guide.

The complete project repository is available on GitHub at https://github.com/isaac-fate/count-min-sketch.

Building Python packages with C++ extensions is a powerful way to combine Python's ease of use with C++'s performance. This guide walks through creating a complete Python package with a C++ backend, covering everything from project structure to PyPI publishing.

We'll use a Count-Min Sketch implementation as our example: a probabilistic data structure well suited to streaming data analysis. The techniques, however, apply to any C++ library you want to expose to Python.

Why write the core in C++?

- Multithreading Performance: C++ code that releases the GIL can use atomic operations to update shared state from many threads in parallel
- Lower-Level Control: direct memory management and hardware-level optimizations
- Existing Libraries: leverage existing C++ libraries in Python projects
- System Integration: access low-level system APIs and hardware features
- Memory Efficiency: better control over memory usage and data structures

This guide covers:

- The complete project structure for Python packages with C++ extensions
- How to configure pyproject.toml for modern Python packaging
- CMake setup for cross-platform C++ builds
- pybind11 integration for seamless Python bindings
- Development workflow and testing strategies
- A CI/CD pipeline for automated building and publishing

Understanding the project structure is crucial for Python packages with C++ extensions.
Here's the complete layout:

```
count-min-sketch/
├── include/cmsketch/               # C++ header files
│   ├── cmsketch.h                  # Main header (include this)
│   ├── count_min_sketch.h          # Core template class
│   └── hash_util.h                 # Hash utility functions
├── src/cmsketchcpp/                # C++ source files
│   └── count_min_sketch.cc         # Core implementation
├── src/cmsketch/                   # Python package source
│   ├── __init__.py                 # Package initialization
│   ├── base.py                     # Base classes and interfaces
│   ├── _core.pyi                   # Type stubs for C++ bindings
│   ├── _version.py                 # Version information
│   ├── py.typed                    # Type checking marker
│   └── py/                         # Pure Python implementations
│       ├── count_min_sketch.py     # Python Count-Min Sketch implementation
│       └── hash_util.py            # Python hash utilities
├── src/                            # Additional source files
│   ├── main.cc                     # Example C++ application
│   └── python_bindings.cc          # Python bindings (pybind11)
├── tests/                          # C++ unit tests
│   ├── CMakeLists.txt              # Test configuration
│   ├── test_count_min_sketch.cc    # Core functionality tests
│   ├── test_hash_functions.cc      # Hash function tests
│   └── test_sketch_config.cc       # Configuration tests
├── pytests/                        # Python tests
│   ├── __init__.py                 # Test package init
│   ├── conftest.py                 # Pytest configuration
│   ├── test_count_min_sketch.py    # Core Python tests
│   ├── test_hash_util.py           # Hash utility tests
│   ├── test_mixins.py              # Mixin class tests
│   └── test_py_count_min_sketch.py # Pure Python implementation tests
├── benchmarks/                     # Performance benchmarks
│   ├── __init__.py                 # Benchmark package init
│   ├── generate_data.py            # Data generation utilities
│   └── test_benchmarks.py          # Benchmark validation tests
├── examples/                       # Example scripts
│   └── example.py                  # Python usage example
├── scripts/                        # Build and deployment scripts
│   ├── build.sh                    # Production build script
│   └── build-dev.sh                # Development build script
├── data/                           # Sample data files
│   ├── ips.txt                     # IP address sample data
│   └── unique-ips.txt              # Unique IP sample data
├── build/                          # Build artifacts (generated)
│   ├── _core.cpython-*.so          # Compiled Python extensions
│   ├── cmsketch_example            # Compiled C++ example
│   ├── libcmsketch.a               # Static library
│   └── tests/                      # Compiled test binaries
├── dist/                           # Distribution packages (generated)
│   └── cmsketch-*.whl              # Python wheel packages
├── CMakeLists.txt                  # Main CMake configuration
├── pyproject.toml                  # Python package configuration
├── uv.lock                         # uv lock file
├── Makefile                        # Convenience make targets
├── LICENSE                         # MIT License
└── README.md                       # Project README
```
- include/: C++ header files that define the public API
- src/cmsketchcpp/: C++ implementation files
- src/cmsketch/: Python package source code
- src/: Additional C++ files like bindings and examples
- tests/: C++ unit tests using Google Test
- pytests/: Python tests using pytest
- benchmarks/: Performance testing and comparison
- build/: Generated build artifacts (not in version control)
- dist/: Generated distribution packages (not in version control)

Managing versions across multiple files (Python package, C++ library, documentation) can be challenging. This project uses bump-my-version to automate version updates across all relevant files. The version management is configured in .bumpversion.toml:
```toml
# .bumpversion.toml
[tool.bumpversion]
current_version = "0.1.10"
commit = true
tag = true
tag_name = "v{new_version}"
message = "Bump version: {current_version} → {new_version}"

[[tool.bumpversion.files]]
filename = "pyproject.toml"
search = 'version = "{current_version}"'
replace = 'version = "{new_version}"'

[[tool.bumpversion.files]]
filename = "CMakeLists.txt"
search = "VERSION {current_version} # Project version"
replace = "VERSION {new_version} # Project version"

[[tool.bumpversion.files]]
filename = "VERSION"
search = "{current_version}"
replace = "{new_version}"
```
To make bump-my-version work with CMakeLists.txt, I use a clever trick by adding a comment:
```cmake
# CMakeLists.txt
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
```
The comment # Project version helps bump-my-version identify the correct version line in CMakeLists.txt. This ensures that other occurrences of strings like VERSION x.x.x elsewhere in the file are not mistaken for the actual project version.
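To see why the marker matters, here is a minimal Python sketch of the same search-and-replace that the CMakeLists.txt entry configures. The helper `bump_marked_version` and the `find_package(SomeDep 0.1.10 EXACT)` line are hypothetical; bump-my-version's own matching logic is more involved, but the principle is the same: only text matching the full search string, marker included, gets rewritten.

```python
def bump_marked_version(text: str, current: str, new: str) -> str:
    """Rewrite only the version followed by the '# Project version' marker.

    Hypothetical helper mirroring the search/replace pair configured for
    CMakeLists.txt; it is not part of bump-my-version's API.
    """
    search = f"VERSION {current} # Project version"
    replace = f"VERSION {new} # Project version"
    if search not in text:
        raise ValueError(f"marker line not found: {search!r}")
    return text.replace(search, replace)


CMAKE = """cmake_minimum_required(VERSION 3.15)
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
find_package(SomeDep 0.1.10 EXACT)  # hypothetical unrelated match
"""

updated = bump_marked_version(CMAKE, "0.1.10", "0.1.11")
print(updated)
```

A naive replacement of every `0.1.10` would also have rewritten the `find_package` line; anchoring the search on the marker leaves it untouched.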
```bash
# Install bump-my-version
uv add --dev bump-my-version

# Bump patch version (0.1.10 → 0.1.11)
uv run bump-my-version patch

# Bump minor version (0.1.10 → 0.2.0)
uv run bump-my-version minor

# Bump major version (0.1.10 → 1.0.0)
uv run bump-my-version major

# Preview changes without committing
uv run bump-my-version --dry-run patch
```
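The version numbers listed above follow the usual semantic-versioning rule: bumping a part increments it and resets every part to its right. A small Python sketch of that rule, plus the tag produced by the `tag_name = "v{new_version}"` setting (the `bump` and `tag_name` functions are illustrative, not bump-my-version's implementation):

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version and reset lower parts."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def tag_name(new_version: str) -> str:
    # Mirrors tag_name = "v{new_version}" from the bump-my-version config
    return "v{new_version}".format(new_version=new_version)


for part in ("patch", "minor", "major"):
    new = bump("0.1.10", part)
    print(part, new, tag_name(new))
# patch 0.1.11 v0.1.11
# minor 0.2.0 v0.2.0
# major 1.0.0 v1.0.0
```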
When you run bump-my-version, it automatically updates:

- pyproject.toml: Python package version
- CMakeLists.txt: C++ project version
- VERSION: standalone version file

It then records the change in Git:

- Git commit: creates a commit with the version bump
- Git tag: creates a tag like v0.1.11
This ensures all version references stay synchronized across your entire project.

The pyproject.toml file is the heart of modern Python packaging. Here's how to configure it for C++ extensions:
```toml
# pyproject.toml
[build-system]
requires = ["scikit-build-core>=0.10", "pybind11", "cmake>=3.15"]
build-backend = "scikit_build_core.build"

[project]
name = "cmsketch"
version = "0.1.10"
description = "High-performance Count-Min Sketch implementation with C++ and Python versions"
readme = "README.md"
license = { file = "LICENSE" }
authors = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
maintainers = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
requires-python = ">=3.11"
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: C++",
    "Topic :: Scientific/Engineering",
    "Topic :: Software Development :: Libraries :: Python Modules",
    "Operating System :: OS Independent",
]
keywords = ["count-min-sketch", "probabilistic", "data-structure", "streaming"]

[project.urls]
Homepage = "https://github.com/isaac-fate/count-min-sketch"
Repository = "https://github.com/isaac-fate/count-min-sketch"
Documentation = "https://github.com/isaac-fate/count-min-sketch#readme"
Issues = "https://github.com/isaac-fate/count-min-sketch/issues"

[project.optional-dependencies]
dev = ["pytest>=8.0.0", "pytest-benchmark>=4.0.0", "build>=1.0.0"]

[tool.scikit-build]
build-dir = "build/{wheel_tag}"
wheel.exclude = ["lib/**", "include/**"]

[tool.scikit-build.cmake]
args = [
    "-DCMAKE_BUILD_TYPE=Release",
    "-DCMAKE_CXX_STANDARD=17",
    "-DCMAKE_CXX_STANDARD_REQUIRED=ON",
    "-DCMAKE_CXX_EXTENSIONS=OFF",
]

[tool.cibuildwheel]
build = "cp311-* cp312-*"
skip = "*-win32 *-manylinux_i686 *-musllinux*"
test-command = "python -m pytest {project}/pytests -v"
test-requires = "pytest"
manylinux-x86_64-image = "manylinux_2_28"

[tool.cibuildwheel.macos]
environment = { MACOSX_DEPLOYMENT_TARGET = "10.15" }

[tool.cibuildwheel.windows]
before-build = "pip install delvewheel"
repair-wheel-command = "delvewheel repair -w {dest_dir} {wheel}"

[tool.pytest.ini_options]
testpaths = ["pytests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = ["-v", "--tb=short"]
```
- [build-system]: Specifies the build backend and requirements
  - scikit-build-core: Modern build system for C++ extensions
  - pybind11: C++ to Python binding library
  - cmake: C++ build system
- [project]: Package metadata and dependencies
  - Standard Python package information
  - requires-python: Minimum Python version
  - classifiers: PyPI categorization
- [tool.scikit-build]: Build configuration
  - build-dir: Where to place build artifacts
  - wheel.exclude: Files to exclude from wheels
- [tool.scikit-build.cmake]: CMake arguments
  - C++ standard and build type settings
  - Cross-platform compilation flags
- [tool.cibuildwheel]: CI/CD wheel building
  - Python versions and platforms to build for
  - Platform-specific configurations

The CMakeLists.txt file orchestrates the C++ build process and Python binding generation:
```cmake
# CMakeLists.txt
cmake_minimum_required(VERSION 3.15)

project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)

# Generate compile_commands.json for IDE support
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Build options
option(DEVELOPMENT_MODE "Enable development mode with IDE support" OFF)
option(BUILD_PYTHON_BINDINGS "Build Python bindings for development" OFF)

# C++ standard - use C++17 for better compatibility
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Default build type
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
endif()

# Compiler warnings
if(MSVC)
  add_compile_options(/W4)
  # Enable Windows symbol export
  set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
else()
  add_compile_options(-Wall -Wextra -Wpedantic)
  # Enable position independent code for shared libraries
  set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif()

# Platform-specific settings
if(APPLE)
  set(CMAKE_OSX_DEPLOYMENT_TARGET "10.9" CACHE STRING "Minimum OS X deployment version")
  set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "Build architectures for OS X")
endif()

# Source files
file(GLOB_RECURSE CMSKETCH_SOURCES "src/cmsketchcpp/*.cc")

# Create library
add_library(cmsketch ${CMSKETCH_SOURCES})
target_include_directories(cmsketch PUBLIC include)
target_compile_features(cmsketch PUBLIC cxx_std_17)

# Example executable
file(GLOB EXAMPLE_SOURCES "src/main.cc")
add_executable(cmsketch_example ${EXAMPLE_SOURCES})
target_link_libraries(cmsketch_example PRIVATE cmsketch)

# Install targets
install(TARGETS cmsketch DESTINATION lib)
install(DIRECTORY include/ DESTINATION include)

# Python bindings
if(SKBUILD_PROJECT_NAME OR BUILD_PYTHON_BINDINGS OR DEVELOPMENT_MODE)
  set(PYBIND11_FINDPYTHON ON)
  find_package(pybind11 REQUIRED)
  pybind11_add_module(_core MODULE src/python_bindings.cc)
  target_link_libraries(_core PRIVATE cmsketch)
  if(SKBUILD_PROJECT_NAME)
    install(TARGETS _core DESTINATION ${SKBUILD_PROJECT_NAME})
  endif()
endif()

# Testing
option(BUILD_TESTS "Build tests" OFF)
if(BUILD_TESTS OR DEVELOPMENT_MODE)
  find_package(GTest REQUIRED)
  enable_testing()
  add_subdirectory(tests)
endif()
```
- Project Setup: Basic project configuration and C++ standard
- Compiler Settings: Platform-specific compiler flags and warnings
- Library Creation: Building the core C++ library
- Python Bindings: pybind11 integration for Python extensions
- Testing: Google Test integration for C++ unit tests
- Installation: Target installation for packaging

The Python bindings are created in src/python_bindings.cc:

```cpp
// src/python_bindings.cc
#include "cmsketch/cmsketch.h"

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // converts std::vector / std::pair arguments and results

#include <string>

namespace py = pybind11;

// Macro to define common CountMinSketch methods for a given key type
#define DEFINE_COUNT_MIN_SKETCH_METHODS(class_type, class_name)                 \
  py::class_<cmsketch::CountMinSketch<class_type>>(m, class_name)              \
      .def(py::init<size_t, size_t>(), py::arg("width"), py::arg("depth"),     \
           "Create a Count-Min Sketch with specified dimensions")              \
      .def("insert", &cmsketch::CountMinSketch<class_type>::Insert,            \
           py::arg("item"), "Insert an item into the sketch")                  \
      .def("count", &cmsketch::CountMinSketch<class_type>::Count,              \
           py::arg("item"), "Get the estimated count of an item")              \
      .def("clear", &cmsketch::CountMinSketch<class_type>::Clear,              \
           "Reset the sketch to initial state")                                \
      .def("merge", &cmsketch::CountMinSketch<class_type>::Merge,              \
           py::arg("other"), "Merge another sketch into this one")             \
      .def("top_k", &cmsketch::CountMinSketch<class_type>::TopK, py::arg("k"), \
           py::arg("candidates"), "Get the top k items from candidates")       \
      .def("get_width", &cmsketch::CountMinSketch<class_type>::GetWidth,       \
           "Get the width of the sketch")                                      \
      .def("get_depth", &cmsketch::CountMinSketch<class_type>::GetDepth,       \
           "Get the depth of the sketch")

PYBIND11_MODULE(_core, m) {
  m.doc() = "Count-Min Sketch implementation with Python bindings";

  // CountMinSketch class for strings
  DEFINE_COUNT_MIN_SKETCH_METHODS(std::string, "CountMinSketchStr");

  // CountMinSketch class for int
  DEFINE_COUNT_MIN_SKETCH_METHODS(int, "CountMinSketchInt");
}
```
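The package also ships `_core.pyi` type stubs so IDEs and type checkers understand the compiled module. Based on the method names bound above and the C++ signatures (void `Insert`/`Clear`/`Merge`, `size_t` `Count`, and pairs returned from `TopK`, which pybind11's STL support turns into a list of tuples), the stubs could look like the sketch below. The repository's actual `_core.pyi` may differ in details such as parameter types.

```python
# _core.pyi (sketch): bodies are "..." because the real code is compiled C++.
from __future__ import annotations


class CountMinSketchStr:
    def __init__(self, width: int, depth: int) -> None: ...
    def insert(self, item: str) -> None: ...
    def count(self, item: str) -> int: ...
    def clear(self) -> None: ...
    def merge(self, other: CountMinSketchStr) -> None: ...
    def top_k(self, k: int, candidates: list[str]) -> list[tuple[str, int]]: ...
    def get_width(self) -> int: ...
    def get_depth(self) -> int: ...


class CountMinSketchInt:
    def __init__(self, width: int, depth: int) -> None: ...
    def insert(self, item: int) -> None: ...
    def count(self, item: int) -> int: ...
    def clear(self) -> None: ...
    def merge(self, other: CountMinSketchInt) -> None: ...
    def top_k(self, k: int, candidates: list[int]) -> list[tuple[int, int]]: ...
    def get_width(self) -> int: ...
    def get_depth(self) -> int: ...
```

Keeping the parameter names identical to the `py::arg(...)` names means keyword calls such as `sketch.top_k(k=2, candidates=[...])` type-check the same way they run.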
In these bindings:

- Automatic Type Conversion: STL containers are converted automatically (via pybind11/stl.h)
- Method Binding: C++ methods become Python methods
- Documentation: the string passed to each .def becomes the Python docstring, and pybind11 prepends the function signature
- Template Specialization: different key types get separate Python classes

The core advantage of this C++ implementation is its use of atomic operations for thread safety: counters are updated without a lock, so several threads can work on the same sketch at once. Here's how the atomic implementation works.

Header file (include/cmsketch/count_min_sketch.h):

```cpp
// include/cmsketch/count_min_sketch.h
template <typename KeyType>
class CountMinSketch {
 private:
  // 2D array of atomic counters for thread-safe operations
  std::vector<std::vector<std::atomic<…>>> counters_;
  std::vector<…> hash_functions_;
  size_t width_;
  size_t depth_;

 public:
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  // … other method declarations
};
```

Implementation file (src/cmsketchcpp/count_min_sketch.cc):

```cpp
// src/cmsketchcpp/count_min_sketch.cc
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  for (size_t i = 0; i < depth_; ++i) {
    // … increment one counter per row
  }
}

template <typename KeyType>
size_t CountMinSketch<KeyType>::Count(const KeyType& key) const {
  size_t min_count = std::numeric_limits<size_t>::max();
  for (size_t i = 0; i < depth_; ++i) {
    // … take the minimum over the rows
  }
  return min_count;
}
```

The key line inside Insert is the atomic increment:

```cpp
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  // Parallel access - no locks needed
  // (excerpt: i is the row, index the hashed column within that row)
  counters_[i][index].fetch_add(1, std::memory_order_relaxed);
}
```
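The per-row logic in Insert and Count follows the standard Count-Min Sketch recipe: each of the `depth` rows maps a key to one of `width` columns with its own hash function, insertion increments one counter per row, and a query returns the minimum of those counters. Collisions can make the estimate too high but never too low. Below is a single-threaded pure-Python sketch of that recipe; the class name and the hash family (BLAKE2b salted with the row number) are assumptions for illustration, not what the C++ code or the project's own pure-Python module uses.

```python
import hashlib


class SimpleCountMinSketch:
    """Illustrative Count-Min Sketch for str keys (hypothetical, not the cmsketch API)."""

    def __init__(self, width: int, depth: int) -> None:
        if width <= 0 or depth <= 0:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        # depth rows x width columns of counters, all starting at zero
        self.counters = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Assumed hash family: BLAKE2b salted with the row number (salt max 16 bytes).
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, key: str) -> None:
        # One increment per row, at the column chosen by that row's hash.
        for row in range(self.depth):
            self.counters[row][self._index(row, key)] += 1

    def count(self, key: str) -> int:
        # Collisions only ever add, so the smallest row value is the tightest estimate.
        return min(self.counters[row][self._index(row, key)] for row in range(self.depth))

    def merge(self, other: "SimpleCountMinSketch") -> None:
        # Element-wise addition; valid because both sketches share the hash family.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have identical dimensions")
        for mine, theirs in zip(self.counters, other.counters):
            for col, value in enumerate(theirs):
                mine[col] += value

    def top_k(self, k: int, candidates: list[str]) -> list[tuple[str, int]]:
        # The sketch stores no keys, so ranking needs caller-supplied candidates.
        scored = [(c, self.count(c)) for c in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


sketch = SimpleCountMinSketch(width=1000, depth=5)
for word in ["apple", "apple", "banana"]:
    sketch.insert(word)
print(sketch.count("apple"), sketch.top_k(1, ["apple", "banana", "cherry"]))
```

The candidate list in `top_k` mirrors the C++ `TopK(k, candidates)` signature: a Count-Min Sketch cannot enumerate the keys it has seen, so the caller must propose them.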
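This atomic implementation lets multiple threads insert into and query the sketch simultaneously without blocking each other, which pays off in multithreaded workloads. From Python, threads only execute the C++ code in parallel if the bound methods release the GIL: pybind11 holds the GIL for the duration of a call unless the binding opts out, for example with py::call_guard<py::gil_scoped_release>(), which the bindings shown above do not include.

Here's the complete development workflow for building Python packages with C++ extensions: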
```bash
# Create project directory
mkdir my-python-cpp-package
cd my-python-cpp-package

# Initialize git repository
git init

# Create basic directory structure
mkdir -p include/mypackage src/mypackagecpp src/mypackage/py tests pytests examples

# Install development dependencies
uv sync --dev

# Build in development mode (uv environments do not include pip, so use uv's installer)
uv pip install -e .

# Run tests
uv run pytest pytests/
make build-dev && cd build && make test
```
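After an editable install it is worth confirming that `cmsketch._core` really resolves to a compiled extension rather than a stale or missing file. A small helper built on `importlib.util.find_spec` can report where a module would load from; the function name `module_origin` is hypothetical.

```python
import importlib.machinery
import importlib.util


def module_origin(name: str) -> tuple[str, bool]:
    """Return (origin, is_compiled_extension) for an importable module.

    find_spec imports parent packages (e.g. `cmsketch` for `cmsketch._core`)
    but does not execute the target module itself.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate module {name!r}")
    # EXTENSION_SUFFIXES lists this interpreter's extension-module file endings,
    # e.g. '.cpython-311-x86_64-linux-gnu.so' on Linux or '.pyd' on Windows.
    is_extension = spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
    return spec.origin, is_extension
```

Run inside the project environment, `module_origin("cmsketch._core")` should return a path ending in one of the platform's extension suffixes together with `True`; a pure-Python package such as `json` returns its `__init__.py` and `False`.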
Understanding how files relate to the project structure:

- C++ Headers (include/cmsketch/):
  - cmsketch.h → main header included by users
  - count_min_sketch.h → core template class definition
  - hash_util.h → utility functions
- C++ Implementation (src/cmsketchcpp/):
  - count_min_sketch.cc → template class implementation; links to the headers via #include "cmsketch/cmsketch.h"
- Python Package (src/cmsketch/):
  - __init__.py → package initialization and public API
  - _core.pyi → type stubs for the C++ bindings
  - base.py → abstract base classes
  - py/ → pure Python implementations
- Python Bindings (src/python_bindings.cc):
  - Links the C++ library to Python via pybind11
  - Creates the _core module with the CountMinSketchStr and CountMinSketchInt classes
- Build Configuration:
  - pyproject.toml → Python package metadata and build settings
  - CMakeLists.txt → C++ build configuration and pybind11 integration

The build process follows this sequence:

1. CMake Configuration: reads CMakeLists.txt and configures the build
2. C++ Compilation: compiles the C++ source files into the cmsketch library
3. pybind11 Binding: builds the _core extension module and links the library into it
4. Python Packaging: creates a wheel containing the compiled _core extension and the Python package (the installed lib/ and include/ trees are left out via wheel.exclude)

C++ Tests (tests/):

```cpp
// tests/test_count_min_sketch.cc
#include <gtest/gtest.h>

#include <string>

#include "cmsketch/cmsketch.h"

TEST(CountMinSketchTest, BasicFunctionality) {
  cmsketch::CountMinSketch<std::string> sketch(100, 3);
  sketch.Insert("test");
  // Unsigned literal avoids a signed/unsigned comparison warning under -Wall -Wextra
  EXPECT_EQ(sketch.Count("test"), 1u);
}
```
Python Tests (pytests/):
```python
# pytests/test_count_min_sketch.py
import pytest
import cmsketch


def test_basic_functionality():
    sketch = cmsketch.CountMinSketchStr(100, 3)
    sketch.insert("test")
    assert sketch.count("test") == 1
```
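The exact assertion `count("test") == 1` holds because a single key has nothing to collide with. With many keys a Count-Min Sketch may overestimate, so tests over realistic data are more robust when they check the sketch's actual guarantee: estimates never fall below the true counts. Below is a sketch of such a check; `assert_never_underestimates` is a hypothetical helper that works with any object exposing `insert`/`count`, and the `Counter`-backed stand-in only keeps the example runnable without the compiled extension.

```python
from collections import Counter
from collections.abc import Iterable


def assert_never_underestimates(sketch, items: Iterable) -> None:
    """Insert every item, then check each estimate is >= its exact count."""
    items = list(items)
    exact = Counter(items)
    for item in items:
        sketch.insert(item)
    for item, true_count in exact.items():
        estimate = sketch.count(item)
        assert estimate >= true_count, (
            f"{item!r}: estimate {estimate} < true count {true_count}"
        )


class ExactStandIn:
    """Exact counter with the same insert/count surface (demonstration only)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def insert(self, item) -> None:
        self._counts[item] += 1

    def count(self, item) -> int:
        return self._counts[item]


assert_never_underestimates(ExactStandIn(), ["a", "b", "a", "c", "a"])
# In pytests/, the same helper could target the real class, e.g.
# assert_never_underestimates(cmsketch.CountMinSketchStr(100, 3), data)
```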
The project uses GitHub Actions for automated building and testing:

- Test Workflow (.github/workflows/test.yml):
  - Runs on push/PR
  - Tests the C++ and Python code
  - Cross-platform testing (Windows, Linux, macOS)
- Wheel Building (.github/workflows/wheels.yml):
  - Uses cibuildwheel for cross-platform wheel generation
  - Builds for multiple Python versions and architectures
  - Tests wheels before publishing
- Release Workflow (.github/workflows/release.yml):
  - Triggers on git tags
  - Publishes wheels to PyPI
  - Creates GitHub releases

The core implementation uses a template-based design that supports any hashable key type:

```cpp
// src/main.cc
#include "cmsketch/cmsketch.h"
#include <iostream>

int main() {
  // Create a sketch with width=1000, depth=5
  cmsketch::CountMinSketch<std::string> sketch(1000, 5);

  // Add elements
  sketch.Insert("apple");
  sketch.Insert("apple");
  sketch.Insert("banana");

  // Query frequencies
  std::cout …
```
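```cpp
template <typename KeyType>
class CountMinSketch {
 public:
  CountMinSketch(size_t width, size_t depth);
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  std::vector<std::pair<KeyType, size_t>> TopK(
      size_t k, const std::vector<KeyType>& candidates) const;
  void Merge(const CountMinSketch& other);
  void Clear();

 private:
  std::vector<std::vector<std::atomic<…>>> counters_;
  std::vector<…> hash_functions_;
  size_t width_;
  size_t depth_;
};
```

This design ensures type safety while maintaining high performance: each key type is a separate compile-time instantiation of the template. Because the member functions are defined in count_min_sketch.cc rather than in the header, the key types used by the bindings (std::string and int) need to be explicitly instantiated in that file for the extension to link.

The Python interface provides a clean, easy-to-use API: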
```python
# examples/example.py
import cmsketch

# Create a sketch for strings
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
print(f"cherry: {sketch.count('cherry')}")  # 0

# Get top-k items
candidates = ["apple", "banana", "cherry"]
top_k = sketch.top_k(2, candidates)
for item, count in top_k:
    print(f"{item}: {count}")
```
The library provides specialized classes for different data types:

- CountMinSketchStr: string-based sketch
- CountMinSketchInt: integer-based sketch

This approach optimizes performance for common use cases while maintaining the flexibility of the underlying C++ implementation.

The C++ implementation provides significant performance improvements over Python, especially in multithreaded environments. Here are the actual benchmark results from our test suite. The benchmark suite tests real-world scenarios with:

- 100,000 IP address samples generated using Faker with a weighted distribution
- Realistic frequency patterns (the most frequent IP appears ~10% of the time)
- Threaded processing with 10 concurrent workers and 1,000-item batches
- Comprehensive testing across insert, count, top-k, and streaming operations

In the results, ops/sec is the reciprocal of the time per benchmark round, so for the 100k-item runs one "operation" is the entire batch of insertions.

- Insert Performance (100k items, threaded):
  - C++: 45.79 ms (21.84 ops/sec)
  - Python: 8,751.15 ms (0.11 ops/sec)
  - Speedup: 191x faster with C++
- Count Performance (querying unique items):
  - C++: 4.71 μs per query (212,130 ops/sec)
  - Python: 858.58 μs per query (1,165 ops/sec)
  - Speedup: 182x faster with C++
- Top-K Performance (finding top items):
  - C++: 2.57 μs per operation (389,163 ops/sec)
  - Python: 857.54 μs per operation (1,166 ops/sec)
  - Speedup: 334x faster with C++
- Streaming Performance (insert + top-k):
  - C++: 46.03 ms (21.72 ops/sec)
  - Python: 8,889.81 ms (0.11 ops/sec)
  - Speedup: 193x faster with C++
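The threaded insert scenario (10 workers, 1,000-item batches) can be approximated with `concurrent.futures`. In the sketch below, `insert_in_batches` and the lock-guarded `LockedCounter` stand-in are hypothetical, and the benchmark suite's actual harness may be organized differently. Any object with an `insert` method can be passed in; whether the threads actually overlap depends on that object, for a C++-backed class on its bindings releasing the GIL.

```python
import threading
from collections import Counter
from collections.abc import Sequence
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class SupportsInsert(Protocol):
    def insert(self, item: str) -> None: ...


def insert_in_batches(
    sketch: SupportsInsert,
    items: Sequence[str],
    workers: int = 10,
    batch_size: int = 1_000,
) -> None:
    """Split items into fixed-size batches and insert them from a thread pool."""
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

    def insert_batch(batch: Sequence[str]) -> None:
        for item in batch:
            sketch.insert(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises any exception from a worker thread.
        list(pool.map(insert_batch, batches))


class LockedCounter:
    """Exact, lock-guarded stand-in with the same insert/count surface."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def insert(self, item: str) -> None:
        with self._lock:  # every thread serializes here, like a lock-based Python sketch
            self._counts[item] += 1

    def count(self, item: str) -> int:
        with self._lock:
            return self._counts[item]


items = ["10.0.0.1"] * 5_000 + ["10.0.0.2"] * 2_500
counter = LockedCounter()
insert_in_batches(counter, items)
print(counter.count("10.0.0.1"), counter.count("10.0.0.2"))  # 5000 2500
```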
| Operation | C++ Time | Python Time | Speedup | Key Advantage |
|---|---|---|---|---|
| Insert (100k threaded) | 45.79 ms | 8,751.15 ms | 191x | GIL bypass + atomic operations |
| Count (per query) | 4.71 μs | 858.58 μs | 182x | Direct memory access |
| Top-K (per operation) | 2.57 μs | 857.54 μs | 334x | Optimized algorithms |
| Streaming (end-to-end) | 46.03 ms | 8,889.81 ms | 193x | Combined benefits |
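The Speedup column is Python time divided by C++ time, and the ops/sec figures are the reciprocal of the time per benchmark round (up to rounding of the reported means). A quick Python check of that arithmetic against the reported timings:

```python
# Reported mean timings in seconds: (C++, Python)
TIMINGS = {
    "Insert (100k threaded)": (45.79e-3, 8_751.15e-3),
    "Count (per query)": (4.71e-6, 858.58e-6),
    "Top-K (per operation)": (2.57e-6, 857.54e-6),
    "Streaming (end-to-end)": (46.03e-3, 8_889.81e-3),
}

for operation, (cpp_s, py_s) in TIMINGS.items():
    speedup = py_s / cpp_s       # how many times faster the C++ version is
    cpp_ops_per_sec = 1 / cpp_s  # benchmark rounds per second
    print(f"{operation}: {speedup:.0f}x speedup, {cpp_ops_per_sec:,.2f} C++ ops/sec")
```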
```bash
# Run all benchmarks with pytest
uv run pytest ./benchmarks

# Run specific benchmark categories
uv run pytest ./benchmarks -k "insert"
uv run pytest ./benchmarks -k "count"
uv run pytest ./benchmarks -k "topk"

# Run with verbose output
uv run pytest ./benchmarks -v

# Generate test data (if needed)
uv run python ./benchmarks/generate_data.py
```
The benchmark suite uses pytest-benchmark for reliable measurements and includes both synthetic and real-world data patterns.
- GIL Bypass in Multithreaded Operations
  - Python: GIL serializes all operations, even with threading
  - C++: Atomic operations allow true parallel processing
  - Result: 191x speedup in threaded insertions
- Memory Access Patterns
  - Python: Object overhead, dynamic typing, garbage collection
  - C++: Direct memory access, contiguous arrays, no GC overhead
  - Result: 182x speedup in count operations
- Algorithm Optimization
  - Python: Interpreted bytecode, dynamic dispatch
  - C++: Compiled machine code, template specialization
  - Result: 334x speedup in top-k operations
- Thread Safety Implementation

```python
# Python: lock-based (serialized)
def insert_python(self, key):
    with self.lock:  # All threads wait here
        ...  # … increment counters
```

```cpp
// C++: atomic operations (parallel)
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  // All threads can execute simultaneously
  // (excerpt: i is the row, index the hashed column within that row)
  counters_[i][index].fetch_add(1, std::memory_order_relaxed);
}
```
- Memory Efficiency
  - Python: ~8 bytes per integer + object overhead
  - C++: 4 bytes per atomic counter
  - Result: 2-3x less memory usage

The project demonstrates modern software engineering practices:

- CMake: Cross-platform C++ build configuration
- scikit-build-core: Modern Python build system for C++ extensions
- pybind11: Seamless C++ to Python binding generation
- uv: Fast, modern Python package management

```
count-min-sketch/
├── include/cmsketch/        # C++ header files
│   ├── cmsketch.h           # Main header
│   ├── count_min_sketch.h   # Core template class
│   └── hash_util.h          # Hash utilities
├── src/cmsketchcpp/         # C++ source files
│   └── count_min_sketch.cc  # Core implementation
├── src/cmsketch/            # Python package
│   ├── __init__.py          # Package initialization
│   ├── _core.pyi            # Type stubs
│   └── py/                  # Pure Python implementations
├── tests/                   # C++ unit tests
├── pytests/                 # Python tests
├── benchmarks/              # Performance benchmarks
└── examples/                # Example scripts
```
The project uses GitHub Actions for automated testing and publishing:

- Cross-Platform Testing: Windows, Linux, macOS
- Wheel Building: Automated wheel generation for all platforms
- PyPI Publishing: Automatic package distribution on release

This project demonstrates several important software engineering concepts:

- pybind11 Integration: Seamless C++ to Python binding generation
- Type Stubs: Complete type information for Python IDEs
- Modern Build Tools: scikit-build-core and uv for package management
- C++ vs Python: Direct performance comparison between implementations
- Memory Efficiency: Optimized data structures and memory usage patterns
- Thread Safety: Atomic operations and concurrent access patterns
- CMake: Cross-platform C++ build configuration
- Python Packaging: Complete pip-installable package creation
- CI/CD: Automated testing and publishing workflows
- Template Metaprogramming: Generic, type-safe implementations
- RAII: Resource management and exception safety
- STL Integration: Standard library containers and algorithms
```bash
# Using pip
pip install cmsketch

# Using uv (recommended)
uv add cmsketch
```
```python
import cmsketch

# Create a sketch
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
```
```bash
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch

# Build everything
make build

# Run tests
make test

# Run example
make example
```
Building Python packages with C++ extensions requires understanding several interconnected systems:

- Project structure:
  - Clear separation between C++ headers, implementation, and Python bindings
  - Logical organization that scales from simple to complex projects
  - Build artifact management to keep source control clean
- Configuration:
  - pyproject.toml for modern Python packaging standards
  - CMakeLists.txt for cross-platform C++ compilation
  - pybind11 for seamless C++ to Python binding generation
- Workflow:
  - Incremental development with hot reloading during development
  - Comprehensive testing at both C++ and Python levels
  - CI/CD automation for cross-platform wheel building and publishing
- Performance:
  - 191x speedup in threaded insertions (GIL bypass)
  - 182x speedup in count operations (direct memory access)
  - 334x speedup in top-k operations (compiled optimization)
  - Atomic operations enable true parallel processing without locks
  - Memory efficiency through direct C++ data structure control

To apply these techniques to your own projects:

- Start Simple: Begin with a basic C++ function and Python binding
- Iterate Gradually: Add complexity incrementally (templates, STL containers, etc.)
- Test Thoroughly: Implement both C++ and Python test suites
- Automate Everything: Set up CI/CD for automated building and testing
- Document Well: Provide clear examples and API documentation

The complete source code, documentation, and benchmarks are available on GitHub, and the package is available on PyPI for immediate use. This approach to Python package development with C++ extensions provides a solid foundation for building high-performance libraries that combine the best of both worlds: Python's ease of use and C++'s performance.