SmartKNN - Large Scale Classification Benchmarks (CPU)

Published: December 28, 2025 at 12:15 AM EST
Source: Dev.to

Overview

This release presents initial classification benchmarks for SmartKNN on million‑row datasets, with a strong focus on single‑prediction p95 latency and Macro‑F1 under production‑like constraints.

All benchmarks are:

  • CPU‑only
  • Single‑query inference
  • Non‑parametric, nonlinear models
  • Million‑row scale datasets
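The post does not spell out its timing protocol. As a minimal sketch, assuming each single query is one predict call on one row timed with a wall clock, median and p95 latency could be collected as below. `predict_one` and `single_query_latency_ms` are hypothetical names; wrap whichever model you are measuring so it receives one row at a time.

```python
import statistics
import time
from typing import Callable, Sequence


def single_query_latency_ms(
    predict_one: Callable[[Sequence[float]], object],
    rows: Sequence[Sequence[float]],
    warmup: int = 50,
) -> dict:
    """Time one prediction per row and return the median and p95 in milliseconds.

    `predict_one` is a hypothetical wrapper that sends a single row to the model.
    """
    # Warm-up calls are not recorded, so one-off setup costs do not inflate the tail.
    for row in rows[:warmup]:
        predict_one(row)

    timings_ms = []
    for row in rows:
        start = time.perf_counter()
        predict_one(row)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns the 99 cut points p1..p99; index 94 is p95.
    cut_points = statistics.quantiles(timings_ms, n=100, method="inclusive")
    return {"median_ms": statistics.median(timings_ms), "p95_ms": cut_points[94]}


if __name__ == "__main__":
    # Toy stand-in for a model: a threshold on the sum of the features.
    rows = [[float(i), float(i % 7)] for i in range(1000)]
    print(single_query_latency_ms(lambda r: sum(r) > 500, rows))
```

Timing each call individually, rather than dividing a batch time by its size, is what exposes the tail: occasional slow calls show up in p95 but are averaged away in a batch measurement.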

More benchmarks (higher‑dimensional datasets, regression tasks, mixed feature spaces) will be released soon.

Datasets Used

Dataset | OpenML ID | Approx. Rows | Features (D) | Task | Source
BNG (Adult) | 1180 | ~1 M | 15 | Classification | OpenML / Kaggle
BNG (Australian) | 1205 | ~1 M | 15 | Classification | OpenML / Kaggle
BNG (Credit‑G) | 40514 | ~1 M | 21 | Classification | OpenML / Kaggle
Click Prediction (Small) | 1218 | ~2 M | 12 | Classification | OpenML / Kaggle
Click | 45556 | ~1 M | 12 | Classification | OpenML / Kaggle
Census (Augmented) | 43489 | ~1 M | 15 | Classification | OpenML / Kaggle

Benchmark Results

BNG (Adult) — OpenML ID 1180

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.9261 | 0.6365 | 29.05 | 0.003 | 0.261 | 0.309
LightGBM | 0.9260 | 0.6373 | 20.38 | 0.009 | 0.704 | 0.790
CatBoost | 0.9261 | 0.6353 | 44.66 | 0.016 | 0.453 | 0.495
SmartKNN | 0.9039 | 0.6641 | 334.10 | 0.061 | 0.424 | 0.468
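On BNG (Adult), SmartKNN has lower accuracy than the tree models but higher Macro‑F1. Macro‑F1 is the unweighted mean of per‑class F1 scores, so every class counts equally regardless of how many rows it has, and the two metrics can rank models differently. A small pure‑Python sketch; the toy labels are made up for illustration and are not taken from the benchmark:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = []
    for c in labels:
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Imbalanced toy labels: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2
always_zero = [0] * 10               # never predicts the minority class
mixed = [0] * 6 + [1, 1] + [1, 0]    # two false alarms, but finds one positive

print(accuracy(y_true, always_zero), round(macro_f1(y_true, always_zero), 4))  # 0.8 0.4444
print(accuracy(y_true, mixed), round(macro_f1(y_true, mixed), 4))              # 0.7 0.6
```

The second predictor is less accurate but scores higher on Macro‑F1 because it does not ignore the minority class.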

BNG (Australian) — OpenML ID 1205

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.8753 | 0.8723 | 15.97 | 0.003 | 0.274 | 0.356
LightGBM | 0.8753 | 0.8724 | 13.96 | 0.010 | 0.704 | 0.800
CatBoost | 0.8748 | 0.8717 | 24.72 | 0.001 | 0.356 | 0.403
SmartKNN | 0.8473 | 0.8435 | 63.30 | 0.033 | 0.361 | 0.410

BNG (Credit‑G) — OpenML ID 40514

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.8245 | 0.7790 | 29.68 | 0.004 | 0.265 | 0.309
LightGBM | 0.8275 | 0.7834 | 21.82 | 0.016 | 0.708 | 0.786
CatBoost | 0.8229 | 0.7753 | 54.90 | 0.023 | 0.501 | 0.532
SmartKNN | 0.7682 | 0.7085 | 493.94 | 0.069 | 0.518 | 0.559

Click Prediction (Small) — OpenML ID 1218

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.8411 | 0.5325 | 26.08 | 0.004 | 0.509 | 0.558
LightGBM | 0.8413 | 0.5358 | 24.92 | 0.011 | 0.879 | 0.958
CatBoost | 0.8392 | 0.5154 | 47.49 | 0.000 | 0.444 | 0.588
SmartKNN | 0.8158 | 0.5792 | 159.64 | 0.076 | 0.555 | 0.597

Click — OpenML ID 45556

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.7521 | 0.7521 | 12.05 | 0.004 | 0.531 | 0.588
LightGBM | 0.7520 | 0.7520 | 12.74 | 0.012 | 0.911 | 1.345
CatBoost | 0.7504 | 0.7504 | 20.62 | 0.001 | 0.419 | 0.466
SmartKNN | 0.7005 | 0.7005 | 43.44 | 0.032 | 0.346 | 0.373

Census (Augmented) — OpenML ID 43489

Model | Accuracy | Macro‑F1 | Train (s) | Batch (ms) | Single Median (ms) | Single P95 (ms)
XGBoost | 0.8859 | 0.8668 | 32.18 | 0.005 | 0.521 | 0.646
LightGBM | 0.8861 | 0.8668 | 15.20 | 0.012 | 0.974 | 1.017
CatBoost | 0.8861 | 0.8668 | 61.91 | 0.036 | 0.752 | 0.789
SmartKNN | 0.8653 | 0.8427 | 718.21 | 0.107 | 0.699 | 0.811

Notes

  • SmartKNN is a non‑parametric, instance‑based model accelerated with approximate nearest neighbor (ANN) search.
  • Benchmarks emphasize tail latency (p95) over typical inference time; median single‑prediction latency is reported alongside it.
  • All results are reproducible using publicly available datasets.
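The post does not describe SmartKNN's internals beyond the note above. As a generic illustration of what "instance‑based" means, and not SmartKNN's actual algorithm or API, here is a minimal k‑nearest‑neighbor classifier: `fit` only stores the rows, and all the work happens at query time in the neighbor search. That search is the step an ANN index approximates; this sketch uses an exact linear scan instead. `BruteForceKNN` is a hypothetical name.

```python
import heapq
import math
from collections import Counter


class BruteForceKNN:
    """Minimal instance-based classifier (hypothetical, for illustration only).

    An ANN index would replace the exact scan in _neighbors.
    """

    def __init__(self, k: int = 5):
        self.k = k
        self.X = []
        self.y = []

    def fit(self, X, y):
        # "Training" here is just memorizing the data; no weights are learned.
        self.X = [[float(v) for v in row] for row in X]
        self.y = list(y)
        return self

    def _neighbors(self, row):
        # Exact O(n * d) scan; this is the step ANN search approximates to cut latency.
        return heapq.nsmallest(self.k, range(len(self.X)),
                               key=lambda i: math.dist(self.X[i], row))

    def predict_one(self, row):
        # Neighbors arrive sorted by distance, so most_common breaks ties
        # in favor of the label held by the nearest tied neighbor.
        votes = Counter(self.y[i] for i in self._neighbors(row))
        return votes.most_common(1)[0][0]


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["a", "a", "a", "b", "b", "b"]
model = BruteForceKNN(k=3).fit(X, y)
print(model.predict_one([0.2, 0.3]))  # a
print(model.predict_one([5.5, 5.2]))  # b
```

Because prediction cost lives in the neighbor search, the speed of that search largely sets per‑query latency. SmartKNN's much longer Train (s) times in the tables suggest its fit step does considerably more than this sketch, so treat it only as the baseline idea.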

Further benchmarks covering regression tasks and higher‑dimensional datasets will be released soon.

Positioning & Claim (Carefully Worded)

SmartKNN demonstrates state‑of‑the‑art p95 single‑prediction latency on CPU among non‑parametric, nonlinear models at million‑scale data sizes, while preserving instance‑based decision behavior.

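Tree‑based models remain stronger on accuracy (SmartKNN trails the best tree model by roughly 2 to 6 percentage points on every dataset) and on median latency in five of the six datasets, although SmartKNN posts the highest Macro‑F1 on BNG (Adult) and Click Prediction (Small). In tail latency, SmartKNN shows that KNN‑style models can be competitive: its p95 is lower than LightGBM's on all six datasets, lower than CatBoost's on BNG (Adult) and Click, and the lowest of all four models on Click (0.373 ms). Tail latency is often a deciding concern in latency‑sensitive production systems.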
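The tail‑latency comparison can be checked directly against the tables. This snippet copies the Single P95 (ms) column for each dataset and lists, per dataset, which models have a higher p95 than SmartKNN and which model has the lowest p95 overall. `slower_tail_than` is a hypothetical helper.

```python
# Single P95 (ms) values copied from the six result tables in this post.
P95_MS = {
    "BNG (Adult)": {"XGBoost": 0.309, "LightGBM": 0.790, "CatBoost": 0.495, "SmartKNN": 0.468},
    "BNG (Australian)": {"XGBoost": 0.356, "LightGBM": 0.800, "CatBoost": 0.403, "SmartKNN": 0.410},
    "BNG (Credit-G)": {"XGBoost": 0.309, "LightGBM": 0.786, "CatBoost": 0.532, "SmartKNN": 0.559},
    "Click Prediction (Small)": {"XGBoost": 0.558, "LightGBM": 0.958, "CatBoost": 0.588, "SmartKNN": 0.597},
    "Click": {"XGBoost": 0.588, "LightGBM": 1.345, "CatBoost": 0.466, "SmartKNN": 0.373},
    "Census (Augmented)": {"XGBoost": 0.646, "LightGBM": 1.017, "CatBoost": 0.789, "SmartKNN": 0.811},
}


def slower_tail_than(model: str, table: dict) -> dict:
    """Per dataset, the other models whose p95 latency exceeds `model`'s."""
    return {
        dataset: sorted(m for m, p95 in row.items() if m != model and p95 > row[model])
        for dataset, row in table.items()
    }


for dataset, slower in slower_tail_than("SmartKNN", P95_MS).items():
    fastest = min(P95_MS[dataset], key=P95_MS[dataset].get)
    print(f"{dataset}: SmartKNN p95 beats {slower or 'none'}; lowest p95: {fastest}")
```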

To our knowledge, SmartKNN is among the fastest CPU‑only nonlinear, instance‑based models evaluated at this scale with reported p95 single‑query latency.

Reproducibility & Community Benchmarks

We encourage the community to:

  • Run these benchmarks on different hardware
  • Test alternative ANN configurations
  • Compare against additional models
  • Share results publicly

If you:

  • Find a performance regression – open a GitHub Issue
  • Have questions, ideas, or improvements – start a GitHub Discussion
  • Run new benchmarks – post your results
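To keep shared results comparable, one option is to post them with the same columns as the result tables above, plus the hardware used. The column set, the extra Hardware field, and the `results_to_csv` helper are assumptions for illustration, not a format the SmartKNN project prescribes. The example row reuses the BNG (Adult) SmartKNN numbers from this post.

```python
import csv
import io

# Columns follow the result tables; "Hardware" is a hypothetical extra so
# runs on different CPUs can be told apart.
COLUMNS = [
    "Dataset", "Model", "Accuracy", "Macro-F1", "Train (s)",
    "Batch (ms)", "Single Median (ms)", "Single P95 (ms)", "Hardware",
]


def results_to_csv(rows: list[dict]) -> str:
    """Serialize result rows to CSV text.

    DictWriter raises ValueError for keys not in COLUMNS and writes '' for missing ones.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


example = [{
    "Dataset": "BNG (Adult)", "Model": "SmartKNN", "Accuracy": 0.9039,
    "Macro-F1": 0.6641, "Train (s)": 334.10, "Batch (ms)": 0.061,
    "Single Median (ms)": 0.424, "Single P95 (ms)": 0.468,
    "Hardware": "<your CPU model>",
}]
print(results_to_csv(example))
```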

Community validation and feedback will directly shape future releases.

Learn more about SmartKNN:

  • Website:
  • GitHub:

Benchmarks:
