# QORA - Native Rust LLM Inference Engine
Source: Dev.to

## Download
🤗 https://huggingface.co/qoranet/QORA-LLM
## Model Details
| Feature | Value |
|---|---|
| Base Model | SmolLM3-3B (HuggingFaceTB/SmolLM3-3B) |
| Parameters | 3.07 Billion |
| Quantization | Q4 (4-bit symmetric, group_size=32) |
| Model Size | 1.68 GB (Q4) / ~6 GB (F16) |
| Executable | 6.7 MB |
| Context Length | 65,536 tokens (up to 128K with YaRN) |
| Platform | Windows x86_64 (CPU-only) |
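The Q4 scheme in the table (4-bit symmetric quantization with `group_size=32`) can be sketched as follows. This is a minimal illustration of the general technique, not QORA's actual code; the function names and the choice of a [-7, 7] integer range are assumptions.

```rust
/// Quantize f32 weights to 4-bit symmetric integers, one scale per group of 32.
/// (Illustrative sketch; assumes a symmetric [-7, 7] range per 4-bit value.)
fn quantize_q4(weights: &[f32]) -> (Vec<i8>, Vec<f32>) {
    const GROUP_SIZE: usize = 32;
    let mut quants = Vec::with_capacity(weights.len());
    let mut scales = Vec::with_capacity(weights.len().div_ceil(GROUP_SIZE));
    for group in weights.chunks(GROUP_SIZE) {
        // Symmetric: one scale maps the group's max magnitude onto [-7, 7].
        let max_abs = group.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
        let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
        scales.push(scale);
        for &w in group {
            quants.push((w / scale).round().clamp(-7.0, 7.0) as i8);
        }
    }
    (quants, scales)
}

/// Reconstruct approximate f32 weights from quantized values and per-group scales.
fn dequantize_q4(quants: &[i8], scales: &[f32]) -> Vec<f32> {
    quants
        .chunks(32)
        .zip(scales)
        .flat_map(|(group, &scale)| group.iter().map(move |&q| q as f32 * scale))
        .collect()
}
```

Storing one f32 scale per 32 weights is what brings the 3.07B-parameter model down to roughly 1.68 GB: each weight costs 4 bits plus a small per-group overhead, versus 16 bits each at F16.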
## Key Architectural Innovation: NoPE (No Position Encoding)
SmolLM3 interleaves RoPE and NoPE layers in a 3:1 ratio: three out of every four layers apply RoPE, while every fourth layer (indices 3, 7, 11, 15, 19, 23, 27, 31, 35) uses no positional encoding at all. This reduces computational overhead and improves long-context generalization.
## Performance Benchmarks
### Test Hardware
- OS: Windows 11
- CPU-only (no GPU acceleration)