QORA - Native Rust LLM Inference Engine

Published: February 28, 2026 at 12:51 PM EST
Source: Dev.to

Download

🤗 https://huggingface.co/qoranet/QORA-LLM

Model Details

| Feature | Value |
| --- | --- |
| Base Model | SmolLM3‑3B (HuggingFaceTB/SmolLM3‑3B) |
| Parameters | 3.07 billion |
| Quantization | Q4 (4‑bit symmetric, group_size=32) |
| Model Size | 1.68 GB (Q4) / ~6 GB (F16) |
| Executable | 6.7 MB |
| Context Length | 65,536 tokens (up to 128K with YaRN) |
| Platform | Windows x86_64 (CPU‑only) |
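The post doesn't show QORA's quantizer, but 4‑bit symmetric quantization with `group_size=32` can be sketched roughly as follows. Each group of 32 `f32` weights shares one scale; values map to integers in [-7, 7] with a fixed zero point of 0. Function names and the exact storage layout here are illustrative assumptions, not QORA's actual code:

```rust
// Illustrative sketch of Q4 symmetric group quantization (group_size = 32).
// Storage layout and names are assumptions; QORA's real format may differ.

/// Quantize weights into groups of (scale, 4-bit values stored as i8).
fn quantize_q4_symmetric(weights: &[f32], group_size: usize) -> Vec<(f32, Vec<i8>)> {
    weights
        .chunks(group_size)
        .map(|group| {
            // Pick the scale so the largest magnitude in the group maps to 7.
            let max_abs = group.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
            let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
            let quants = group
                .iter()
                .map(|&w| (w / scale).round().clamp(-7.0, 7.0) as i8)
                .collect();
            (scale, quants)
        })
        .collect()
}

/// Reconstruct approximate f32 weights from the quantized groups.
fn dequantize(groups: &[(f32, Vec<i8>)]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|(scale, qs)| qs.iter().map(move |&q| q as f32 * scale))
        .collect()
}
```

The symmetric scheme stores only one scale per 32 weights (no zero point), which is what keeps the quantized model near the ~1.68 GB figure above while bounding per-weight error to half a quantization step.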

Key Architectural Innovation: NoPE (No Position Encoding)

SmolLM3 uses a 3:1 NoPE ratio: 75% of layers have no positional encoding at all. Only layers 3, 7, 11, 15, 19, 23, 27, 31, 35 apply RoPE. This reduces computational overhead and enables better long‑context generalization.
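The layer list above follows a simple pattern: in every block of four layers, only the last applies RoPE. A minimal sketch of that selection logic (function names are mine, not QORA's):

```rust
// 3:1 NoPE layout: within each block of four layers, only the fourth
// (indices 3, 7, 11, ...) applies rotary position embeddings.
fn uses_rope(layer_idx: usize) -> bool {
    layer_idx % 4 == 3
}

/// List the RoPE-bearing layer indices for a model with `num_layers` layers.
fn rope_layers(num_layers: usize) -> Vec<usize> {
    (0..num_layers).filter(|&i| uses_rope(i)).collect()
}
```

For a 36-layer model this yields exactly the nine layers named above, i.e. 9 of 36 layers (25%) carry positional encoding.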

Performance Benchmarks

Test Hardware

  • OS: Windows 11
  • CPU‑only (no GPU acceleration)