QORA - Native Rust LLM Inference Engine

Published: February 28, 2026 at 12:51 PM EST
Source: Dev.to

Download

🤗 https://huggingface.co/qoranet/QORA-LLM

Model Details

| Feature | Value |
| --- | --- |
| Base Model | SmolLM3‑3B (HuggingFaceTB/SmolLM3‑3B) |
| Parameters | 3.07 billion |
| Quantization | Q4 (4‑bit symmetric, group_size=32) |
| Model Size | 1.68 GB (Q4) / ~6 GB (F16) |
| Executable | 6.7 MB |
| Context Length | 65,536 tokens (up to 128K with YaRN) |
| Platform | Windows x86_64 (CPU‑only) |
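The post doesn't show QORA's quantizer, but 4‑bit symmetric quantization with `group_size=32` can be sketched roughly as follows. Each group of 32 `f32` weights shares one scale; values map to integers in [-7, 7] with a fixed zero point of 0. Function names and the exact storage layout here are illustrative assumptions, not QORA's actual code:

```rust
// Illustrative sketch of Q4 symmetric group quantization (group_size = 32).
// Storage layout and names are assumptions; QORA's real format may differ.

/// Quantize weights into groups of (scale, 4-bit values stored as i8).
fn quantize_q4_symmetric(weights: &[f32], group_size: usize) -> Vec<(f32, Vec<i8>)> {
    weights
        .chunks(group_size)
        .map(|group| {
            // Pick the scale so the largest magnitude in the group maps to 7.
            let max_abs = group.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
            let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
            let quants = group
                .iter()
                .map(|&w| (w / scale).round().clamp(-7.0, 7.0) as i8)
                .collect();
            (scale, quants)
        })
        .collect()
}

/// Reconstruct approximate f32 weights from the quantized groups.
fn dequantize(groups: &[(f32, Vec<i8>)]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|(scale, qs)| qs.iter().map(move |&q| q as f32 * scale))
        .collect()
}
```

The symmetric scheme stores only one scale per 32 weights (no zero point), which is what keeps the quantized model near the ~1.68 GB figure above while bounding per-weight error to half a quantization step.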

Key Architectural Innovation: NoPE (No Position Encoding)

SmolLM3 uses a 3:1 NoPE ratio: 75% of layers have no positional encoding at all. Only layers 3, 7, 11, 15, 19, 23, 27, 31, 35 apply RoPE. This reduces computational overhead and enables better long‑context generalization.
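The layer list above follows a simple pattern: in every block of four layers, only the last applies RoPE. A minimal sketch of that selection logic (function names are mine, not QORA's):

```rust
// 3:1 NoPE layout: within each block of four layers, only the fourth
// (indices 3, 7, 11, ...) applies rotary position embeddings.
fn uses_rope(layer_idx: usize) -> bool {
    layer_idx % 4 == 3
}

/// List the RoPE-bearing layer indices for a model with `num_layers` layers.
fn rope_layers(num_layers: usize) -> Vec<usize> {
    (0..num_layers).filter(|&i| uses_rope(i)).collect()
}
```

For a 36-layer model this yields exactly the nine layers named above, i.e. 9 of 36 layers (25%) carry positional encoding.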

Performance Benchmarks

Test Hardware

  • OS: Windows 11
  • CPU‑only (no GPU acceleration)