MacBook M5 Pro and Qwen3.5 = Local AI Security System

Published: (March 20, 2026 at 12:41 PM EDT)
2 min read

Source: Hacker News


⚠️ Collection Error: Content refinement error: Error: 429 429 Too Many Requests: you (bkperio) have reached your weekly usage limit, upgrade for higher limits: https://ollama.com/upgrade


Qwen3.5-9B scores 93.8% — within 4 points of GPT-5.4 — running entirely on a MacBook Pro M5 at 25 tok/s, 765ms TTFT, using only 13.8 GB of unified memory. Zero API costs. Full data privacy. All local.

MacBook Pro M5 · M5 Pro · 18 cores · 64 GB Unified Memory · macOS 15.3 (arm64) · llama.cpp

Full Leaderboard

96-test evaluation across 15 suites covering tool use, security classification, event deduplication, and more. Rank Model Type Passed Failed Pass Rate Time 🥇 GPT-5.4
☁️ Cloud 94 2 97.9% 2m 22s 🥈 GPT-5.4-mini
☁️ Cloud 92 4 95.8% 1m 17s 🥉 Qwen3.5-9B (Q4_K_M)
🏠 Local 90 6 93.8% 5m 23s 🥉 Qwen3.5-27B (Q4_K_M)
🏠 Local 90 6 93.8% 15m 8s 5 Qwen3.5-122B-MoE (IQ1_M)
🏠 Local 89 7 92.7% 8m 26s 5 GPT-5.4-nano
☁️ Cloud 89 7 92.7% 1m 34s 7 Qwen3.5-35B-MoE (Q4_K_L)
🏠 Local 88 8 91.7% 3m 30s 8 GPT-5-mini (2025)
☁️ Cloud 60 36 62.5% 7m 38s

  • GPT-5-mini had many failures due to the API rejecting non-default temperature values — listed for completeness only.

Performance: Local vs Cloud

The Qwen3.5-35B-MoE has a lower TTFT than all OpenAI cloud models — 435ms vs 508ms for GPT-5.4-nano.

What is HomeSec-Bench?

A benchmark we created to evaluate LLMs on real home security assistant workflows — not generic chat, but the actual reasoning, triage, and tool use an AI home security system needs.

All 35 fixture images are AI-generated (no real user footage). Tests run against any OpenAI-compatible endpoint.

See It Run

Watch the benchmark suite execute live on Apple Silicon — every test visible in real time.

A 9B model on a laptop scoring within 4% of GPT-5.4 on domain tasks — fully offline with complete privacy — is the value proposition of local AI.

Download Aegis

Benchmark on GitHub

System: Aegis-AI — Local-first AI home security on consumer hardware.

Benchmark: HomeSec-Bench — 96 LLM + 35 VLM tests across 16 suites.

Skill Platform: DeepCamera — Decentralized AI skill ecosystem.

0 views
Back to Blog

Related posts

Read more »