MacBook M5 Pro and Qwen3.5 = Local AI Security System
Source: Hacker News
⚠️ Collection Error: Content refinement error: Error: 429 429 Too Many Requests: you (bkperio) have reached your weekly usage limit, upgrade for higher limits: https://ollama.com/upgrade
Qwen3.5-9B scores 93.8% — within 4 points of GPT-5.4 — running entirely on a MacBook Pro M5 at 25 tok/s, 765ms TTFT, using only 13.8 GB of unified memory. Zero API costs. Full data privacy. All local.
MacBook Pro M5 · M5 Pro · 18 cores · 64 GB Unified Memory · macOS 15.3 (arm64) · llama.cpp
Full Leaderboard
96-test evaluation across 15 suites covering tool use, security classification, event deduplication, and more.
Rank Model Type Passed Failed Pass Rate Time 🥇 GPT-5.4
☁️ Cloud
94 2 97.9% 2m 22s 🥈 GPT-5.4-mini
☁️ Cloud
92 4 95.8% 1m 17s 🥉 Qwen3.5-9B (Q4_K_M)
🏠 Local
90 6 93.8% 5m 23s 🥉 Qwen3.5-27B (Q4_K_M)
🏠 Local
90 6 93.8% 15m 8s 5 Qwen3.5-122B-MoE (IQ1_M)
🏠 Local
89 7 92.7% 8m 26s 5 GPT-5.4-nano
☁️ Cloud
89 7 92.7% 1m 34s 7 Qwen3.5-35B-MoE (Q4_K_L)
🏠 Local
88 8 91.7% 3m 30s 8 GPT-5-mini (2025)
☁️ Cloud
60 36 62.5% 7m 38s
- GPT-5-mini had many failures due to the API rejecting non-default temperature values — listed for completeness only.
Performance: Local vs Cloud
The Qwen3.5-35B-MoE has a lower TTFT than all OpenAI cloud models — 435ms vs 508ms for GPT-5.4-nano.
What is HomeSec-Bench?
A benchmark we created to evaluate LLMs on real home security assistant workflows — not generic chat, but the actual reasoning, triage, and tool use an AI home security system needs.
All 35 fixture images are AI-generated (no real user footage). Tests run against any OpenAI-compatible endpoint.
See It Run
Watch the benchmark suite execute live on Apple Silicon — every test visible in real time.
A 9B model on a laptop scoring within 4% of GPT-5.4 on domain tasks — fully offline with complete privacy — is the value proposition of local AI.
Download Aegis
Benchmark on GitHub
System: Aegis-AI — Local-first AI home security on consumer hardware.
Benchmark: HomeSec-Bench — 96 LLM + 35 VLM tests across 16 suites.
Skill Platform: DeepCamera — Decentralized AI skill ecosystem.