NJannasch.Dev

Blog

OpenWrt · Networking · Mobile Work · Homelab · Antigravity CLI

Beryl AX OpenWrt Mobile Office Router with Antigravity CLI

Pixel 8 USB tethering, Mobile_Net Wi-Fi, SQM, encrypted DNS and adblock on a GL.iNet Beryl AX, set up with Antigravity CLI.

· 7 min read
AI · Security · Agents · Homelab

Sandboxing AI Agents: Kernel-Level Isolation with nono and Landlock

AI agents have terminal access, make network calls, and run file operations. Here's how I lock them down with OS user isolation, Landlock kernel sandboxing, and nono, and where the gaps still are.

· 11 min read · Beyond the Chat Window · Part 5
AI · Homelab · Agents · Local-First · MCP

From Vibe Coding to AI Agent: My Local Qwen 3.6 Now Runs 24/7

Built a local AI agent with Hermes on Qwen 3.6 MTP at 125 t/s. From benchmarking to vibe coding to a 24/7 autonomous agent, with no API costs.

· 9 min read · Beyond the Chat Window · Part 4
AI · Homelab · llama.cpp · Benchmarking

Gemma 4 MTP vs Qwen 3.6: Same GPU, Different Speedups

Gemma 4 MTP hits 133 t/s (1.32x) vs Qwen's 144 t/s (1.47x) on an RTX 5060 Ti. The 441 MB drafter looks light but compute buffers eat the savings.

· 4 min read · Fast AI, Real Risks · Part 11
AI · Homelab · llama.cpp · Benchmarking

MTP Speculative Decoding Actually Works on MoE: 144 t/s on a 16GB GPU

MTP landed for Qwen 3.6 in llama.cpp. MoE jumps from 98 to 144 t/s, dense gets 42% slower. Benchmark data, server configs, and why MTP needs bandwidth headroom.

· 5 min read · Fast AI, Real Risks · Part 10
AI · Homelab · llama.cpp · Benchmarking

Gemma 4 on a 5060 Ti: 256K Context on 16GB — but Only if You Know the Architecture Trick

Gemma 4 26B MoE hits 99 t/s with 256K context on an RTX 5060 Ti 16 GB. The 31B dense tops out at 65K. One flag and one architecture trick make the difference.

· 9 min read · Fast AI, Real Risks · Part 8