hyperhive/TODO.md

# TODO

Pick anything from here when relevant. Cross-cutting design notes live in
[CLAUDE.md](CLAUDE.md); high-level project intro in [README.md](README.md).

## Security

- **Unprivileged containers (userns mapping).** Today the nspawn container
  runs as a fully privileged root. Goal: `PrivateUsersChown=yes` (or the
  nixos-container equivalent) so uid 0 inside maps to an unprivileged uid
  on the host, and a container-root compromise lands the attacker on an
  ordinary user account, not the host's root. Requires per-agent state
  dirs to be chown'd to that uid on the host side.
- **Bash command allow-list.** Replace the blanket `Bash` allow with a
  pattern allow-list (`Bash(git *)`, `Bash(nix build .*)`, etc.) per
  claude-code's `--allowedTools` extended grammar. Likely lives in
  `agent.nix` so each agent can scope its own shell surface.

## Per-agent settings

- **Model override.** Hard-coded to `haiku` in the turn loop right now.
  Surface as a per-agent override: operator via dashboard, manager via
  `request_apply_commit` setting an attr on the agent's flake (most natural
  place since the flake already carries per-agent env/identity). Pair with
  a **model status** indicator on the agent page (active / queued / last
  switched) once the override is in place.

## UI / UX

- **Terminal: `/model` slash command.** Operator-typeable model
  override from the terminal. Depends on the model-override work
  above; once an override mechanism exists, wire a `/model <name>`
  command that POSTs to a new endpoint.
- **xterm.js terminal** embedded per-agent, attached to a PTY exposed by
  the harness. Pairs well with the unprivileged-container work — would let
  the operator drop into the container without `nixos-container root-login`.

## Telemetry

- **Harness stats per agent in sqlite, charted on the agent page.**
  bitburner-agent samples 18 series; for hyperhive the generally-applicable
  ones are:
  - turns/min, tool calls/turn, turn duration p50/p95
  - claude exit code distribution (ok vs `--compact`-retry vs failure)
  - inbox depth (current + max-over-window)
  - messages sent/received per turn (split by recipient: peer / operator /
    manager / system)
  - approval queue length (across all agents — dashboard-level)
  - per-tool usage counts (Read/Edit/Bash/send/recv/…)
  - time-since-last-turn (helps spot stuck agents)
  - notes file size growth (cues compaction)
  Backend: a `stats` table with `(agent, ts, key, value)` written from
  the harness on `TurnEnd`; `GET /api/stats?since=…` returns the
  series; agent page renders with a small chart lib (uPlot is light).

## Manager → operator question channel


## Spawn flow

- **Two-step spawn.** Today `request_spawn(name)` is one shot: manager
  asks → operator approves → container is created with a default
  `agent.nix` and empty `/state/`. Manager has no way to pre-stage
  per-agent prompt material, package additions, or initial notes before
  the agent first wakes. Split into:
  1. `request_spawn_draft(name)` — host creates the per-agent
     `proposed/` repo (initial commit) and `state/` dir with no
     container; manager now has `/agents/<name>/{config,state}/` to
     edit + commit just like an existing agent.
  2. `request_spawn_commit(name, commit_ref)` — submits the queued
     approval; operator sees the diff in the dashboard like a normal
     `apply_commit`; on approve the container is created from that
     commit.
  Backwards-compat: keep the existing one-shot `request_spawn` for
  trivial agents (operator can still type a name in the dashboard).
  Surface "drafts" as a new section between K3PT ST4T3 and approvals.

## Loop substance

- **Notes compaction.** `/state/` is bind-mounted persistently and agents
  are told (in the system prompt) to keep `/state/notes.md` for durable
  knowledge — but we don't currently nudge them to compact when notes
  grow. Bitburner-agent's pattern: a short-lived secondary claude session
  that takes the existing notes + a "compact this" prompt and rewrites
  them in place. Add when the notes start bloating.

## Lifecycle / reliability

- **Container crash events.** Watch `container@*.service` via D-Bus, push
  `HelperEvent::ContainerCrash` to the manager's inbox so the manager can
  react (restart, escalate, etc.).