hyperhive/TODO.md

# TODO

Pick anything from here when relevant. Cross-cutting design notes live in
[CLAUDE.md](CLAUDE.md); high-level project intro in [README.md](README.md).

## Turn loop

- **`recv` with no `wait_seconds` should return immediately.**
  Today omitting the argument falls through to the 30s
  default long-poll (`RECV_LONG_POLL_DEFAULT` in
  `hive-c0re/src/agent_server.rs`); a manager that wants a
  cheap "anything in the inbox right now?" peek has to
  explicitly pass `wait_seconds: 0`. Flip the semantics so
  `None` = no sleep, returning `None` (or the empty inbox
  shape) right away. The agent opts into the long-poll by
  setting a positive value. Update both `AgentRequest::Recv`
  and `ManagerRequest::Recv` handlers + the prompt language
  in `prompts/{agent,manager}.md`. Tighten the cap (180s)
  too — only meaningful when the agent is choosing to wait.

## Permissions / policy

- **Per-agent send allow-list.** Today any agent can `send` to any
  other recipient (peer, manager, operator). Add a per-agent
  policy that constrains the `to` field — declared in `agent.nix`,
  e.g. `hyperhive.allowedRecipients = [ "manager" "alice" ]`.
  Broker rejects with an `Err { message }` when the policy denies.
  Default: unrestricted (back-compat). The manager can still
  always send anywhere. Useful for sandboxing untrusted sub-agents
  so they can only talk to the manager, not other sub-agents.

## Security

- **Unprivileged containers (userns mapping).** Today the nspawn container
  runs as a fully privileged root. Goal: `PrivateUsersChown=yes` (or the
  nixos-container equivalent) so uid 0 inside maps to an unprivileged uid
  on the host, and a container-root compromise lands the attacker on an
  ordinary user account, not the host's root. Requires per-agent state
  dirs to be chown'd to that uid on the host side. The per-agent git
  identity (currently injected via `programs.git.config.user` against
  the root user in `setup_applied`'s generated flake) also needs to be
  provisioned for whatever non-root user claude runs as, or commits
  the manager makes against `/agents/<n>/config` will fall back to a
  generic `nixos@…` identity.
- **Bash command allow-list.** Replace the blanket `Bash` allow with a
  pattern allow-list (`Bash(git *)`, `Bash(nix build .*)`, etc.) per
  claude-code's `--allowedTools` extended grammar. Likely lives in
  `agent.nix` so each agent can scope its own shell surface.

## Per-agent extension

- **Custom per-agent MCP tools.** Today every sub-agent gets the
  same fixed MCP surface (`send`, `recv`). To move bitburner-agent
  (and anything else with rich domain tooling) into hyperhive, an
  agent needs a way to ship its own tools alongside hyperhive's.
  Sketch: `agent.nix` declares a list of extra MCP servers
  (command + args + env), each registered into the agent's
  `--mcp-config` blob at flake-render time. The harness MCP server
  remains the hyperhive surface; new servers slot in as additional
  entries under `mcpServers.<name>` so claude sees them as
  `mcp__<name>__<tool>`. Per-agent tool whitelist (`allowedTools`)
  derived from the same config so the operator stays in control of
  what's exposed.

## Operational hygiene (post-meta-flake)

- **Tag retention.** Every approval mints up to 5 tags in
  `applied/<n>/.git` (`proposal/`, `approved/`, `building/`,
  `deployed/`, plus `failed/` or `denied/`). Every successful
  deploy adds one commit to `/var/lib/hyperhive/meta/.git`.
  Both grow unbounded. A retention policy — keep all
  `deployed/*` indefinitely, age-out `failed/` + `denied/`
  after N days, drop `proposal/` + `approved/` + `building/`
  once a terminal sibling lands — would keep the audit
  trails browsable without forever-growth.

- **Reject proposals that touch `flake.nix`.** The manager's
  prompt says don't edit it, but nothing on the host side
  enforces. Add a check in
  `manager_server::submit_apply_commit`: after fetching the
  proposal sha into applied, `git diff-tree <sha> -- flake.nix`
  — non-empty diff → refuse + clear error message. Cheap
  belt-and-suspenders.

- **Inert `nix flake lock` no-args call in `meta::sync_agents`.**
  Still valid in current nix (resolves missing inputs without
  bumping existing ones) but parallel to the deprecated
  `--update-input` we just had to migrate. Worth keeping an
  eye on; if it gets renamed too, sync_agents stops being
  able to seed a fresh meta repo.

## Bugs

- **Pending question doesn't always appear on the dashboard.**
  Repro: manager calls `ask_operator`, tool result is
  `question queued (id=N)` (so the row is in sqlite), but the
  M1ND H4S QU3STI0NS section keeps showing "no pending
  questions". Last seen with id=5. Suspected paths:
  - `OperatorQuestions::pending()` returns Err and the
    `unwrap_or_default()` in `api_state` hides it. Surface the
    error (warn-log) and check.
  - serialization: a new field in `OpQuestion` (e.g.
    `deadline_at: Option<i64>`) deserializes wrong against an
    old row whose columns don't match the new SELECT order →
    `row.get(N)?` panics for that row, the whole iterator
    errors, `pending()` returns Err. Diagnose by curl
    `/api/state | jq '.questions'` and compare with sqlite
    counts.
  - dashboard JS swallows a render error. Open browser console
    and look for exceptions during `renderQuestions`.

## UI / UX

- **Web UI for config repos + meta deploy log.** Browse
  per-agent proposed / applied tags
  (`proposal/* / approved/* / building/* / deployed/* /
  failed/* / denied/*`) plus the swarm-wide meta repo's git
  log on the dashboard. Read-only log + diff + raw-file view
  is enough — something lighter than a full forge. The meta
  log already answers "what's deployed where + when"; this
  surfaces it without an ssh-to-host detour.

- **xterm.js terminal** embedded per-agent, attached to a PTY exposed by
  the harness. Pairs well with the unprivileged-container work — would let
  the operator drop into the container without `nixos-container root-login`.

## Telemetry

- **Harness stats per agent in sqlite, charted on the agent page.**
  bitburner-agent samples 18 series; for hyperhive the generally-applicable
  ones are:
  - turns/min, tool calls/turn, turn duration p50/p95
  - claude exit code distribution (ok vs `--compact`-retry vs failure)
  - inbox depth (current + max-over-window)
  - messages sent/received per turn (split by recipient: peer / operator /
    manager / system)
  - approval queue length (across all agents — dashboard-level)
  - per-tool usage counts (Read/Edit/Bash/send/recv/…)
  - time-since-last-turn (helps spot stuck agents)
  - notes file size growth (cues compaction)
  Backend: a `stats` table with `(agent, ts, key, value)` written from
  the harness on `TurnEnd`; `GET /api/stats?since=…` returns the
  series; agent page renders with a small chart lib (uPlot is light).

## Spawn flow

- **Two-step spawn.** Today `request_spawn(name)` is one shot: manager
  asks → operator approves → container is created with a default
  `agent.nix` and empty `/state/`. Manager has no way to pre-stage
  per-agent prompt material, package additions, or initial notes before
  the agent first wakes. Split into:
  1. `request_spawn_draft(name)` — host creates the per-agent
     `proposed/` repo (initial commit) and `state/` dir with no
     container; manager now has `/agents/<name>/{config,state}/` to
     edit + commit just like an existing agent.
  2. `request_spawn_commit(name, commit_ref)` — submits the queued
     approval; operator sees the diff in the dashboard like a normal
     `apply_commit`; on approve the container is created from that
     commit.
  Backwards-compat: keep the existing one-shot `request_spawn` for
  trivial agents (operator can still type a name in the dashboard).
  Surface "drafts" as a new section between K3PT ST4T3 and approvals.

## Loop substance

- **Notes compaction.** `/state/` is bind-mounted persistently and agents
  are told (in the system prompt) to keep `/state/notes.md` for durable
  knowledge — but we don't currently nudge them to compact when notes
  grow. Bitburner-agent's pattern: a short-lived secondary claude session
  that takes the existing notes + a "compact this" prompt and rewrites
  them in place. Add when the notes start bloating.