hyperhive/TODO.md
müde 6db38cf70c model: runtime override via /model slash; fixes for port + bind
- runtime model override: Bus::{model,set_model} + POST /api/model
  (form-encoded {model: name}). turn.rs reads bus.model() per turn
  so a flip lands on the next claude invocation. /api/state grows
  a model field; agent page shows a 'model · <name>' chip in the
  state row. '/model <name>' slash command POSTs to the endpoint
  and refreshes state.

- port regression fix: agent_web_port no longer probes forward for
  *existing* agents (the previous fix shifted ports for any agent
  without a port file, including legacy ones whose container was
  already bound to the bare hashed port — dashboard rendered the
  new port, container was still on the old one, conn errors). new
  rule: port file exists → use it; absent + applied flake present
  → legacy, persist port_hash without probing; absent + no applied
  flake → fresh spawn, probe forward.

- SO_REUSEADDR on both the dashboard and per-agent web UI binds
  via tokio::net::TcpSocket. operator hit 12 retries failing on
  manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly
  without a new dep; retry still covers the genuine
  process-still-alive overlap.

todo: drops the model-override entry (shipped); adds two new
items — model persistence (optional, future), and custom
per-agent MCP tools (groundwork for moving bitburner-agent into
hyperhive).
2026-05-15 20:59:45 +02:00

107 lines
4.9 KiB
Markdown

# TODO
Pick anything from here when relevant. Cross-cutting design notes live in
[CLAUDE.md](CLAUDE.md); high-level project intro in [README.md](README.md).
## Security
- **Unprivileged containers (userns mapping).** Today the nspawn container
runs as a fully privileged root. Goal: `PrivateUsersChown=yes` (or the
nixos-container equivalent) so uid 0 inside maps to an unprivileged uid
on the host, and a container-root compromise lands the attacker on an
ordinary user account, not the host's root. Requires per-agent state
dirs to be chown'd to that uid on the host side.
- **Bash command allow-list.** Replace the blanket `Bash` allow with a
pattern allow-list (`Bash(git *)`, `Bash(nix build .*)`, etc.) per
claude-code's `--allowedTools` extended grammar. Likely lives in
`agent.nix` so each agent can scope its own shell surface.
## Per-agent extension
- **Custom per-agent MCP tools.** Today every sub-agent gets the
same fixed MCP surface (`send`, `recv`). To move bitburner-agent
(and anything else with rich domain tooling) into hyperhive, an
agent needs a way to ship its own tools alongside hyperhive's.
Sketch: `agent.nix` declares a list of extra MCP servers
(command + args + env), each registered into the agent's
`--mcp-config` blob at flake-render time. The harness MCP server
remains the hyperhive surface; new servers slot in as additional
entries under `mcpServers.<name>` so claude sees them as
`mcp__<name>__<tool>`. Per-agent tool whitelist (`allowedTools`)
derived from the same config so the operator stays in control of
what's exposed.
## Per-agent settings
- **Model override persistence.** `/model <name>` already switches
the model at runtime via `Bus::set_model`; the chip on the agent
page reflects the current value. Override is in-memory only and
resets on harness restart — by design for now, but consider
optional persistence (`/state/model` file?) so an operator-set
model survives a rebuild.
## UI / UX
- **Terminal: `/model` slash command.** Operator-typeable model
override from the terminal. Depends on the model-override work
above; once an override mechanism exists, wire a `/model <name>`
command that POSTs to a new endpoint.
- **xterm.js terminal** embedded per-agent, attached to a PTY exposed by
the harness. Pairs well with the unprivileged-container work — would let
the operator drop into the container without `nixos-container root-login`.
## Telemetry
- **Harness stats per agent in sqlite, charted on the agent page.**
bitburner-agent samples 18 series; for hyperhive the generally-applicable
ones are:
- turns/min, tool calls/turn, turn duration p50/p95
- claude exit code distribution (ok vs `--compact`-retry vs failure)
- inbox depth (current + max-over-window)
- messages sent/received per turn (split by recipient: peer / operator /
manager / system)
- approval queue length (across all agents — dashboard-level)
- per-tool usage counts (Read/Edit/Bash/send/recv/…)
- time-since-last-turn (helps spot stuck agents)
- notes file size growth (cues compaction)
Backend: a `stats` table with `(agent, ts, key, value)` written from
the harness on `TurnEnd`; `GET /api/stats?since=…` returns the
series; agent page renders with a small chart lib (uPlot is light).
## Manager → operator question channel
## Spawn flow
- **Two-step spawn.** Today `request_spawn(name)` is one shot: manager
asks → operator approves → container is created with a default
`agent.nix` and empty `/state/`. Manager has no way to pre-stage
per-agent prompt material, package additions, or initial notes before
the agent first wakes. Split into:
1. `request_spawn_draft(name)` — host creates the per-agent
`proposed/` repo (initial commit) and `state/` dir with no
container; manager now has `/agents/<name>/{config,state}/` to
edit + commit just like an existing agent.
2. `request_spawn_commit(name, commit_ref)` — submits the queued
approval; operator sees the diff in the dashboard like a normal
`apply_commit`; on approve the container is created from that
commit.
Backwards-compat: keep the existing one-shot `request_spawn` for
trivial agents (operator can still type a name in the dashboard).
Surface "drafts" as a new section between K3PT ST4T3 and approvals.
## Loop substance
- **Notes compaction.** `/state/` is bind-mounted persistently and agents
are told (in the system prompt) to keep `/state/notes.md` for durable
knowledge — but we don't currently nudge them to compact when notes
grow. Bitburner-agent's pattern: a short-lived secondary claude session
that takes the existing notes + a "compact this" prompt and rewrites
them in place. Add when the notes start bloating.
## Lifecycle / reliability
- **Container crash events.** Watch `container@*.service` via D-Bus, push
`HelperEvent::ContainerCrash` to the manager's inbox so the manager can
react (restart, escalate, etc.).