hyperhive/TODO.md
müde 6d52f67292 broker: hourly vacuum of delivered messages older than 30 days
undelivered rows are always kept regardless of age (still in flight).
sweep runs immediately on serve start then every hour. logs row count
when non-zero. keep_secs is hard-coded for now (30 days); can be
config-driven later if a host wants to retain more / less for audit.
2026-05-15 19:40:38 +02:00

120 lines
6.1 KiB
Markdown

# TODO
Pick anything from here when relevant. Cross-cutting design notes live in
[CLAUDE.md](CLAUDE.md); high-level project intro in [README.md](README.md).
## Security
- **Unprivileged containers (userns mapping).** Today the nspawn container
runs as a fully privileged root. Goal: `PrivateUsersChown=yes` (or the
nixos-container equivalent) so uid 0 inside maps to an unprivileged uid
on the host, and a container-root compromise lands the attacker on an
ordinary user account, not the host's root. Requires per-agent state
dirs to be chown'd to that uid on the host side.
- **Bash command allow-list.** Replace the blanket `Bash` allow with a
pattern allow-list (`Bash(git *)`, `Bash(nix build .*)`, etc.) per
claude-code's `--allowedTools` extended grammar. Likely lives in
`agent.nix` so each agent can scope its own shell surface.
## Per-agent settings
- **Model override.** Hard-coded to `haiku` in the turn loop right now.
Surface as a per-agent override: operator via dashboard, manager via
`request_apply_commit` setting an attr on the agent's flake (most natural
place since the flake already carries per-agent env/identity). Pair with
a **model status** indicator on the agent page (active / queued / last
switched) once the override is in place.
## UI / UX
- **Per-agent UI substance.** Show last N inbox messages, last turn timing,
link back to dashboard.
- **Delivered events history persistence.** The `events::Bus` ring
buffer (500 events, in-memory) backfills the terminal on page load
but dies on harness restart, and only ever holds the most recent
turn or two. Persist to sqlite (`events(agent, id, ts, kind,
payload_json)`) so the operator can scroll back through prior
turns, and so `/events/history` survives restart. Cap rows per
agent or auto-vacuum on age, same trade-off as the bounded broker
entry below.
- **State badge: compacting + napping states.** Idle/thinking already
ship (driven from SSE turn_start/turn_end). Add `compacting 📦` and
`napping 😴` once the `/compact` trigger and `nap` tool exist —
both need a harness signal (an explicit `LiveEvent::StateChange`
variant or piggyback on Note).
- **Terminal: slash commands beyond /help and /clear.** Operator-facing
in-terminal commands still to add: `/model`, `/compact`, `/cancel`.
Each needs harness-side support (model override, force compaction,
cancel current claude turn).
- **Terminal: bigger.** The 32em max-height is cramped on a 1080p+
screen. Grow it (e.g. `min(70vh, 60em)`) so the live tail is the
main visual element of the page rather than a strip.
- **Terminal: sticky-bottom auto-scroll.** Today every appended row
scrolls to bottom, so the view shifts while the operator is reading
scrolled-up. Track whether the user is *already* at the bottom
(within a small threshold), and only auto-scroll when that's true.
Show a small "↓ N new" indicator when not at bottom; click to jump.
- **Terminal: cancel-current-turn button.** Explicit "kill claude
process for this turn" control. Harness needs to track the
in-flight claude child PID and offer a `/cancel` endpoint that sends
SIGTERM; UI surfaces a button while the state badge is `thinking`.
Slash-command equivalent: `/cancel`.
- **`/compact` trigger.** Operator-initiated compaction of the current
claude session — `claude --print --continue` with `/compact` over the
same session id. Surfaces as a slash command in the terminal + a
toolbar button while the state badge is `idle`. Sets state to
`compacting` during the run.
- **xterm.js terminal** embedded per-agent, attached to a PTY exposed by
the harness. Pairs well with the unprivileged-container work — would let
the operator drop into the container without `nixos-container root-login`.
## Telemetry
- **Harness stats per agent in sqlite, charted on the agent page.**
bitburner-agent samples 18 series; for hyperhive the generally-applicable
ones are:
- turns/min, tool calls/turn, turn duration p50/p95
- claude exit code distribution (ok vs `--compact`-retry vs failure)
- inbox depth (current + max-over-window)
- messages sent/received per turn (split by recipient: peer / operator /
manager / system)
- approval queue length (across all agents — dashboard-level)
- per-tool usage counts (Read/Edit/Bash/send/recv/…)
- time-since-last-turn (helps spot stuck agents)
- notes file size growth (cues compaction)
Backend: a `stats` table with `(agent, ts, key, value)` written from
the harness on `TurnEnd`; `GET /api/stats?since=…` returns the
series; agent page renders with a small chart lib (uPlot is light).
## Manager → operator question channel
- **TTL / cancel on `ask_operator`.** Questions today block forever; the
manager turn stays alive until the operator answers. Add a per-question
`ttl_seconds` (or a dashboard "cancel" button that resolves the question
with a sentinel answer) so a long-idle question can time out and let the
manager fall back. Wire the timeout into `OperatorQuestions::wait_answered`
and surface remaining-time on the dashboard.
## Loop substance
- **`nap` tool.** Agent-side MCP tool `mcp__hyperhive__nap(seconds)` that
parks the turn loop for a short while before next-message processing.
Use cases: agent decides it has nothing useful to do, or wants to
throttle itself between rapid wake events. Implementation: harness
records a "wake-not-before" timestamp; `recv_blocking` skips the long
poll until that ts; the state badge reads `napping · MM:SS` during.
Operator can cancel via the same `/cancel` slash command or a
dashboard button.
- **Notes compaction.** `/state/` is bind-mounted persistently and agents
are told (in the system prompt) to keep `/state/notes.md` for durable
knowledge — but we don't currently nudge them to compact when notes
grow. Bitburner-agent's pattern: a short-lived secondary claude session
that takes the existing notes + a "compact this" prompt and rewrites
them in place. Add when the notes start bloating.
## Lifecycle / reliability
- **Container crash events.** Watch `container@*.service` via D-Bus, push
`HelperEvent::ContainerCrash` to the manager's inbox so the manager can
react (restart, escalate, etc.).