docs: full sync ahead of compaction + config-management overhaul

readme: manager mcp surface picks up update; operator-surface
recap mentions /model + last-turn + model chip + the three
collapsibles (inbox / journald / agent.nix).

web-ui.md: details-restore-key story under shape; port-conflict
banner mention on containers; agent.nix viewer alongside journald;
notifications use per-event tags + console.debug log on
block/show; deny endpoint takes note=<reason>; data-prompt /
data-prompt-field generalisation noted.

conventions.md: data-prompt and snapshot/restoreOpenDetails added
to the async-forms section.

persistence.md: operator_questions row picks up deadline_at (ttl)
column with a migration note.

todo.md: new 'Bugs' section captures the manager-question
not-rendering issue with three suspect paths to chase.

claude.md scratchpad rewritten as a clean handoff for the
compaction + the upcoming config-git overhaul. flags the
two-repo (proposed/ + applied/) split as the thing to
reconsider.
müde 2026-05-15 22:12:40 +02:00
parent 6a2ffd521b
commit 75e7faff0c
6 changed files with 120 additions and 35 deletions

@@ -114,17 +114,31 @@ read them à la carte.
 In-flight or recent context that hasn't earned a section yet.
 Prune freely.
-- 2026-05-15 ish: tombstones, multi-select ask_operator, broker +
-  events vacuum, docs split into `docs/`, lifecycle_action helper,
-  api_state split.
-- Then: inline +/- diffs on Write/Edit, operator cancel + ttl on
-  questions, dashboard back-link, per-agent inbox view, bind-retry
-  + SO_REUSEADDR, journald viewer, server-side TurnState,
-  recv(wait_seconds) max 180s, runtime /model switch, crash
-  watcher, model persistence, stopped auto-allowing claude-code
-  unfree (operator must opt in), pure-hash agent_web_port (port
-  files reverted), browser notifications, focus-preserving
-  refresh.
-- Open threads: telemetry/charts, custom per-agent MCP tools (the
-  groundwork for moving bitburner-agent into hyperhive),
-  two-step spawn, unprivileged containers, Bash allow-list.
+- **Imminent:** overhaul the git management of agent configs.
+  Current shape: per-agent `proposed/` repo the manager edits
+  + `applied/` repo hive-c0re owns, with `request_apply_commit`
+  shuttling commits between them. Pre-compact note: keep an eye
+  on whether the two-repo split is still the right shape, or if
+  a single repo with `proposed/` and `applied/` branches (or a
+  shared bare repo per agent with refs/proposed and refs/applied)
+  would simplify the diff / approve / apply path.
+- **Recent (since last compaction):** inline +/- diffs on
+  Write/Edit, send full body via collapsed details, operator
+  cancel + ttl on questions, deny-with-reason, dashboard
+  back-link + last-turn timing + model chip, per-agent inbox
+  view, bind-retry + SO_REUSEADDR, journald viewer,
+  agent.nix viewer, server-side TurnState, recv(wait_seconds)
+  max 180s, runtime /model switch + persistence to /state,
+  crash watcher + ContainerCrash / NeedsLogin / LoggedIn /
+  NeedsUpdate events, manager `update` tool, pure-hash
+  agent_web_port + collision banner + spawn/rebuild preflight,
+  browser notifications, focus-preserving refresh, generalised
+  <details data-restore-key> survival, prompt-on-submit pattern.
+- **Open threads:** custom per-agent MCP tools (groundwork for
+  moving bitburner-agent into hyperhive), two-step spawn,
+  per-agent send allow-list, telemetry/charts, notes
+  compaction, unprivileged containers, Bash allow-list,
+  xterm.js. **Known bug** (in TODO.md): question id=5 was
+  queued but didn't render — likely a `pending()` row-decode
+  error swallowed by `unwrap_or_default`; investigate by curl
+  /api/state | jq '.questions' + browser console.

@@ -30,8 +30,8 @@ host (NixOS, runs hive-c0re.service)
 ├── hm1nd hive-m1nd serve : claude turn loop +
 │ MCP (send / recv / request_spawn / kill / start /
-│ restart / request_apply_commit / ask_operator)
-│ + web UI on :8000
+│ restart / update / request_apply_commit /
+ask_operator) + web UI on :8000
 └── h-<name> hive-ag3nt serve : claude turn loop +
 MCP (send / recv) + web UI on a hashed :8100-8999
@@ -44,10 +44,13 @@ streams JSON events into the per-agent SSE bus + a sqlite history db →
 claude drives any further `recv`/`send` itself via the embedded MCP server.
 Operator surface per agent: terminal-themed live tail with a textarea
-prompt; slash commands `/help` `/clear` `/cancel` `/compact`; granular
-state badge (idle / thinking / offline) with age timer; cancel-turn
-button while thinking; sticky-bottom auto-scroll with "↓ N new" pill;
-event history backfilled on page load.
+prompt; slash commands `/help` `/clear` `/cancel` `/compact`
+`/model <name>`; granular state badge (idle / thinking /
+compacting / offline) with age timer + last-turn duration chip +
+model chip; cancel-turn button while thinking; sticky-bottom
+auto-scroll with "↓ N new" pill; event history backfilled on page
+load; collapsible inbox + collapsible journald viewer + collapsible
+`agent.nix` viewer per agent on the dashboard.
 Config changes flow the other way: manager edits `/agents/<name>/config/agent.nix`
 (bind-mounted from the host's proposed repo) → commits → submits the sha as

TODO.md

@@ -42,6 +42,26 @@ Pick anything from here when relevant. Cross-cutting design notes live in
 derived from the same config so the operator stays in control of
 what's exposed.
+## Bugs
+- **Pending question doesn't always appear on the dashboard.**
+  Repro: manager calls `ask_operator`, tool result is
+  `question queued (id=N)` (so the row is in sqlite), but the
+  M1ND H4S QU3STI0NS section keeps showing "no pending
+  questions". Last seen with id=5. Suspected paths:
+  - `OperatorQuestions::pending()` returns Err and the
+    `unwrap_or_default()` in `api_state` hides it. Surface the
+    error (warn-log) and check.
+  - serialization: a new field in `OpQuestion` (e.g.
+    `deadline_at: Option<i64>`) deserializes wrong against an
+    old row whose columns don't match the new SELECT order →
+    `row.get(N)?` panics for that row, the whole iterator
+    errors, `pending()` returns Err. Diagnose by curl
+    `/api/state | jq '.questions'` and compare with sqlite
+    counts.
+  - dashboard JS swallows a render error. Open browser console
+    and look for exceptions during `renderQuestions`.
 ## UI / UX
 - **xterm.js terminal** embedded per-agent, attached to a PTY exposed by
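
The first suspected path in the Bugs entry above can be shown in isolation. This is a hypothetical sketch, not the hive-c0re code: `OpQuestion`, `pending`, and both wrapper functions are stand-ins invented for illustration, and the `log` vector stands in for a warn-level log call. It only demonstrates how `unwrap_or_default()` turns an `Err` into an empty list, and how matching on the result keeps the error visible.

```rust
/// Hypothetical, trimmed-down question row (not the real `OpQuestion`).
#[derive(Debug, Default, Clone, PartialEq)]
struct OpQuestion {
    id: i64,
    question: String,
}

/// Stand-in for `OperatorQuestions::pending()`; `fail` simulates a row-decode error.
fn pending(fail: bool) -> Result<Vec<OpQuestion>, String> {
    if fail {
        Err("row decode failed: column type mismatch".to_string())
    } else {
        Ok(vec![OpQuestion { id: 5, question: "deploy?".to_string() }])
    }
}

/// The suspected shape: an `Err` silently becomes an empty list,
/// which the dashboard would render as "no pending questions".
fn questions_silent(fail: bool) -> Vec<OpQuestion> {
    pending(fail).unwrap_or_default()
}

/// Same fallback value, but the error is recorded before it is dropped.
fn questions_logged(fail: bool, log: &mut Vec<String>) -> Vec<OpQuestion> {
    match pending(fail) {
        Ok(rows) => rows,
        Err(e) => {
            log.push(format!("warn: pending() failed: {e}"));
            Vec::new()
        }
    }
}
```

Both variants return the same data to the caller; the difference is only whether the failure leaves a trace that can be compared against the sqlite row count.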


@@ -34,8 +34,15 @@ Dashboard + per-agent mutating forms carry `data-async`; a delegated
 `submit` listener in `assets/app.js` intercepts, shows a spinner,
 POSTs `application/x-www-form-urlencoded` (axum's `Form` extractor
 rejects multipart), calls `refreshState()` on success. New mutating
-forms should add `data-async` and optionally `data-confirm` for a
-JS-side confirmation prompt.
+forms should add `data-async` and optionally `data-confirm` (for a
+JS-side `confirm()` prompt) or `data-prompt="…"` (for a
+`window.prompt()` whose answer goes into a hidden input named by
+`data-prompt-field`, default `note`).
+`refreshState` defers automatically when `document.activeElement`
+sits inside a managed section so the operator's typing isn't lost;
+collapsible `<details data-restore-key=…>` survive the re-render
+via `snapshotOpenDetails` / `restoreOpenDetails`.
 ## `rebuild` is the reconcile verb


@@ -14,7 +14,8 @@ Three tables, all in one file:
   commit_ref / requested_at / status / resolved_at / note`.
 - `operator_questions``ask_operator` queue.
   `asker / question / options_json / multi / asked_at /
-  answered_at / answer`.
+  deadline_at (ttl) / answered_at / answer`. Migrated via
+  `ALTER TABLE ADD COLUMN` against `pragma_table_info`.
 Retention:
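
The migration note in the hunk above (`ALTER TABLE ADD COLUMN` against `pragma_table_info`) can be sketched as a pure decision function. This is an assumption-laden illustration, not the project's migration code: `add_column_sql` is a hypothetical helper, the `INTEGER` declaration is a guess for `deadline_at`, and the existing column names are assumed to have been read beforehand with `SELECT name FROM pragma_table_info('operator_questions')`.

```rust
/// Hypothetical helper: given the column names already present in `table`,
/// return the `ALTER TABLE` statement needed to add `column`, or `None`
/// when the column exists and the migration is a no-op.
fn add_column_sql(table: &str, existing: &[&str], column: &str, decl: &str) -> Option<String> {
    // SQLite column names are case-insensitive, so compare accordingly.
    if existing.iter().any(|c| c.eq_ignore_ascii_case(column)) {
        None
    } else {
        Some(format!("ALTER TABLE {table} ADD COLUMN {column} {decl}"))
    }
}
```

Checking the column list first makes the migration safe to run on every startup, because a second run finds the column and emits nothing.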


@@ -30,6 +30,16 @@ and, if so, skips the refresh (defers 2s). The operator never has
 the form yanked out from under them mid-type; the update lands as
 soon as they blur.
+**`<details>` open-state preservation:** any collapsible element
+tagged with `data-restore-key="<stable-key>"` survives the
+refresh. `snapshotOpenDetails()` walks managed sections before
+render, `restoreOpenDetails()` re-applies after. Used today for
+the journald viewer (`journal:<container>`), the agent-config
+viewer (`agent-config:<name>`), and approval diff blocks
+(`approval-diff:<id>`). Setting `.open = true` programmatically
+also fires the `toggle` event, so any lazy-fetch wired to it
+re-runs cleanly on restore.
 Both bind their listeners with `SO_REUSEADDR` via
 `tokio::net::TcpSocket` plus a retry loop on `AddrInUse` (12 tries,
 exponential backoff capped at 2s) so an nspawn restart that races
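
The bind-retry behaviour described in the context lines above (12 tries, exponential backoff capped at 2s, retrying only on `AddrInUse`) can be sketched with the standard library alone. The 50 ms base delay, the function names, and the injected `bind` and `sleep` closures are assumptions made for illustration; the real code uses `tokio::net::TcpSocket`, which is not reproduced here.

```rust
use std::io;
use std::time::Duration;

/// Delay schedule: `base * 2^attempt`, clamped to `cap`, one entry per try.
fn backoff_delays(tries: u32, base: Duration, cap: Duration) -> Vec<Duration> {
    (0..tries)
        .map(|attempt| base.saturating_mul(2u32.saturating_pow(attempt)).min(cap))
        .collect()
}

/// Calls `bind` once per delay entry. Only `AddrInUse` is retried;
/// any other outcome (success or a different error) returns immediately.
fn bind_with_retry<T>(
    delays: &[Duration],
    mut bind: impl FnMut() -> io::Result<T>,
    mut sleep: impl FnMut(Duration),
) -> io::Result<T> {
    let mut last = io::Error::from(io::ErrorKind::AddrInUse);
    for delay in delays {
        match bind() {
            Err(e) if e.kind() == io::ErrorKind::AddrInUse => {
                last = e;
                sleep(*delay);
            }
            other => return other,
        }
    }
    Err(last)
}
```

Passing the sleep function in as a closure keeps the sketch testable without real waiting; an async version would await a timer at the same point.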
@@ -42,6 +52,11 @@ the previous process's socket release resolves itself.
    inline "unsupported / blocked" message when applicable. Sits
    under the banner.
 2. **C0NTAINERS** — live containers with their action surface.
+   Pulsing red banner at the top of this section if any two
+   sub-agents hash to the same port (`port_conflicts` from
+   `/api/state`): the operator must rename one of them and
+   rebuild. `lifecycle::{spawn,rebuild}` also preflight this and
+   refuse with a clear error message naming the conflicting agent.
 3. **K3PT ST4T3** — destroyed-but-state-kept tombstones (size +
    age + claude-creds badge). Two actions: `⊕ R3V1V3` (queues a
    Spawn approval; existing state is reused), `PURG3` (wipes
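
The port-conflict check added in the hunk above can be pictured as grouping agent names by their hashed port and keeping only the groups with more than one member. The sketch below is hypothetical: the diff does not show which hash `agent_web_port` actually uses, so FNV-1a is an assumption, as are both function bodies. Only the `:8100-8999` range comes from the docs.

```rust
use std::collections::BTreeMap;

const PORT_MIN: u16 = 8100;
const PORT_MAX: u16 = 8999;

/// Assumed hash: 32-bit FNV-1a of the agent name, folded into 8100..=8999.
fn agent_web_port(name: &str) -> u16 {
    let mut h: u32 = 0x811c_9dc5;
    for b in name.bytes() {
        h ^= u32::from(b);
        h = h.wrapping_mul(0x0100_0193);
    }
    let span = u32::from(PORT_MAX - PORT_MIN) + 1; // 900 ports
    PORT_MIN + (h % span) as u16
}

/// Groups agents by port and keeps only ports claimed by two or more agents.
/// `port_of` is injected so the grouping can be checked without relying on the hash.
fn port_conflicts<'a>(
    names: &[&'a str],
    port_of: impl Fn(&str) -> u16,
) -> BTreeMap<u16, Vec<&'a str>> {
    let mut by_port: BTreeMap<u16, Vec<&'a str>> = BTreeMap::new();
    for &name in names {
        by_port.entry(port_of(name)).or_default().push(name);
    }
    by_port.retain(|_, agents| agents.len() > 1);
    by_port
}
```

A spawn or rebuild preflight could run the same grouping over the existing agents plus the candidate name and refuse when the candidate's port appears in the result.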
@@ -73,12 +88,16 @@ Two-line layout (`assets/app.js::renderContainers`):
   on sub-agents, `↺ R3ST4RT` + (sub-agents) `■ ST0P` when running,
   `▶ ST4RT` when stopped. Buttons dim + disable while a transient
   lifecycle action is in flight.
-- Plus a collapsible `↳ logs · <container>` `<details>` block.
-  Expanding lazy-fetches journald output via `GET
-  /api/journal/{name}?unit=...&lines=...` (`journalctl -M
-  <container> -b --no-pager --output=short-iso`). A unit dropdown
-  switches between the harness service (default) and the full
-  machine journal; refresh button re-fetches.
+- Plus two collapsible `<details>` blocks:
+  - `↳ logs · <container>` — lazy-fetches journald output via
+    `GET /api/journal/{name}?unit=...&lines=...` (`journalctl -M
+    <container> -b --no-pager --output=short-iso`). A unit
+    dropdown switches between the harness service (default) and
+    the full machine journal; refresh button re-fetches.
+  - `↳ agent.nix · <name>` — lazy-fetches the applied config
+    file via `GET /api/agent-config/{name}` (read-only mirror of
+    `/var/lib/hyperhive/applied/<name>/agent.nix`). Mutating
+    this still requires `request_apply_commit` + approval.
 `↻ UPD4TE 4LL` button appears above the containers list when any
 agent is stale. Banner pulses on each broker SSE event
@@ -94,15 +113,27 @@ Pure frontend (`Notification` API). Three signals trigger them:
 First `/api/state` after page load seeds "seen" sets without
 firing — only items that arrive while the page is open count.
-`tag: "hyperhive"` collapses bursts; click focuses the dashboard
-tab. localStorage-backed mute toggle silences without revoking
-the OS permission. Requires a secure context (HTTPS or
-localhost); on other origins the controls hide themselves.
+Per-event tags (`hyperhive:approval:<id>`, `hyperhive:question:<id>`,
+`hyperhive:msg:<at>:<rand>`) so distinct events stack in the OS
+notification center instead of overwriting each other.
+`console.debug` logs at every block point (unsupported,
+permission ungranted, muted) for in-browser debugging. Click
+focuses the dashboard tab. localStorage-backed mute toggle
+silences without revoking the OS permission. Requires a secure
+context (HTTPS or localhost); on other origins the controls hide
+themselves. Browsers typically suppress notifications while the
+originating tab is focused — that's a browser-level decision,
+not ours.
 ### Dashboard endpoints
-- `POST /{approve,deny}/{id}` — approve/deny a pending approval.
+- `POST /approve/{id}` — approve a pending approval.
+- `POST /deny/{id}` (`note=<reason>`, optional) — deny a pending
+  approval with an optional operator-supplied reason. The reason
+  travels to the manager as `HelperEvent::ApprovalResolved.note`.
+  Dashboard prompts via `window.prompt()` on click.
 - `POST /{rebuild,kill,restart,start,destroy}/{name}` — lifecycle.
+  `destroy` accepts `purge=on` to also wipe state dirs.
 - `POST /purge-tombstone/{name}` — wipe a tombstone's state dirs.
 - `POST /answer-question/{id}` — answer a pending operator question.
 - `POST /cancel-question/{id}` — cancel a pending question with
@@ -111,6 +142,13 @@ localhost); on other origins the controls hide themselves.
 - `POST /update-all` — rebuild every stale container.
 - `GET /api/journal/{name}?unit=&lines=` — journalctl viewer for
   a managed container.
+- `GET /api/agent-config/{name}` — read-only view of the applied
+  `agent.nix`.
+Generalised form helpers: `form[data-confirm="…"]` pops
+`confirm()` before submit; `form[data-prompt="…"]` pops
+`prompt()` and stashes the answer in a hidden input named by
+`data-prompt-field` (default `note`).
 ## Per-agent page
@@ -134,7 +172,9 @@ Layout, top to bottom:
   POSTs `/api/cancel`.
 - Inbox `<details>` block (collapsed): `inbox · N` — last 30
   messages addressed to this agent, fetched via
-  `AgentRequest::Recent { limit: 30 }`.
+  `AgentRequest::Recent { limit: 30 }`. (Separate from
+  `AgentRequest::Recv { wait_seconds }` which the harness uses
+  internally to long-poll the broker.)
 - Terminal-wrap: live event tail (sticky-bottom auto-scroll +
   `↓ N new` pill when not at bottom) followed by an
   operator-input textarea acting as a prompt.