diff --git a/docs/approvals.md b/docs/approvals.md index 8bba144..31e4355 100644 --- a/docs/approvals.md +++ b/docs/approvals.md @@ -47,6 +47,15 @@ step — the operator just sees the name. On approve, hive-c0re creates the container in a background task while the dashboard shows a spinner. +`InitConfig` approvals are the first step in a two-step spawn +flow. On approve, hive-c0re seeds the proposed config repo with +a default `agent.nix` template and sends the manager +`HelperEvent::ConfigReady { agent }`. The manager then reviews, +edits, and commits the template before calling `request_spawn` +to proceed to a Spawn approval. This gives the manager (and +operator) an explicit review gate on the initial configuration +before any container is created. + ## Meta flake The hive-c0re-owned repo at `/var/lib/hyperhive/meta/` @@ -341,6 +350,10 @@ regular claude turn so the manager can react. Variants the operator. - `LoggedIn { agent }` — sub-agent just completed login. Manager often greets the agent on this event. +- `ConfigReady { agent }` — a new agent's proposed config repo was + just seeded (post-`InitConfig` approval). The manager can now + edit `/agents//config/agent.nix`, commit the changes, + and submit `request_spawn` to create the container. - `NeedsUpdate { agent }` — sub-agent's recorded flake rev is stale. Manager calls `update(name)` to rebuild — idempotent, no approval required. diff --git a/docs/persistence.md b/docs/persistence.md index 59d3ece..8acd29f 100644 --- a/docs/persistence.md +++ b/docs/persistence.md @@ -9,7 +9,9 @@ Where state lives, what survives what, and how it's bounded. Three tables, all in one file: - `messages` — every inter-agent / operator-bound message. - `sender / recipient / body / sent_at / delivered_at`. + `sender / recipient / body / sent_at / delivered_at / acked_at / + in_reply_to`. `in_reply_to` links a reply to its parent row id; + the dashboard and per-agent inbox render these as threaded rows. - `approvals` — the queue. `agent / kind (apply_commit | spawn) / commit_ref / requested_at / status / resolved_at / note`. - `operator_questions` — `ask` / `answer` queue (despite the @@ -72,6 +74,18 @@ No host-side vacuum yet — tracked as forge issue [#10](http://localhost:3000/hyperhive/hyperhive/issues/10) (target retention ~90 days, age-only sweep like events_vacuum). +### `/state/hyperhive-rate-limited` (per agent) + +Sentinel file written by `Bus::emit_status("rate_limited")` when the +harness detects a 429 / rate-limit response from the Claude API, and +removed when the retry sleep expires (any subsequent status emit +clears it). The file's presence is checked by hive-c0re's +`container_view::is_rate_limited` on each `build_all` sweep (~10s) to +populate `ContainerView.rate_limited` for the dashboard. Survives a +harness restart (the Bus reads it back at boot and restores the flag), +so the badge remains accurate if hive-c0re restarts while the harness +is mid-sleep. + ### `/state/hyperhive-model` (per agent) Single-line text file holding the claude model name currently diff --git a/docs/turn-loop.md b/docs/turn-loop.md index 5c9c411..8ad00e8 100644 --- a/docs/turn-loop.md +++ b/docs/turn-loop.md @@ -20,7 +20,12 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs: `LiveEvent::Stream(value)`. Pump stderr as `Note`. 6. Wait for claude to exit. Compaction is two-pronged — *reactive* on `Prompt is too long` and *proactive* on a context watermark - (see [Compaction](#compaction) below). + (see [Compaction](#compaction) below). **Rate-limit detection**: + if stdout contains `429` or `rate_limit` markers, the harness + sets the `rate_limited` sentinel (`Bus::emit_status("rate_limited")`), + sleeps `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries. + The dashboard and per-agent page show a `⊘ rate limited` badge + while the harness is parked. 7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid tight loops on transient failures. @@ -78,6 +83,20 @@ best-effort — a failed checkpoint turn or `/compact` is surfaced as a `Note` but never fails the turn that already succeeded. The operator can also force a compaction any time via `/api/compact`. +- **Auto session-reset** — a third path that fires when both + conditions hold: context is ≥ a watermark (`HIVE_AUTO_RESET_WATERMARK_TOKENS`, + default `100_000`) AND the time since the last turn exceeds the + assumed prompt-cache TTL (`HIVE_CACHE_TTL_SECS`, default `300`). + Claude's prompt cache lives ~5 minutes; if the cache is already + cold, resuming with `--continue` pays the full re-upload cost of + the current context with no benefit over starting fresh. So: + `drive_turn` injects one `AUTO_RESET_CHECKPOINT_PROMPT` notes turn + ("flush state to files, cache is cold") then arms + `Bus::take_skip_continue()` for the real turn — the next turn runs + without `--continue`, starting a fresh session. Unlike proactive + compaction the session is dropped entirely, not compacted. Set + `HIVE_AUTO_RESET_WATERMARK_TOKENS=0` to disable. + The child runs with `cwd = /state` (when the bind exists; falls back to the parent's cwd in dev), so any relative path in a tool call (`Read foo.md`, `Bash ls`, `Write notes.md`) lands in the @@ -126,9 +145,11 @@ it as a stdio child via `--mcp-config`. The hyperhive socket name is ### Sub-agent tools -- `send(to, body)` — message a peer (logical agent name), another - agent, or the operator (recipient `operator`, surfaces in the - dashboard inbox). +- `send(to, body, in_reply_to?)` — message a peer (logical agent + name), another agent, or the operator (recipient `operator`, + surfaces in the dashboard inbox). Optional `in_reply_to: i64` + links this message to a prior message id for thread rendering + in the dashboard message flow and the per-agent inbox. - `recv(wait_seconds?, max?)` — drain inbox messages. Without `wait_seconds` (or with `0`) returns immediately, a cheap "anything pending?" peek. Positive value parks the turn up @@ -161,7 +182,9 @@ it as a stdio child via `--mcp-config`. The hyperhive socket name is this agent's own inbox at a future time (sender shows as `reminder`). Large payloads spill to `/agents//state/reminders/` with the inbox message a - short pointer. + short pointer. Each agent's pending-reminder count is capped + (default 50, override via `HIVE_REMIND_MAX_PENDING_PER_AGENT`); + scheduling a new one fails if the cap is already hit. - `whoami()` — `{ name, role, pronouns, hyperhive_rev }` for self-identification without scraping the system prompt. @@ -209,8 +232,17 @@ meta's. ### Manager tools (in addition to send/recv) -- `request_spawn(name)` — queue a Spawn approval for a brand-new - sub-agent (≤9 char name). Operator approves on the dashboard. +- `request_init_config(name, description?)` — first step of a + two-step spawn. Queues an `InitConfig` approval (≤9 char name); + on operator approve, hive-c0re seeds the proposed config repo + with a default `agent.nix` template and sends the manager a + `HelperEvent::ConfigReady { agent }`. The manager then edits + `agent.nix`, commits the changes, and calls `request_spawn`. + Fails if a proposed repo for this name already exists. +- `request_spawn(name)` — second step of a two-step spawn. Queues + a Spawn approval; requires the proposed config repo to exist + (from a prior approved `request_init_config`). Operator approves + on the dashboard to create the container. - `kill(name)` — graceful stop. No approval required. - `start(name)` — start a stopped sub-agent. No approval. - `restart(name)` — stop + start. No approval. diff --git a/docs/web-ui.md b/docs/web-ui.md index b1901fd..7e6b815 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -150,10 +150,14 @@ the previous process's socket release resolves itself. Two-line layout (`assets/app.js::renderContainers`): -- Line 1: agent name (link → new tab), m1nd/ag3nt chip, `needs - login` / `needs update` warning badges, in-flight `◐ +- Line 1: agent name (link → new tab), m1nd/ag3nt chip, status + badges — `⊘ rate limited` (red, while the harness is parked + after a 429), `needs login`, `needs update` — in-flight `◐ pending-state…` pill (replaces buttons during start / stop / - restart / rebuild / destroy), container name + port. + restart / rebuild / destroy), container name + port, and a + `ctx · Nk` chip showing the agent's last-turn context size + (from `ContainerView.ctx_tokens`, read from the turn-stats + sqlite on each `build_all` sweep; absent until the first turn). - Line 2: action buttons — `↻ R3BU1LD` always, `DESTR0Y` + `PURG3` on sub-agents, `↺ R3ST4RT` + (sub-agents) `■ ST0P` when running, `▶ ST4RT` when stopped. Buttons dim + disable while a transient @@ -296,9 +300,12 @@ Wire vocabulary on `/dashboard/stream` (kind tag is in the JSON payload): - `sent` / `delivered` — broker traffic, mirrored from the - intra-process channel by a forwarder task. Used by the - message-flow terminal renderer and the operator-inbox - derived state. + intra-process channel by a forwarder task. Both carry `id: i64` + (the broker row id) and `in_reply_to: Option` for thread + rendering. The dashboard message-flow terminal renders reply + rows with a `↳ reply` tag that scroll-highlights the parent + row on click. Used by the message-flow terminal renderer and + the operator-inbox derived state. - `approval_added` (id, agent, approval_kind, sha_short, diff, description) / `approval_resolved` (id, agent, approval_kind, sha_short, status, resolved_at, note, description) — pending @@ -355,8 +362,10 @@ Layout, top to bottom: badge + last-turn timing + cancel-turn button + new-session button. Every chip carries a `title=...` tooltip with the detailed breakdown. - - Alive badge: `● alive` (green) / `◌ needs login` (amber) / - `◌ logging in` / `○ offline` / `… connecting`. Driven by + - Alive badge: `● alive` (green) / `⊘ rate limited` (red, while + the harness is parked after a 429 — clears automatically when + the sleep expires) / `◌ needs login` (amber) / `◌ logging in` / + `○ offline` / `… connecting`. Driven by `LiveEvent::StatusChanged`; replaces the old "harness alive — turn loop running" paragraph so the state row carries every reachability signal. @@ -401,7 +410,9 @@ Layout, top to bottom: updates from SSE; no periodic refresh timer runs. - Inbox `
` block (collapsed): `inbox · N` — last 30 messages addressed to this agent, fetched via - `AgentRequest::Recent { limit: 30 }`. (Separate from + `AgentRequest::Recent { limit: 30 }`. Reply messages (those + with a non-null `in_reply_to`) are indented and prefixed with + `↳ reply ·` in amber. (Separate from `AgentRequest::Recv { wait_seconds, max }` which the harness uses internally to long-poll the broker.) - Loose-ends `
` block: `loose ends · N` — questions, @@ -506,12 +517,22 @@ shaped). - `POST /api/new-session` — arm a one-shot for the next turn to drop `--continue`. Emits a `LiveEvent::Note`. - `GET /events/history` — replay buffer for the terminal. +- `GET /screen` — VNC viewer page (minimal RFB-over-WebSocket + renderer). Only accessible when `hyperhive.gui.enable = true` + in the agent's `agent.nix`; the harness shows a 🖥 screen link + in the state row when `gui_vnc_port` is present. +- `GET /screen/ws` — raw RFB byte relay: proxies WebSocket + frames to the weston VNC server at `127.0.0.1:`. + Transparent to any RFB variant. VNC port comes from + `/etc/hyperhive/gui.json` (written by the weston startup + script in `weston-vnc.nix`). Bus events (new vocabulary on `/events/stream`): -- `status_changed { status }` — `online` / +- `status_changed { status }` — `online` / `rate_limited` / `needs_login_idle` / `needs_login_in_progress`. Drives the - alive-badge. + alive-badge. `rate_limited` is set when the harness detects a + 429 response and cleared when the retry sleep expires. - `model_changed { model }` — drives the model chip. - `token_usage_changed { ctx: TokenUsage, cost: TokenUsage }` — drives the ctx + cost badges. Emitted from