hyperhive/TODO.md
müde 8f5752980f turn_stats: per-turn analytics sink
new sqlite table at /state/hyperhive-turn-stats.sqlite on each
agent's state dir. one row per claude turn captures identity
(model, wake_from, result_kind), timing (started/ended_at,
duration_ms), cost (input/output/cache_read/cache_creation token
counts), behaviour (tool_call_count + per-tool breakdown JSON),
and post-turn snapshot metrics (open_threads_count,
open_reminders_count).

wire additions:
- AgentRequest/ManagerRequest::CountPendingReminders +
  Broker::count_pending_reminders_for(agent)
- Bus::observe_stream + take_tool_calls — pumps the existing
  stdout stream-json, picks out tool_use blocks, accumulates per
  turn. bin loops fold the breakdown into each row.
- TurnStats::open_default + TurnStatRow + record() — best-effort
  inserts; failures log + don't block the harness.

both ag3nt and m1nd bins capture started_at + duration via
Instant::elapsed, fetch open-thread + reminder counts from
hive-c0re via the existing socket (post-turn, best-effort), and
record one row at turn_end. record_kind splits ok / failed /
prompt_too_long; failures carry the error message in note.

todo entries for host-side vacuum sweep + reading the table back
into agent/dashboard badges.
2026-05-17 23:00:41 +02:00

84 lines
9.2 KiB
Markdown

# Hyperhive TODOs
## Architecture / Features
- Shared space for all agents to access documents/files without manager routing
- Private git forge agents can push to and create new repos in
- Move bind mounts in agents to `/agents/<name>/state` so path for agent = path for manager
- **Broadcast messaging**: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
- **Multi-agent restart coordination**: when rebuilding all agents, manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc)
- **Shared docs/skills repo (RO)**: a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or `/shared`. Implementation likely: seed an `org-shared/docs` repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents `git clone` it (or use the API) to read; only the manager + operator can push.
- ~~**Loose-ends tracker + `get_open_threads` tool**~~ ✓ landed — new `mcp__hyperhive__get_open_threads` MCP tool on both agent + manager surfaces. Wire types in `hive-sh4re`: `AgentRequest::GetOpenThreads` / `ManagerRequest::GetOpenThreads``OpenThreads { threads: Vec<OpenThread> }`. `OpenThread` is a tagged enum with `Approval { id, agent, commit_ref, description, age_seconds }` and `Question { id, asker, target, question, age_seconds }`. Shared aggregator at `hive-c0re/src/open_threads.rs`: `for_agent(coord, name)` (sub-agent surface; filters questions by asker == self OR target == self, approvals only for manager) and `hive_wide(coord)` (manager surface; everything pending in the swarm). No caching — fresh sqlite sweep per call. **Per-agent web UI rendering** is a follow-up below.
- **Follow-up: surface open-threads on the per-agent web UI** so the operator can see at a glance what each agent has hanging open — same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
## Reminder Tool
- Per-agent reminder limits (burst capacity, rate limiting)
- **Scheduler shutdown**: add graceful shutdown signal when coordinator is destroyed (currently runs forever)
- **DB lock contention**: under high reminder volume, the broker's `Mutex<Connection>` serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.
## Dashboard
- **Reminder delivery-error surface**: `reminder_scheduler::tick` logs failed deliveries but doesn't persist. Add `last_error TEXT, attempt_count INTEGER` columns + a banner on the dashboard row + a "retry" affordance. Needs a sqlite migration (idempotent ALTER TABLE).
- **Per-agent reminder status / query interface**: surface pending vs. delivered counts per agent (manager + each sub-agent) as a small chip on the container row.
- **Tombstones + meta_inputs events**: not yet event-derived. PURG3 + meta-update still trigger a post-submit `/api/state` refetch on the dashboard. Add `TombstoneAdded`/`TombstoneRemoved` + `MetaInputsChanged` so those forms can drop their refetch too and the cold-load is the only `/api/state` fetch in normal operation.
## Security
- **Privsep the dashboard from the privileged daemon**: hive-c0re runs as root (it has to — `nixos-container` create / start / destroy, the meta git repo, every per-agent bind mount). The HTTP server lives in the same process, so every read-endpoint (`/api/state-file`, `/api/journal/{name}`, `/api/agent-config/{name}`) is one allow-list bug away from serving arbitrary host files. Split the architecture: keep the privileged daemon doing lifecycle + git + ipc, run the web UI as an unprivileged user that talks to the daemon over a unix socket with a narrow request surface (`ReadAgentStateFile { agent, rel_path }` etc.). The unprivileged process can't read `/etc/shadow` even if every check in `get_state_file` is bypassed — it doesn't have the bits. Container-lifecycle POSTs (`/restart`, `/destroy`, etc.) become forwarded RPCs the privileged side authorises on its terms.
- **Defense in depth on `get_state_file`**: until privsep lands, the allow-list is load-bearing. Worth adding: refuse files whose mode is not world-readable (so an agent writing a 0600 file inside `state/` can't have its contents proxied through the endpoint to a different operator), and refuse symlinks at any path component (`O_NOFOLLOW`-style — `canonicalize` resolves them, but we currently don't reject if the original path had symlinks).
## Harness Ergonomics (agent-side wishlist)
Filed by damocles, who actually lives in this thing. Loosely ranked by
how often the friction bites in normal use.
- ~~**Auto-attach oversize message bodies**~~ — superseded by simply
raising the inline cap from 1 KiB → 4 KiB (covers ~95% of
conversational overflow). Anything genuinely larger still needs a
state file. Blob-in-broker-sqlite was prototyped on paper
(`/agents/damocles/state/oversize-msg-proposal.md`) but rejected as
future vacuum/sync pain not worth carrying for the long-tail 5% of
cases that legitimately belong in a file.
- **Inbox batching hint in the wake prompt** — when the harness pops a
message and there are N more waiting, the wake prompt should say so
(e.g. `"(+3 more queued; consider draining before acting)"`) so claude
knows to call `recv()` again in the same turn instead of doing the
expensive Read/Edit dance once per message over N turns. The data's
already in the broker (`Broker::pending_count(agent)`); just thread it
into the prompt builder in `hive-ag3nt::turn.rs`. Even better: add a
one-shot `recv_batch(max: u32)` MCP tool that returns up to `max`
pending messages in a single round-trip.
- **Self-management of own asks + reminders** — once I fire `ask` or
`remind` I have no way to inspect or cancel them from the agent side.
Operator can cancel asks via dashboard; nothing for reminders at all
(TODO above). Want `list_my_asks() -> [{id, target, question, asked_at}]`
and `cancel_ask(id)` on the agent surface, plus `list_my_reminders()`
/ `cancel_reminder(id)`. Bounded by `asker == self` and `reminder.owner
== self` so no cross-agent meddling.
- **`whoami` introspection tool** — agents currently rely on the system
prompt remembering their name + role. After a rename or model swap
there's no trustworthy source-of-truth from inside the harness.
Cheap: a `whoami() -> { name, role: "agent" | "manager", model, port,
hyperhive_rev, started_at }` tool reading from the harness's own env
+ `TurnState`. Useful for self-documenting state files ("this dropped
by damocles@gpt-5-codex on rev abc1234") and for the future
`get_open_threads` to know whose threads to query without
trusting prompt-substituted strings.
- **Optional `in_reply_to: <msg_id>` on send** — pure wire addition; no
behavioural change. The dashboard could render conversation threads
(already wants this for the agent-to-agent question UI in the
Dashboard section). Today every reply is a fresh root in the message
flow which obscures cause-and-effect when two agents are mid-debate.
Field is optional, ignored if the referenced id is unknown / cross-
agent / out of retention.
## Telemetry
- **Per-turn stats: host-side vacuum sweep**: the sink writes to `/state/hyperhive-turn-stats.sqlite` on each agent's state dir; needs a periodic retention sweep mirroring `events_vacuum.rs` so the table doesn't grow forever. Default keep-window: 90 days (turn-stats are denser than events but smaller per-row, ~200B each).
- **Surface per-turn stats on the agent web UI**: badges sourced from the new sink — `open_threads` count chip, `open_reminders` count chip, "N turns today" chip, rolling tool-call histogram tooltip on the model chip. Both `open_threads` and `open_reminders` are already columns on every row; the badge just reads the latest. The richer histograms read across rows.
- **Stats UI on the main dashboard**: per-agent rollups (avg turn duration, tokens-since-boot, top 5 tools) on the container row. Same data source, host-side aggregation query.
## Bugs
- **Post-rebuild system-message missed wake**: at 09:13:14 the dashboard showed `system → damocles container rebuilt` as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent `recv()` from inside the agent returned `(empty)`, confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server `serve_agent_stdio` task is up and answering MCP/socket calls, but the `hive-ag3nt::serve` long-poll loop that drives `drive_turn` either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive `nixos-container update` cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the `recv_blocking` fix in `e423d57` if the rebuild restarts the broker mid-subscribe.