hyperhive/TODO.md
müde b444dac6e8 agent ui: consolidate status into state-row badges
drop the "● harness alive — turn loop running" paragraph; the
new #alive-badge chip in the state row carries the same signal
across all statuses (loading / online / needs-login / offline)
with colour coding. token-usage chip renamed + restyled as
#ctx-badge — primary number is total context-window tokens
used, mirroring claude code's "N tokens" indicator.

every state-row badge now has hover detail: state-badge gets
per-state tooltips + age suffix, model-chip explains the
/model command, last-turn shows the raw ms duration, ctx-badge
breaks out input / cache_read / cache_write / output.

new todo entry for the per-turn stats sink (start/end/model/
tokens/tool-call-count) the harness should be writing.
2026-05-17 22:36:02 +02:00

11 KiB
Raw Blame History

Hyperhive TODOs

Architecture / Features

  • Shared space for all agents to access documents/files without manager routing
  • Private git forge agents can push to and create new repos in
  • Move bind mounts in agents to /agents/<name>/state so path for agent = path for manager
  • Broadcast messaging: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
  • Multi-agent restart coordination: when rebuilding all agents, manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc)
  • Shared docs/skills repo (RO): a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or /shared. Implementation likely: seed an org-shared/docs repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents git clone it (or use the API) to read; only the manager + operator can push.
  • Loose-ends tracker + get_open_threads tool: hive-c0re already knows about pending approvals + unanswered questions; soon will also know about open PRs on hive-forge. Aggregate these into a per-agent "open threads" view (e.g. [{kind: "approval", id: 7, summary: "spawn alice"}, {kind: "question", id: 12, asker: "alice", summary: "deploy now?"}]). New MCP tool mcp__hyperhive__get_open_threads returns the list so an agent can see what's still pending against it without rebuilding context from inbox history. Manager's version includes hive-wide threads. Also surface this list on the per-agent web UI so the operator can see at a glance what each agent has hanging open — same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
    • Scope per agent X (confirmed with operator): include BOTH (a) unanswered questions where asker == X (X is waiting on someone) AND (b) unanswered questions where target == X (X owes an answer). Distinguish via a role: "asker" | "target" field on the question variant so the agent can render "waiting on" vs "owe a reply" appropriately. Approvals: include rows where the submitter is X (waiting on the operator). Forge PRs (future): include open PRs where X is author OR reviewer.
    • Wire shape sketch: new AgentRequest::GetOpenThreads / ManagerRequest::GetOpenThreads returning Response::OpenThreads { threads: Vec<OpenThread> } with OpenThread as a tagged enum ({kind: "approval", id, summary, age_seconds} / {kind: "question", id, role, counterparty, summary, age_seconds} / future {kind: "pr", ...}). Manager flavour returns hive-wide threads (no asker/target filter). MCP tool get_open_threads takes no args.
    • Aggregator location: new helper on Coordinator (or a dedicated open_threads.rs) so both surfaces share the query logic; queries approvals + operator_questions tables with a single per-call sweep (no caching — call frequency is low).

Reminder Tool

  • Per-agent reminder limits (burst capacity, rate limiting)
  • Scheduler shutdown: add graceful shutdown signal when coordinator is destroyed (currently runs forever)
  • DB lock contention: under high reminder volume, the broker's Mutex<Connection> serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.

Dashboard

  • Reminder delivery-error surface: reminder_scheduler::tick logs failed deliveries but doesn't persist. Add last_error TEXT, attempt_count INTEGER columns + a banner on the dashboard row + a "retry" affordance. Needs a sqlite migration (idempotent ALTER TABLE).
  • Per-agent reminder status / query interface: surface pending vs. delivered counts per agent (manager + each sub-agent) as a small chip on the container row.
  • Phase 6 follow-ups — dashboard side is fully event-driven (Phase 6 leftovers landed); the per-agent web UI's lifecycle endpoints (/api/{cancel,compact,model,new-session}, /login/*) still 303-redirect-and-poll. Convert them to 200 + data-no-refresh so the per-agent page stops refetching /api/state on every operator click — LiveEvent::Note already covers cancel/compact/model/new-session, login state needs its own NeedsLogin / LoggedIn events on the per-agent bus.
  • Tombstones + meta_inputs events: not yet event-derived. PURG3 + meta-update still trigger a post-submit /api/state refetch on the dashboard. Add TombstoneAdded/TombstoneRemoved + MetaInputsChanged so those forms can drop their refetch too and the cold-load is the only /api/state fetch in normal operation.

Security

  • Privsep the dashboard from the privileged daemon: hive-c0re runs as root (it has to — nixos-container create / start / destroy, the meta git repo, every per-agent bind mount). The HTTP server lives in the same process, so every read-endpoint (/api/state-file, /api/journal/{name}, /api/agent-config/{name}) is one allow-list bug away from serving arbitrary host files. Split the architecture: keep the privileged daemon doing lifecycle + git + ipc, run the web UI as an unprivileged user that talks to the daemon over a unix socket with a narrow request surface (ReadAgentStateFile { agent, rel_path } etc.). The unprivileged process can't read /etc/shadow even if every check in get_state_file is bypassed — it doesn't have the bits. Container-lifecycle POSTs (/restart, /destroy, etc.) become forwarded RPCs the privileged side authorises on its terms.
  • Defense in depth on get_state_file: until privsep lands, the allow-list is load-bearing. Worth adding: refuse files whose mode is not world-readable (so an agent writing a 0600 file inside state/ can't have its contents proxied through the endpoint to a different operator), and refuse symlinks at any path component (O_NOFOLLOW-style — canonicalize resolves them, but we currently don't reject if the original path had symlinks).

Harness Ergonomics (agent-side wishlist)

Filed by damocles, who actually lives in this thing. Loosely ranked by how often the friction bites in normal use.

  • Auto-attach oversize message bodies — superseded by simply raising the inline cap from 1 KiB → 4 KiB (covers ~95% of conversational overflow). Anything genuinely larger still needs a state file. Blob-in-broker-sqlite was prototyped on paper (/agents/damocles/state/oversize-msg-proposal.md) but rejected as future vacuum/sync pain not worth carrying for the long-tail 5% of cases that legitimately belong in a file.
  • Inbox batching hint in the wake prompt — when the harness pops a message and there are N more waiting, the wake prompt should say so (e.g. "(+3 more queued; consider draining before acting)") so claude knows to call recv() again in the same turn instead of doing the expensive Read/Edit dance once per message over N turns. The data's already in the broker (Broker::pending_count(agent)); just thread it into the prompt builder in hive-ag3nt::turn.rs. Even better: add a one-shot recv_batch(max: u32) MCP tool that returns up to max pending messages in a single round-trip.
  • Self-management of own asks + reminders — once I fire ask or remind I have no way to inspect or cancel them from the agent side. Operator can cancel asks via dashboard; nothing for reminders at all (TODO above). Want list_my_asks() -> [{id, target, question, asked_at}] and cancel_ask(id) on the agent surface, plus list_my_reminders() / cancel_reminder(id). Bounded by asker == self and reminder.owner == self so no cross-agent meddling.
  • whoami introspection tool — agents currently rely on the system prompt remembering their name + role. After a rename or model swap there's no trustworthy source-of-truth from inside the harness. Cheap: a whoami() -> { name, role: "agent" | "manager", model, port, hyperhive_rev, started_at } tool reading from the harness's own env
    • TurnState. Useful for self-documenting state files ("this dropped by damocles@gpt-5-codex on rev abc1234") and for the future get_open_threads to know whose threads to query without trusting prompt-substituted strings.
  • Optional in_reply_to: <msg_id> on send — pure wire addition; no behavioural change. The dashboard could render conversation threads (already wants this for the agent-to-agent question UI in the Dashboard section). Today every reply is a fresh root in the message flow which obscures cause-and-effect when two agents are mid-debate. Field is optional, ignored if the referenced id is unknown / cross- agent / out of retention.

Telemetry

  • Per-turn stats log: persist one row per claude turn in a new sqlite table on the per-agent state dir (or the host broker DB, indexed by agent). Columns: started_at, ended_at, duration_ms, model, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, tool_call_count, tool_call_breakdown (JSON: {Read: 12, Bash: 3, ...}), bytes_streamed, wake_reason (recv'd message / reminder / operator-kick / manual), result_kind (ok / cancelled / failed-mid-turn / compacted), note (e.g. failure reason). Powers: per-agent dashboards (avg turn time over time, tool-usage histogram, cost projections from token counts × model rate), debugging stuck loops (look for repeated identical wake_reason + zero tool calls), and operator-visible "this is what your spend looked like this week" rollups. Source data is already mostly in the harness's TurnState + the per-event bus; just needs a sink. Keep a retention sweep (host-side) so the table doesn't grow forever.

Bugs

  • Post-rebuild system-message missed wake: at 09:13:14 the dashboard showed system → damocles container rebuilt as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent recv() from inside the agent returned (empty), confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server serve_agent_stdio task is up and answering MCP/socket calls, but the hive-ag3nt::serve long-poll loop that drives drive_turn either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive nixos-container update cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the recv_blocking fix in e423d57 if the rebuild restarts the broker mid-subscribe.