hyperhive/TODO.md
müde e7ce35c503 phase 6: container events + drop the 5s /api/state poll
new DashboardEvent::ContainerStateChanged + ContainerRemoved
close the last refetch loop on the dashboard. Coordinator's
rescan_containers_and_emit diffs a fresh container_view::build_all
against a cached last_containers map and fires per-row events.
called from actions::approve (post-spawn), actions::destroy,
the lifecycle_action wrapper, auto_update::rebuild_agent, and
the existing 10s crash_watch poll.

ContainerView extracted to its own module so coordinator and
dashboard can both build it. dashboard endpoints flip to 200;
container-lifecycle forms carry data-no-refresh. client drops
the periodic poll entirely — initial cold load + SSE for
everything afterwards. pending overlay reads from the existing
transientsState since the new event payload doesn't carry it.

PURG3 + meta-update keep the post-submit refetch since
tombstones + meta_inputs aren't event-derived yet; tracked in
TODO.md.
2026-05-17 22:01:15 +02:00


Hyperhive TODOs

Architecture / Features

  • Shared space for all agents to access documents/files without manager routing
  • Private git forge that agents can push to and create new repos in
  • Move bind mounts in agents to /agents/<name>/state so path for agent = path for manager
  • Broadcast messaging: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
  • Multi-agent restart coordination: when rebuilding all agents, the manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc.)
  • Shared docs/skills repo (RO): a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or /shared. Implementation likely: seed an org-shared/docs repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents git clone it (or use the API) to read; only the manager + operator can push.
  • Loose-ends tracker + get_open_threads tool: hive-c0re already knows about pending approvals + unanswered questions; soon will also know about open PRs on hive-forge. Aggregate these into a per-agent "open threads" view (e.g. [{kind: "approval", id: 7, summary: "spawn alice"}, {kind: "question", id: 12, asker: "alice", summary: "deploy now?"}]). New MCP tool mcp__hyperhive__get_open_threads returns the list so an agent can see what's still pending against it without rebuilding context from inbox history. Manager's version includes hive-wide threads. Also surface this list on the per-agent web UI so the operator can see at a glance what each agent has hanging open — same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
    • Scope per agent X (confirmed with operator): include BOTH (a) unanswered questions where asker == X (X is waiting on someone) AND (b) unanswered questions where target == X (X owes an answer). Distinguish via a role: "asker" | "target" field on the question variant so the agent can render "waiting on" vs "owe a reply" appropriately. Approvals: include rows where the submitter is X (waiting on the operator). Forge PRs (future): include open PRs where X is author OR reviewer.
    • Wire shape sketch: new AgentRequest::GetOpenThreads / ManagerRequest::GetOpenThreads returning Response::OpenThreads { threads: Vec<OpenThread> } with OpenThread as a tagged enum ({kind: "approval", id, summary, age_seconds} / {kind: "question", id, role, counterparty, summary, age_seconds} / future {kind: "pr", ...}). Manager flavour returns hive-wide threads (no asker/target filter). MCP tool get_open_threads takes no args.
    • Aggregator location: new helper on Coordinator (or a dedicated open_threads.rs) so both surfaces share the query logic; queries approvals + operator_questions tables with a single per-call sweep (no caching — call frequency is low).
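
A minimal sketch of the per-agent aggregation described in the open-threads item above, assuming plain in-memory rows in place of the approvals and operator_questions tables. `ApprovalRow`, `QuestionRow`, `Role`, and `open_threads_for` are hypothetical names chosen for illustration; only the `OpenThread` variants and their fields follow the wire shape sketched in the TODO, and serialization is left out.

```rust
// Hypothetical sketch: ApprovalRow, QuestionRow, Role and open_threads_for are
// illustrative names, not existing hyperhive types.

#[derive(Debug, Clone, PartialEq)]
enum Role {
    Asker,  // the agent is waiting on someone else
    Target, // the agent owes an answer
}

#[derive(Debug, Clone, PartialEq)]
enum OpenThread {
    Approval { id: u64, summary: String, age_seconds: u64 },
    Question { id: u64, role: Role, counterparty: String, summary: String, age_seconds: u64 },
}

// Stand-ins for rows read from the approvals / operator_questions tables.
struct ApprovalRow { id: u64, submitter: String, summary: String, age_seconds: u64 }
struct QuestionRow { id: u64, asker: String, target: String, summary: String, age_seconds: u64 }

/// Per-agent view: approvals submitted by `agent`, plus unanswered questions
/// where `agent` is either the asker or the target.
fn open_threads_for(agent: &str, approvals: &[ApprovalRow], questions: &[QuestionRow]) -> Vec<OpenThread> {
    let mut out = Vec::new();
    for a in approvals.iter().filter(|a| a.submitter == agent) {
        out.push(OpenThread::Approval { id: a.id, summary: a.summary.clone(), age_seconds: a.age_seconds });
    }
    for q in questions {
        // One entry per question: asker role wins if an agent somehow asks itself.
        let (role, counterparty) = if q.asker == agent {
            (Role::Asker, q.target.clone())
        } else if q.target == agent {
            (Role::Target, q.asker.clone())
        } else {
            continue;
        };
        out.push(OpenThread::Question {
            id: q.id,
            role,
            counterparty,
            summary: q.summary.clone(),
            age_seconds: q.age_seconds,
        });
    }
    out
}

fn main() {
    let approvals = vec![ApprovalRow { id: 7, submitter: "alice".into(), summary: "spawn helper".into(), age_seconds: 30 }];
    let questions = vec![QuestionRow { id: 12, asker: "alice".into(), target: "bob".into(), summary: "deploy now?".into(), age_seconds: 5 }];
    for thread in open_threads_for("bob", &approvals, &questions) {
        println!("{:?}", thread);
    }
}
```

The manager flavour would skip the per-agent filter and return every row; keeping the filter in one helper is what lets the MCP tool and the web UI share the same query logic.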

Reminder Tool

  • Per-agent reminder limits (burst capacity, rate limiting)
  • Scheduler shutdown: add a graceful shutdown signal for when the coordinator is destroyed (the scheduler currently runs forever)
  • DB lock contention: under high reminder volume, the broker's Mutex<Connection> serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.
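
One possible shape for the per-agent reminder limits item above is a token bucket per agent: `burst` tokens of capacity, refilled at a steady rate. This is a sketch under assumptions, not the planned design; `ReminderLimiter`, `try_acquire`, and the chosen parameters are hypothetical. The clock is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical sketch: none of these names exist in hyperhive.
struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

struct ReminderLimiter {
    burst: f64,          // maximum tokens an agent can accumulate
    refill_per_sec: f64, // sustained reminders per second
    buckets: HashMap<String, Bucket>,
}

impl ReminderLimiter {
    fn new(burst: u32, refill_per_sec: f64) -> Self {
        Self { burst: f64::from(burst), refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if `agent` may schedule one more reminder at time `now`.
    fn try_acquire(&mut self, agent: &str, now: Instant) -> bool {
        let (burst, rate) = (self.burst, self.refill_per_sec);
        // New agents start with a full bucket.
        let bucket = self
            .buckets
            .entry(agent.to_string())
            .or_insert(Bucket { tokens: burst, last_refill: now });
        // Refill for the elapsed time, capped at the burst capacity.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(burst);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = ReminderLimiter::new(2, 1.0);
    let t0 = Instant::now();
    println!("first:  {}", limiter.try_acquire("alice", t0));
    println!("second: {}", limiter.try_acquire("alice", t0));
    println!("third:  {}", limiter.try_acquire("alice", t0));
    println!("later:  {}", limiter.try_acquire("alice", t0 + Duration::from_secs(1)));
}
```

Buckets are independent per agent, so one noisy agent exhausting its burst does not delay reminders for the others.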

Dashboard

  • UI for agent-to-agent questions (follow-up to the ask rename): now that agents can ask(to: <agent>) each other, surface those threads in the per-agent dashboard view. Replace the existing read/unread tabs with THREE filters: unread, from: <agent>, to: <agent>. The to: filter makes agent-targeted questions visible so the operator can see at a glance "alice has 3 questions outstanding from bob" and intervene if a thread is stuck. Same UI is useful for general inbox filtering too. Data lives in the existing operator_questions table (with the new target column) + the broker inbox; no new schema needed. Also expose a "respond" affordance so the operator can override-answer a peer question when an agent is offline / stuck (the answerer-auth check in OperatorQuestions::answer already permits the operator on any target).
  • Clickable file paths in message bodies: agents drop pointer strings like /agents/<name>/state/foo.md constantly (it's the whole 1 KiB-cap escape hatch). Right now they're plain text — operator has to copy-paste into a terminal to peek. Detect path-shaped tokens (start with /agents/, /shared/, /state/, or absolute /var/lib/hyperhive/...) in rendered message bodies + question text + answer text + helper-event payloads, render as clickable links that hit a new /api/state-file?path=… dashboard endpoint. Endpoint serves the file as text (with a strict allow-list — only paths under /var/lib/hyperhive/agents/*/state/, /var/lib/hyperhive/shared/, never anything else), syntax-highlighting where it makes sense, falling back to download for binaries. Reuses the existing <details> collapse pattern so inline preview doesn't blow up the message-flow stream.
  • UI for pending reminders: show pending/queued reminders in dashboard, allow operator to view/debug/cancel
  • Per-agent reminder status (pending, delivered)
  • Reminder query interface for debugging
  • Display reminder delivery errors (failed sends, mark failures)
  • Phase 6 follow-ups — dashboard side is fully event-driven (Phase 6 leftovers landed); the per-agent web UI's lifecycle endpoints (/api/{cancel,compact,model,new-session}, /login/*) still 303-redirect-and-poll. Convert them to 200 + data-no-refresh so the per-agent page stops refetching /api/state on every operator click — LiveEvent::Note already covers cancel/compact/model/new-session, login state needs its own NeedsLogin / LoggedIn events on the per-agent bus.
  • Tombstones + meta_inputs events: not yet event-derived. PURG3 + meta-update still trigger a post-submit /api/state refetch on the dashboard. Add TombstoneAdded/TombstoneRemoved + MetaInputsChanged so those forms can drop their refetch too and the cold-load is the only /api/state fetch in normal operation.
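
The strict allow-list from the clickable-file-paths item above could be checked lexically before any file is opened. The sketch below is an assumption about how that check might look: `is_allowed_state_path` is a hypothetical helper, and it only covers the two host-side prefixes named in the TODO. It does not resolve symlinks, and it does not map in-container paths such as /agents/<name>/state/ to host paths; a real endpoint would need both.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: returns true only for absolute paths to something
/// under /var/lib/hyperhive/agents/<name>/state/ or /var/lib/hyperhive/shared/.
/// The check is purely lexical and does not touch the filesystem.
fn is_allowed_state_path(path: &str) -> bool {
    let mut comps = Path::new(path).components();
    // Must be absolute.
    if comps.next() != Some(Component::RootDir) {
        return false;
    }
    let mut parts: Vec<&str> = Vec::new();
    for c in comps {
        match c {
            Component::Normal(s) => match s.to_str() {
                Some(s) => parts.push(s),
                None => return false, // non-UTF-8 component
            },
            // Reject `..` (and anything else unusual) outright instead of
            // trying to normalize it.
            _ => return false,
        }
    }
    match parts.as_slice() {
        ["var", "lib", "hyperhive", "shared", rest @ ..] => !rest.is_empty(),
        ["var", "lib", "hyperhive", "agents", _name, "state", rest @ ..] => !rest.is_empty(),
        _ => false,
    }
}

fn main() {
    for p in [
        "/var/lib/hyperhive/agents/alice/state/foo.md",
        "/var/lib/hyperhive/shared/notes.txt",
        "/var/lib/hyperhive/agents/alice/state/../../../secrets",
        "/etc/passwd",
    ] {
        println!("{p} -> {}", is_allowed_state_path(p));
    }
}
```

Rejecting `..` rather than collapsing it keeps the rule easy to audit: a path is either written in its plain form under an allowed prefix or it is refused.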

Bugs

  • Post-rebuild system-message missed wake: at 09:13:14 the dashboard showed system → damocles container rebuilt as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent recv() from inside the agent returned (empty), confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server serve_agent_stdio task is up and answering MCP/socket calls, but the hive-ag3nt::serve long-poll loop that drives drive_turn either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive nixos-container update cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the recv_blocking fix in e423d57 if the rebuild restarts the broker mid-subscribe.