hyperhive/TODO.md

5.7 KiB

Hyperhive TODOs

Architecture / Features

  • Shared space for all agents to access documents/files without manager routing
  • Private git forge agents can push to and create new repos in
  • Move bind mounts in agents to /agents/<name>/state so path for agent = path for manager
  • Broadcast messaging: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
  • Multi-agent restart coordination: when rebuilding all agents, manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc)
  • Shared docs/skills repo (RO): a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or /shared. Implementation likely: seed an org-shared/docs repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents git clone it (or use the API) to read; only the manager + operator can push.
  • Rename ask_operatorask with optional to param: today mcp__hyperhive__ask_operator always targets the operator dashboard. Generalise: rename to ask, add optional to: <agent_name> argument that defaults to "operator". When to is another agent, route the question to that agent's inbox as a structured "question event" (different from a plain send so the recipient can answer back with the same id and the answer threads back to the asker). Unblocks agent-to-agent structured Q&A without burning regular inbox slots.
  • Loose-ends tracker + get_open_threads tool: hive-c0re already knows about pending approvals + unanswered questions; soon will also know about open PRs on hive-forge. Aggregate these into a per-agent "open threads" view (e.g. [{kind: "approval", id: 7, summary: "spawn alice"}, {kind: "question", id: 12, asker: "alice", summary: "deploy now?"}]). New MCP tool mcp__hyperhive__get_open_threads returns the list so an agent can see what's still pending against it without rebuilding context from inbox history. Manager's version includes hive-wide threads.

Reminder Tool

  • Handle text overflow → suggest file_path option for long messages ✓ fixed — Remind dispatch rejects message.len() > 4096 (when no file_path was supplied) with an error pointing at the file_path escape hatch.
  • Per-agent reminder limits (burst capacity, rate limiting)
  • Expose remind MCP tool ✓ fixed — mcp__hyperhive__remind now on AgentServer; takes message, exactly one of delay_seconds / at_unix_timestamp, optional file_path. Manager surface still missing (no ManagerRequest::Remind variant) — separate item below.
  • Manager-side remind: mirror of the agent tool but on ManagerServer. Needs ManagerRequest::Remind variant in hive-sh4re, dispatch in manager_server.rs, MCP tool wiring.
  • File path delivery ✓ fixed — scheduler now writes the reminder body to the requested file_path (mapped from container /agents/<agent>/state/... to host /var/lib/hyperhive/agents/<agent>/state/...) and delivers a short pointer message in its place. Path-traversal + foreign-agent-state writes are rejected; on rejection or write failure the body falls back to inline delivery with a noted warning. New module hive-c0re/src/reminder_scheduler.rs (extracted from main.rs).
  • Orphan reminders ✓ fixed — Broker::deliver_reminder wraps the inbox INSERT + reminders UPDATE in one sqlite transaction; partial failure can no longer cause duplicate delivery on the next tick.
  • Unbounded batches ✓ fixed — scheduler now calls get_due_reminders(REMINDER_BATCH_LIMIT) (cap = 100/tick); overflow stays due and gets picked up next cycle.
  • Scheduler shutdown: add graceful shutdown signal when coordinator is destroyed (currently runs forever)
  • DB lock contention: under high reminder volume, the broker's Mutex<Connection> serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.

Dashboard

  • UI for pending reminders: show pending/queued reminders in dashboard, allow operator to view/debug/cancel
  • Per-agent reminder status (pending, delivered)
  • Reminder query interface for debugging
  • Display reminder delivery errors (failed sends, mark failures)

Bugs

  • Pending message wake-up ✓ fixed (e423d57) — subscribe-before-check race in broker.recv_blocking meant a send landing between the initial recv() and subscribe() was missed; agent then sat on the 180s long-poll until another, unrelated message woke it. Now subscribe first.
  • Post-rebuild system-message missed wake: at 09:13:14 the dashboard showed system → damocles container rebuilt as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent recv() from inside the agent returned (empty), confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server serve_agent_stdio task is up and answering MCP/socket calls, but the hive-ag3nt::serve long-poll loop that drives drive_turn either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive nixos-container update cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the recv_blocking fix in e423d57 if the rebuild restarts the broker mid-subscribe.