Hyperhive TODOs
Architecture / Features
- Shared space for all agents to access documents/files without manager routing
- Private git forge agents can push to and create new repos in
- Move bind mounts in agents to `/agents/<name>/state` so path for agent = path for manager
- Broadcast messaging: allow sending messages with recipient `"*"` to all agents; deliver with hint "this was a broadcast and may not need any action from you"
- Multi-agent restart coordination: when rebuilding all agents, manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc)
- Shared docs/skills repo (RO): a single repo on the hive forge that every agent has read-only access to: common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or `/shared`. Implementation likely: seed an `org-shared/docs` repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents `git clone` it (or use the API) to read; only the manager + operator can push.
- Rename `ask_operator` → `ask` with optional `to` param ✓ done: `Ask { question, options, multi, ttl_seconds, to: Option<String> }` on both `AgentRequest` + `ManagerRequest`. `to = None` (or `Some("operator")`) = dashboard path; `to = Some(<agent>)` pushes `HelperEvent::QuestionAsked` into the target's inbox. New `Answer { id, answer }` request on both surfaces: target answers via `mcp__hyperhive__answer`; answer flows back to the asker as `HelperEvent::QuestionAnswered { id, question, answer, answerer }` (renamed from `OperatorAnswered`; carries who answered so the asker can distinguish operator vs peer vs `ttl-watchdog`). Authorisation: only the question's `target` agent or the operator can answer; self-ask is rejected. DB gets a nullable `target` column (NULL = operator path, back-compat). Dashboard's `pending()` / `recent_answered()` filter on `target IS NULL` so peer questions never leak into the operator's queue. Shared dispatch lives in `hive-c0re/src/questions.rs` so both surfaces stay aligned.
- Loose-ends tracker + `get_open_threads` tool: hive-c0re already knows about pending approvals + unanswered questions; soon will also know about open PRs on hive-forge. Aggregate these into a per-agent "open threads" view (e.g. `[{kind: "approval", id: 7, summary: "spawn alice"}, {kind: "question", id: 12, asker: "alice", summary: "deploy now?"}]`). New MCP tool `mcp__hyperhive__get_open_threads` returns the list so an agent can see what's still pending against it without rebuilding context from inbox history. Manager's version includes hive-wide threads. Also surface this list on the per-agent web UI so the operator can see at a glance what each agent has hanging open: same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
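The open-threads aggregation described above could be sketched roughly as follows. Everything here is hypothetical: the `OpenThread`, `PendingApproval`, and `PendingQuestion` types and the `open_threads` function are illustration only, and the sketch assumes an approval is attributed to the agent that requested it and that a question counts against its `target`. Passing `None` as the viewer models the manager's hive-wide view.

```rust
/// One entry in the "open threads" list. The two kinds mirror the example
/// JSON in the TODO; the Rust shape itself is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum OpenThread {
    Approval { id: u64, summary: String },
    Question { id: u64, asker: String, summary: String },
}

/// Hypothetical row types standing in for what hive-c0re already tracks.
struct PendingApproval {
    id: u64,
    requester: String, // assumption: approvals belong to the requesting agent
    summary: String,
}

struct PendingQuestion {
    id: u64,
    asker: String,
    target: Option<String>, // None = operator path, as in the nullable DB column
    summary: String,
}

/// Collect everything still pending against `viewer`.
/// `viewer = None` returns every thread (the manager's hive-wide view).
fn open_threads(
    viewer: Option<&str>,
    approvals: &[PendingApproval],
    questions: &[PendingQuestion],
) -> Vec<OpenThread> {
    let mut out = Vec::new();
    for a in approvals {
        if viewer.map_or(true, |name| a.requester == name) {
            out.push(OpenThread::Approval { id: a.id, summary: a.summary.clone() });
        }
    }
    for q in questions {
        // An agent only sees questions addressed to it; operator-path
        // questions (target = None) show up in the hive-wide view only.
        let relevant = match viewer {
            None => true,
            Some(name) => q.target.as_deref() == Some(name),
        };
        if relevant {
            out.push(OpenThread::Question {
                id: q.id,
                asker: q.asker.clone(),
                summary: q.summary.clone(),
            });
        }
    }
    out
}
```

Because the MCP tool and the per-agent dashboard page would both call the same function, the two views cannot drift apart; open PRs on hive-forge would become a third `OpenThread` variant once that data is available.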
Reminder Tool
- Handle text overflow → suggest `file_path` option for long messages ✓ fixed: Remind dispatch rejects `message.len() > 4096` (when no `file_path` was supplied) with an error pointing at the `file_path` escape hatch.
- Per-agent reminder limits (burst capacity, rate limiting)
- Expose `remind` MCP tool ✓ fixed: `mcp__hyperhive__remind` now on `AgentServer`; takes `message`, exactly one of `delay_seconds` / `at_unix_timestamp`, optional `file_path`. Manager surface still missing (no `ManagerRequest::Remind` variant); separate item below.
- Manager-side `remind` ✓ fixed: `ManagerRequest::Remind` variant added, dispatch reuses `agent_server::store_remind` helper (shared across both surfaces), `mcp__hyperhive__remind` now on `ManagerServer` (auto-file lands at `/state/reminders/auto-<ts>.md`, the manager's legacy state mount).
- File path delivery ✓ fixed: scheduler now writes the reminder body to the requested `file_path` (mapped from container `/agents/<agent>/state/...` to host `/var/lib/hyperhive/agents/<agent>/state/...`) and delivers a short pointer message in its place. Path-traversal + foreign-agent-state writes are rejected; on rejection or write failure the body falls back to inline delivery with a noted warning. New module `hive-c0re/src/reminder_scheduler.rs` (extracted from `main.rs`).
- Orphan reminders ✓ fixed: `Broker::deliver_reminder` wraps the inbox INSERT + reminders UPDATE in one sqlite transaction; partial failure can no longer cause duplicate delivery on the next tick.
- Unbounded batches ✓ fixed: scheduler now calls `get_due_reminders(REMINDER_BATCH_LIMIT)` (cap = 100/tick); overflow stays due and gets picked up next cycle.
- Scheduler shutdown: add graceful shutdown signal when coordinator is destroyed (currently runs forever)
- DB lock contention: under high reminder volume, the broker's `Mutex<Connection>` serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.
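The container-to-host mapping in the "File path delivery" item, together with its path-traversal and foreign-agent checks, might look something like the sketch below. The two path prefixes come from the item itself; the function name `map_to_host`, its signature, and the exact rejection rules are assumptions for illustration, not the code in `reminder_scheduler.rs`. The sketch assumes Unix-style paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Host directory holding per-agent state (prefix taken from the TODO text).
const HOST_ROOT: &str = "/var/lib/hyperhive/agents";

/// Map a container path `/agents/<agent>/state/...` to its host location.
/// Returns `None` for anything that is not a file path strictly inside the
/// calling agent's own state directory. Hypothetical helper.
fn map_to_host(agent: &str, container_path: &str) -> Option<PathBuf> {
    // Refuse agent names that could themselves smuggle in path segments.
    if agent.is_empty() || agent == "." || agent == ".." || agent.contains('/') {
        return None;
    }
    let prefix = Path::new("/agents").join(agent).join("state");
    // Fails for another agent's state dir or any unrelated path.
    let rest = Path::new(container_path).strip_prefix(&prefix).ok()?;
    // The state directory itself is not a writable file target.
    if rest.as_os_str().is_empty() {
        return None;
    }
    // Only plain segments are allowed below the state dir, so `..` is rejected.
    if rest.components().any(|c| !matches!(c, Component::Normal(_))) {
        return None;
    }
    Some(Path::new(HOST_ROOT).join(agent).join("state").join(rest))
}
```

With this shape, a `None` result is the point where the scheduler would fall back to inline delivery with a warning. The check is purely lexical; a symlink inside the state directory could still point elsewhere, so a real implementation may also want to canonicalize the parent directory before writing.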
Dashboard
- UI for agent-to-agent questions (follow-up to the `ask` rename): now that agents can `ask(to: <agent>)` each other, surface those threads in the per-agent dashboard view. Replace the existing read/unread tabs with THREE filters: `unread`, `from: <agent>`, `to: <agent>`. The `to:` filter makes agent-targeted questions visible so the operator can see at a glance "alice has 3 questions outstanding from bob" and intervene if a thread is stuck. Same UI is useful for general inbox filtering too. Data lives in the existing `operator_questions` table (with the new `target` column) + the broker inbox; no new schema needed. Also expose a "respond" affordance so the operator can override-answer a peer question when an agent is offline / stuck (the answerer-auth check in `OperatorQuestions::answer` already permits the operator on any target).
- UI for pending reminders: show pending/queued reminders in dashboard, allow operator to view/debug/cancel
- Per-agent reminder status (pending, delivered)
- Reminder query interface for debugging
- Display reminder delivery errors (failed sends, mark failures)
- Phase 5b: per-domain mutation event types + client derived state. Foundation already in place (`DashboardEvent` channel on Coordinator, broker→dashboard forwarder, `/dashboard/{stream,history}`, snapshot+SSE seq dedupe). Remaining work: add `ApprovalAdded` / `ApprovalResolved`, `QuestionAdded` / `QuestionAnswered`, `TransientChanged` variants to `DashboardEvent`; emit each at the corresponding mutation site (`actions::approve/deny/finish_approval`, `approvals.submit_kind`, `OperatorQuestions::{submit,answer,cancel}`, `Coordinator::{set_transient,clear_transient}`); have the client maintain derived `approvals` / `questions` / `transients` arrays applied from events and drop those fields from `/api/state`. Unblocks dropping the redirect-and-refetch on every remaining action endpoint (`/approve`, `/deny`, `/restart`, `/destroy`, `/kill`, `/rebuild`, `/api/cancel`, `/api/compact`, `/api/model`, `/api/new-session`, `/request-spawn`, `/answer-question`, `/cancel-question`, `/meta-update`, `/purge-tombstone`). Container-list events deferred until `ContainerView` becomes event-derivable (currently sourced from external `nixos-container list`).
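To make the Phase 5b idea concrete, here is a minimal sketch of the "derived state applied from events" reducer with seq dedupe. The variant names are the ones listed in the item; the payload fields, the `DerivedState` struct, and its `apply` method are assumptions. The real client runs in the browser, so this Rust version only models the logic.

```rust
/// Variant names come from the TODO; every payload field is an assumption.
#[derive(Debug, Clone)]
enum DashboardEvent {
    ApprovalAdded { id: u64, summary: String },
    ApprovalResolved { id: u64 },
    QuestionAdded { id: u64, question: String },
    QuestionAnswered { id: u64 },
    TransientChanged { agent: String, state: Option<String> },
}

/// Hypothetical client-side state rebuilt purely from snapshot + events.
#[derive(Debug, Default)]
struct DerivedState {
    last_seq: u64, // seq of the snapshot or of the last applied event
    approvals: Vec<(u64, String)>,
    questions: Vec<(u64, String)>,
    transients: Vec<(String, String)>,
}

impl DerivedState {
    /// Apply one event. Returns `false` when the event is already covered by
    /// the snapshot or by an earlier delivery (seq dedupe).
    fn apply(&mut self, seq: u64, ev: DashboardEvent) -> bool {
        if seq <= self.last_seq {
            return false;
        }
        self.last_seq = seq;
        match ev {
            DashboardEvent::ApprovalAdded { id, summary } => self.approvals.push((id, summary)),
            DashboardEvent::ApprovalResolved { id } => self.approvals.retain(|(i, _)| *i != id),
            DashboardEvent::QuestionAdded { id, question } => self.questions.push((id, question)),
            DashboardEvent::QuestionAnswered { id } => self.questions.retain(|(i, _)| *i != id),
            DashboardEvent::TransientChanged { agent, state } => {
                // Replace any previous transient for this agent; None clears it.
                self.transients.retain(|(a, _)| *a != agent);
                if let Some(s) = state {
                    self.transients.push((agent, s));
                }
            }
        }
        true
    }
}
```

Once every mutation site emits its event, an action endpoint only needs to return success: the resulting state change reaches the client through the stream, which is what makes the redirect-and-refetch unnecessary.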
Bugs
- Pending message wake-up ✓ fixed (`e423d57`): subscribe-before-check race in `broker.recv_blocking` meant a send landing between the initial `recv()` and `subscribe()` was missed; agent then sat on the 180s long-poll until another, unrelated message woke it. Now subscribe first.
- Post-rebuild system-message missed wake: at 09:13:14 the dashboard showed `system → damocles container rebuilt` as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent `recv()` from inside the agent returned `(empty)`, confirming the message was popped + marked delivered server-side, yet drove no turn. Most likely cause: the agent_server `serve_agent_stdio` task is up and answering MCP/socket calls, but the `hive-ag3nt::serve` long-poll loop that drives `drive_turn` either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive `nixos-container update` cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the `recv_blocking` fix in `e423d57` if the rebuild restarts the broker mid-subscribe.
- `LiveEvent::Note(String)` never reaches the browser ✓ fixed: converted to struct variant `Note { text: String }`; wire shape `{"kind":"note","text":"..."}` matches what the JS already reads via `ev.text`. Historical sqlite rows persisted as the literal string `"null"` (from when serialization silently failed) get filtered out by the `rows.flatten().flatten()` pipeline in `EventStore::recent`, so replay tolerates them.
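The subscribe-first ordering from the wake-up fix can be illustrated with a toy, std-only broker. This is not the real `broker.recv_blocking`, which the notes above describe as sqlite-backed and living next to a tokio runtime; the `Broker` struct and its method bodies here are assumptions, and the sketch assumes a single consumer per inbox.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;
use std::time::Duration;

/// Toy in-memory broker used only to show the ordering.
#[derive(Default)]
struct Broker {
    inbox: Mutex<VecDeque<String>>,
    subscribers: Mutex<Vec<Sender<()>>>,
}

impl Broker {
    fn send(&self, msg: &str) {
        self.inbox.lock().unwrap().push_back(msg.to_string());
        // Wake every subscriber; forget those whose receiver was dropped.
        self.subscribers.lock().unwrap().retain(|tx| tx.send(()).is_ok());
    }

    /// Non-blocking pop.
    fn recv(&self) -> Option<String> {
        self.inbox.lock().unwrap().pop_front()
    }

    fn subscribe(&self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Blocking receive with the fixed ordering: subscribe, then check.
    fn recv_blocking(&self, timeout: Duration) -> Option<String> {
        // Subscribe FIRST. A send that lands after this line leaves a
        // wake-up queued in `wake`, so it cannot be missed.
        let wake = self.subscribe();
        if let Some(msg) = self.recv() {
            return Some(msg);
        }
        // Inbox was empty at check time: wait for a wake-up or the timeout.
        wake.recv_timeout(timeout).ok()?;
        self.recv()
    }
}
```

With the old ordering (check, then subscribe), a message sent between the two steps is neither seen by the check nor signalled to the subscription, so the caller waits out the whole timeout, which matches the 180s stall described in the first bug.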