hyperhive/TODO.md

82 lines
9.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Hyperhive TODOs
## Architecture / Features
- Shared space for all agents to access documents/files without manager routing
- Private git forge agents can push to and create new repos in
- Move bind mounts in agents to `/agents/<name>/state` so path for agent = path for manager
- **Broadcast messaging**: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
- **Multi-agent restart coordination**: when rebuilding all agents, manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc)
- **Shared docs/skills repo (RO)**: a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or `/shared`. Implementation likely: seed an `org-shared/docs` repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents `git clone` it (or use the API) to read; only the manager + operator can push.
- ~~**Loose-ends tracker + `get_open_threads` tool**~~ ✓ landed — new `mcp__hyperhive__get_open_threads` MCP tool on both agent + manager surfaces. Wire types in `hive-sh4re`: `AgentRequest::GetOpenThreads` / `ManagerRequest::GetOpenThreads``OpenThreads { threads: Vec<OpenThread> }`. `OpenThread` is a tagged enum with `Approval { id, agent, commit_ref, description, age_seconds }` and `Question { id, asker, target, question, age_seconds }`. Shared aggregator at `hive-c0re/src/open_threads.rs`: `for_agent(coord, name)` (sub-agent surface; filters questions by asker == self OR target == self, approvals only for manager) and `hive_wide(coord)` (manager surface; everything pending in the swarm). No caching — fresh sqlite sweep per call. **Per-agent web UI rendering** is a follow-up below.
- **Follow-up: surface open-threads on the per-agent web UI** so the operator can see at a glance what each agent has hanging open — same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
## Reminder Tool
- Per-agent reminder limits (burst capacity, rate limiting)
- **Scheduler shutdown**: add graceful shutdown signal when coordinator is destroyed (currently runs forever)
- **DB lock contention**: under high reminder volume, the broker's `Mutex<Connection>` serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.
## Dashboard
- **Reminder delivery-error surface**: `reminder_scheduler::tick` logs failed deliveries but doesn't persist. Add `last_error TEXT, attempt_count INTEGER` columns + a banner on the dashboard row + a "retry" affordance. Needs a sqlite migration (idempotent ALTER TABLE).
- **Per-agent reminder status / query interface**: surface pending vs. delivered counts per agent (manager + each sub-agent) as a small chip on the container row.
- **Tombstones + meta_inputs events**: not yet event-derived. PURG3 + meta-update still trigger a post-submit `/api/state` refetch on the dashboard. Add `TombstoneAdded`/`TombstoneRemoved` + `MetaInputsChanged` so those forms can drop their refetch too and the cold-load is the only `/api/state` fetch in normal operation.
## Security
- **Privsep the dashboard from the privileged daemon**: hive-c0re runs as root (it has to — `nixos-container` create / start / destroy, the meta git repo, every per-agent bind mount). The HTTP server lives in the same process, so every read-endpoint (`/api/state-file`, `/api/journal/{name}`, `/api/agent-config/{name}`) is one allow-list bug away from serving arbitrary host files. Split the architecture: keep the privileged daemon doing lifecycle + git + ipc, run the web UI as an unprivileged user that talks to the daemon over a unix socket with a narrow request surface (`ReadAgentStateFile { agent, rel_path }` etc.). The unprivileged process can't read `/etc/shadow` even if every check in `get_state_file` is bypassed — it doesn't have the bits. Container-lifecycle POSTs (`/restart`, `/destroy`, etc.) become forwarded RPCs the privileged side authorises on its terms.
- **Defense in depth on `get_state_file`**: until privsep lands, the allow-list is load-bearing. Worth adding: refuse files whose mode is not world-readable (so an agent writing a 0600 file inside `state/` can't have its contents proxied through the endpoint to a different operator), and refuse symlinks at any path component (`O_NOFOLLOW`-style — `canonicalize` resolves them, but we currently don't reject if the original path had symlinks).
## Harness Ergonomics (agent-side wishlist)
Filed by damocles, who actually lives in this thing. Loosely ranked by
how often the friction bites in normal use.
- ~~**Auto-attach oversize message bodies**~~ — superseded by simply
raising the inline cap from 1 KiB → 4 KiB (covers ~95% of
conversational overflow). Anything genuinely larger still needs a
state file. Blob-in-broker-sqlite was prototyped on paper
(`/agents/damocles/state/oversize-msg-proposal.md`) but rejected as
future vacuum/sync pain not worth carrying for the long-tail 5% of
cases that legitimately belong in a file.
- **Inbox batching hint in the wake prompt** — when the harness pops a
message and there are N more waiting, the wake prompt should say so
(e.g. `"(+3 more queued; consider draining before acting)"`) so claude
knows to call `recv()` again in the same turn instead of doing the
expensive Read/Edit dance once per message over N turns. The data's
already in the broker (`Broker::pending_count(agent)`); just thread it
into the prompt builder in `hive-ag3nt::turn.rs`. Even better: add a
one-shot `recv_batch(max: u32)` MCP tool that returns up to `max`
pending messages in a single round-trip.
- **Self-management of own asks + reminders** — once I fire `ask` or
`remind` I have no way to inspect or cancel them from the agent side.
Operator can cancel asks via dashboard; nothing for reminders at all
(TODO above). Want `list_my_asks() -> [{id, target, question, asked_at}]`
and `cancel_ask(id)` on the agent surface, plus `list_my_reminders()`
/ `cancel_reminder(id)`. Bounded by `asker == self` and `reminder.owner
== self` so no cross-agent meddling.
- **`whoami` introspection tool** — agents currently rely on the system
prompt remembering their name + role. After a rename or model swap
there's no trustworthy source-of-truth from inside the harness.
Cheap: a `whoami() -> { name, role: "agent" | "manager", model, port,
hyperhive_rev, started_at }` tool reading from the harness's own env
+ `TurnState`. Useful for self-documenting state files ("this dropped
by damocles@gpt-5-codex on rev abc1234") and for the future
`get_open_threads` to know whose threads to query without
trusting prompt-substituted strings.
- **Optional `in_reply_to: <msg_id>` on send** — pure wire addition; no
behavioural change. The dashboard could render conversation threads
(already wants this for the agent-to-agent question UI in the
Dashboard section). Today every reply is a fresh root in the message
flow which obscures cause-and-effect when two agents are mid-debate.
Field is optional, ignored if the referenced id is unknown / cross-
agent / out of retention.
## Telemetry
- **Per-turn stats log**: persist one row per claude turn in a new sqlite table on the per-agent state dir (or the host broker DB, indexed by agent). Columns: `started_at`, `ended_at`, `duration_ms`, `model`, `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, `tool_call_count`, `tool_call_breakdown` (JSON: `{Read: 12, Bash: 3, ...}`), `bytes_streamed`, `wake_reason` (recv'd message / reminder / operator-kick / manual), `result_kind` (ok / cancelled / failed-mid-turn / compacted), `note` (e.g. failure reason). Powers: per-agent dashboards (avg turn time over time, tool-usage histogram, cost projections from token counts × model rate), debugging stuck loops (look for repeated identical wake_reason + zero tool calls), and operator-visible "this is what your spend looked like this week" rollups. Source data is already mostly in the harness's `TurnState` + the per-event bus; just needs a sink. Keep a retention sweep (host-side) so the table doesn't grow forever.
## Bugs
- **Post-rebuild system-message missed wake**: at 09:13:14 the dashboard showed `system → damocles container rebuilt` as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent `recv()` from inside the agent returned `(empty)`, confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server `serve_agent_stdio` task is up and answering MCP/socket calls, but the `hive-ag3nt::serve` long-poll loop that drives `drive_turn` either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive `nixos-container update` cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the `recv_blocking` fix in `e423d57` if the rebuild restarts the broker mid-subscribe.