todo: prune resolved items
parent 88a1f4c146
commit 62784d4933
1 changed file with 1 addition and 10 deletions
11
TODO.md
@@ -8,18 +8,11 @@
- **Broadcast messaging**: allow sending messages with recipient "*" to all agents; deliver with hint "this was a broadcast and may not need any action from you"
- **Multi-agent restart coordination**: when rebuilding all agents, the manager should start first so it can coordinate post-restart confusion (notify agents, suppress unnecessary retries, etc.)
- **Shared docs/skills repo (RO)**: a single repo on the hive forge that every agent has read-only access to — common references, prompts, runbooks, "skills" the operator wants every agent to inherit without baking into the system prompt or `/shared`. Implementation likely: seed an `org-shared/docs` repo on first hive-forge boot, grant every per-agent user a read membership in the org. Agents `git clone` it (or use the API) to read; only the manager + operator can push.
- ~~**Rename `ask_operator` → `ask` with optional `to` param**~~ ✓ done — `Ask { question, options, multi, ttl_seconds, to: Option<String> }` on both `AgentRequest` + `ManagerRequest`. `to = None` (or `Some("operator")`) = dashboard path; `to = Some(<agent>)` pushes `HelperEvent::QuestionAsked` into the target's inbox. New `Answer { id, answer }` request on both surfaces — target answers via `mcp__hyperhive__answer`; answer flows back to the asker as `HelperEvent::QuestionAnswered { id, question, answer, answerer }` (renamed from `OperatorAnswered`; carries who answered so the asker can distinguish operator vs peer vs `ttl-watchdog`). Authorisation: only the question's `target` agent or the operator can answer; self-ask is rejected. DB gets a nullable `target` column (NULL = operator path, back-compat). Dashboard's `pending()` / `recent_answered()` filter on `target IS NULL` so peer questions never leak into the operator's queue. Shared dispatch lives in `hive-c0re/src/questions.rs` so both surfaces stay aligned.
- **Loose-ends tracker + `get_open_threads` tool**: hive-c0re already knows about pending approvals + unanswered questions; soon will also know about open PRs on hive-forge. Aggregate these into a per-agent "open threads" view (e.g. `[{kind: "approval", id: 7, summary: "spawn alice"}, {kind: "question", id: 12, asker: "alice", summary: "deploy now?"}]`). New MCP tool `mcp__hyperhive__get_open_threads` returns the list so an agent can see what's still pending against it without rebuilding context from inbox history. Manager's version includes hive-wide threads. **Also surface this list on the per-agent web UI** so the operator can see at a glance what each agent has hanging open — same data source as the MCP tool, just rendered into the existing per-agent dashboard page (next to inbox view / model chip / etc).
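
The `ask` routing and authorisation rules described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `hive-c0re/src/questions.rs`: the names `Route`, `AskError`, `route_ask`, and `may_answer` are hypothetical, and the internal `ttl-watchdog` answerer is not modelled.

```rust
/// Hypothetical routing outcome for an `Ask` request.
#[derive(Debug, PartialEq)]
enum Route {
    /// Dashboard path: `to = None` or `to = Some("operator")`.
    Operator,
    /// Peer path: the question lands in the named agent's inbox.
    Peer(String),
}

/// Hypothetical error type; the real dispatch may report this differently.
#[derive(Debug, PartialEq)]
enum AskError {
    SelfAsk,
}

/// Decide where a question goes, rejecting an agent asking itself.
fn route_ask(asker: &str, to: Option<&str>) -> Result<Route, AskError> {
    match to {
        None | Some("operator") => Ok(Route::Operator),
        Some(target) if target == asker => Err(AskError::SelfAsk),
        Some(target) => Ok(Route::Peer(target.to_string())),
    }
}

/// `target` mirrors the nullable DB column: `None` means the operator path.
/// Only the question's target agent or the operator may answer.
fn may_answer(answerer: &str, target: Option<&str>) -> bool {
    answerer == "operator" || target == Some(answerer)
}
```

With these rules, `route_ask("alice", Some("bob"))` selects the peer path, while `may_answer("carol", Some("bob"))` is refused because `carol` is neither the target nor the operator.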
## Reminder Tool
- ~~Handle text overflow → suggest file_path option for long messages~~ ✓ fixed — Remind dispatch rejects `message.len() > 4096` (when no `file_path` was supplied) with an error pointing at the `file_path` escape hatch.
- Per-agent reminder limits (burst capacity, rate limiting)
- ~~**Expose `remind` MCP tool**~~ ✓ fixed — `mcp__hyperhive__remind` now on `AgentServer`; takes `message`, exactly one of `delay_seconds` / `at_unix_timestamp`, optional `file_path`. Manager surface still missing (no `ManagerRequest::Remind` variant) — separate item below.
- ~~**Manager-side `remind`**~~ ✓ fixed — `ManagerRequest::Remind` variant added, dispatch reuses `agent_server::store_remind` helper (shared across both surfaces), `mcp__hyperhive__remind` now on `ManagerServer` (auto-file lands at `/state/reminders/auto-<ts>.md` — manager's legacy state mount).
- ~~**File path delivery**~~ ✓ fixed — scheduler now writes the reminder body to the requested `file_path` (mapped from container `/agents/<agent>/state/...` to host `/var/lib/hyperhive/agents/<agent>/state/...`) and delivers a short pointer message in its place. Path-traversal + foreign-agent-state writes are rejected; on rejection or write failure the body falls back to inline delivery with a noted warning. New module `hive-c0re/src/reminder_scheduler.rs` (extracted from main.rs).
- ~~**Orphan reminders**~~ ✓ fixed — `Broker::deliver_reminder` wraps the inbox INSERT + reminders UPDATE in one sqlite transaction; partial failure can no longer cause duplicate delivery on the next tick.
- ~~**Unbounded batches**~~ ✓ fixed — scheduler now calls `get_due_reminders(REMINDER_BATCH_LIMIT)` (cap = 100/tick); overflow stays due and gets picked up next cycle.
- **Scheduler shutdown**: add a graceful shutdown signal for when the coordinator is destroyed (the scheduler currently runs forever)
- **DB lock contention**: under high reminder volume, the broker's `Mutex<Connection>` serializes every delivery transaction. Consider batching multiple deliveries into one tx, or moving reminders onto a separate sqlite connection.
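
Two of the resolved reminder items above lend themselves to a short sketch: the request checks (exactly one of `delay_seconds` / `at_unix_timestamp`, and the 4096-byte inline limit when no `file_path` is given) and the container-to-host path mapping with traversal rejection. The function names, the error enum, and the order of the checks are assumptions made for illustration; only the limit and the two path prefixes come from the items above. The agent name is assumed to be trusted.

```rust
use std::path::{Component, Path, PathBuf};

/// Inline limit quoted in the item above.
const MAX_INLINE_LEN: usize = 4096;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RemindError {
    MessageTooLong,
    BadSchedule,
}

/// Check a remind request before it is stored.
fn validate_remind(
    message: &str,
    delay_seconds: Option<u64>,
    at_unix_timestamp: Option<u64>,
    file_path: Option<&str>,
) -> Result<(), RemindError> {
    // Exactly one of the two scheduling fields must be set.
    if delay_seconds.is_some() == at_unix_timestamp.is_some() {
        return Err(RemindError::BadSchedule);
    }
    // Long bodies are only accepted when a file_path was supplied.
    if file_path.is_none() && message.len() > MAX_INLINE_LEN {
        return Err(RemindError::MessageTooLong);
    }
    Ok(())
}

/// Map `/agents/<agent>/state/...` to the host state directory.
/// Returns `None` for foreign-agent paths, traversal attempts, or an
/// empty remainder; a caller could then fall back to inline delivery.
fn map_to_host(agent: &str, container_path: &str) -> Option<PathBuf> {
    let prefix = format!("/agents/{agent}/state");
    let rest = Path::new(container_path).strip_prefix(&prefix).ok()?;
    if rest.as_os_str().is_empty() {
        return None;
    }
    // Only plain path segments are allowed below the state directory.
    if rest.components().any(|c| !matches!(c, Component::Normal(_))) {
        return None;
    }
    Some(
        Path::new("/var/lib/hyperhive/agents")
            .join(agent)
            .join("state")
            .join(rest),
    )
}
```

Because `strip_prefix` compares whole path components, a path such as `/agents/alice/statefoo/x` does not match the `alice` prefix, and any `..` segment in the remainder is rejected rather than resolved.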
@@ -31,10 +24,8 @@
- Per-agent reminder status (pending, delivered)
- Reminder query interface for debugging
- Display reminder delivery errors (failed sends, mark failures)
- ~~**Phase 5b: per-domain mutation event types + client derived state**~~ ✓ landed across 56d615b (approvals), 1879b2f (questions), 7956e1c (transients). `DashboardEvent` now carries `ApprovalAdded` / `ApprovalResolved`, `QuestionAdded` / `QuestionResolved`, `TransientSet` / `TransientCleared`; emit sites cover `actions::approve`/`deny`/`finish_approval`, dashboard's orphan-approval GC, manager-socket `request_spawn` + `request_apply_commit` (success + git_fetch failure), `questions::handle_ask`/`handle_answer` (operator-targeted only), dashboard's `/answer-question` + `/cancel-question`, ttl-watchdog, `Coordinator::set_transient`/`clear_transient`. `/api/state` still serves these arrays for cold-start; live updates flow through the events. Container-list events still deferred — `ContainerView` is sourced from external `nixos-container list`, so the 5s poll continues to drive `/containers-section`. Phase 6 remaining redirect conversions (`/approve`, `/deny`, `/restart`, `/destroy`, `/kill`, `/rebuild`, `/api/cancel`, `/api/compact`, `/api/model`, `/api/new-session`, `/request-spawn`, `/answer-question`, `/cancel-question`, `/meta-update`, `/purge-tombstone`) are now unblocked for the event-covered domains; container-lifecycle ones still need either container-list events or to live with the 5s poll-refresh delay.
- **Phase 6 leftovers**: convert the remaining redirect-and-refetch endpoints whose mutations now fire dashboard events. `/approve`, `/deny`, `/answer-question`, `/cancel-question`, `/request-spawn` are 5-line `Redirect::to("/")` → `(StatusCode::OK, "ok")` changes; the client already updates derived state from the events. Container-lifecycle endpoints (`/restart`, `/destroy`, `/kill`, `/rebuild`, `/start`, `/api/{cancel,compact,model,new-session}`, `/meta-update`, `/purge-tombstone`) need a `ContainerListChanged` event first — `ContainerView` is currently sourced from external `nixos-container list`, so the 5s poll still drives that section.
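
The idea of "client derived state" in the items above, where a cold-start snapshot is kept current by applying mutation events, can be illustrated with a small reducer. The real client is browser-side JavaScript; this Rust sketch only mirrors the event names from the list, and every payload field (`id`, `agent`, `label`) as well as the `DerivedState` type is a hypothetical placeholder.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Event names follow the list above; the payload fields are assumptions.
#[derive(Debug, Clone)]
enum DashboardEvent {
    ApprovalAdded { id: u64 },
    ApprovalResolved { id: u64 },
    QuestionAdded { id: u64 },
    QuestionResolved { id: u64 },
    TransientSet { agent: String, label: String },
    TransientCleared { agent: String },
}

/// Hypothetical derived state: seeded from a snapshot, then updated by events.
#[derive(Debug, Default, PartialEq)]
struct DerivedState {
    approvals: BTreeSet<u64>,
    questions: BTreeSet<u64>,
    transients: BTreeMap<String, String>,
}

impl DerivedState {
    /// Apply one event; resolving or clearing an unknown entry is a no-op.
    fn apply(&mut self, ev: DashboardEvent) {
        match ev {
            DashboardEvent::ApprovalAdded { id } => {
                self.approvals.insert(id);
            }
            DashboardEvent::ApprovalResolved { id } => {
                self.approvals.remove(&id);
            }
            DashboardEvent::QuestionAdded { id } => {
                self.questions.insert(id);
            }
            DashboardEvent::QuestionResolved { id } => {
                self.questions.remove(&id);
            }
            DashboardEvent::TransientSet { agent, label } => {
                self.transients.insert(agent, label);
            }
            DashboardEvent::TransientCleared { agent } => {
                self.transients.remove(&agent);
            }
        }
    }
}
```

Once every mutation in a domain emits such an event, an endpoint in that domain no longer needs to redirect and trigger a refetch, which is the precondition the Phase 6 item relies on.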
## Bugs
- ~~**Pending message wake-up**~~ ✓ fixed (e423d57) — subscribe-before-check race in `broker.recv_blocking` meant a send landing between the initial `recv()` and `subscribe()` was missed; agent then sat on the 180s long-poll until another, unrelated message woke it. Now subscribe first.
- **Post-rebuild system-message missed wake**: at 09:13:14 the dashboard showed `system → damocles container rebuilt` as ✓ delivered, but the agent harness never ran a turn for it (no claude invocation, no operator-visible activity). A subsequent `recv()` from inside the agent returned `(empty)`, confirming the message was popped + marked delivered server-side — yet drove no turn. Most likely cause: the agent_server `serve_agent_stdio` task is up and answering MCP/socket calls, but the `hive-ag3nt::serve` long-poll loop that drives `drive_turn` either died silently during rebuild or never restarted. Investigate: (a) does hive-ag3nt's serve loop survive `nixos-container update` cleanly, or does its tokio runtime get torn down mid-loop? (b) is there an early-exit path on a transient socket error during rebuild that drops the serve task without notifying the manager? (c) compare timeline with manager's own post-rebuild wake to see if this is rebuilt-agents-only or universal. Could be related to the `recv_blocking` fix in `e423d57` if the rebuild restarts the broker mid-subscribe.
- ~~**`LiveEvent::Note(String)` never reaches the browser**~~ ✓ fixed — converted to struct variant `Note { text: String }`; wire shape `{"kind":"note","text":"..."}` matches what the JS already reads via `ev.text`. Historical sqlite rows persisted as the literal string `"null"` (from when serialization silently failed) get filtered out by the `rows.flatten().flatten()` pipeline in `EventStore::recent`, so replay tolerates them.
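
The subscribe-before-check ordering from the pending-message wake-up fix above can be shown with a standard-library-only model. The `Broker` type here is a hypothetical stand-in, not the real hive-c0re broker (which is presumably async), and the method bodies are assumptions; only the ordering inside `recv_blocking` reflects the fix described.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical in-memory broker used only to illustrate the ordering.
struct Broker {
    queue: Mutex<VecDeque<String>>,
    wakers: Mutex<Vec<Sender<()>>>,
}

impl Broker {
    fn new() -> Self {
        Broker {
            queue: Mutex::new(VecDeque::new()),
            wakers: Mutex::new(Vec::new()),
        }
    }

    /// Enqueue a message, then wake every live subscriber.
    fn send(&self, msg: &str) {
        self.queue.lock().unwrap().push_back(msg.to_string());
        // Subscribers whose receiver was dropped are pruned here.
        self.wakers.lock().unwrap().retain(|w| w.send(()).is_ok());
    }

    fn try_recv(&self) -> Option<String> {
        self.queue.lock().unwrap().pop_front()
    }

    fn subscribe(&self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.wakers.lock().unwrap().push(tx);
        rx
    }

    /// Subscribe first, then check the queue, then wait for a wake-up.
    /// A send that lands after `subscribe()` always leaves a wake signal,
    /// so it cannot be missed the way it could with check-then-subscribe.
    fn recv_blocking(&self, timeout: Duration) -> Option<String> {
        let wake = self.subscribe();
        if let Some(msg) = self.try_recv() {
            return Some(msg);
        }
        match wake.recv_timeout(timeout) {
            Ok(()) => self.try_recv(),
            Err(_) => None, // timed out without any send
        }
    }
}
```

With the order reversed (check, then subscribe), a message sent between the two steps would sit in the queue with no subscriber to wake, which matches the long-poll stall described in the fixed item.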