Broker schema gains attempt_count INTEGER + last_error TEXT
columns via idempotent ALTER TABLE migration (pragma-probed so
fresh + existing dbs converge). reminder_scheduler::tick calls
record_reminder_failure on every deliver_reminder error,
bumping the counter + stashing the message. get_due_reminders
filters out rows where attempt_count >= MAX_REMINDER_ATTEMPTS
(5) so the scheduler stops retrying a stuck row until the
operator intervenes.
new POST /retry-reminder/{id} → reset_reminder_failure clears
the counters; next 5s tick re-attempts. cancel-reminder
unchanged (hard-delete).
dashboard renders failed rows with a red left rule, the error
text inline, and a ⚠ N failed badge. ↻ R3TRY button appears
when attempt_count > 0 — sits next to ✗ C4NC3L in a small
actions row below the body.
DashboardEvent::QuestionAdded gains question_refs and
QuestionResolved gains answer_refs — both populated via
scan_validated_paths at emit time, same helper the broker
forwarder uses for Sent/Delivered. cold-load snapshot wraps
each OpQuestion in QuestionView with the same fields computed
once per /api/state.
client threads refs through questionsState rows (pending +
history) and passes them to appendLinkified at every render
site (live pane, history details). path tokens in question and
answer bodies now linkify with the same server-vouched
guarantee broker messages already enjoyed.
ContainerView gains pending_reminders: u64; computed during
build_all via Broker::count_pending_reminders_for, mapping
manager → MANAGER_AGENT recipient + sub-agents → logical name.
Updates on every rescan (mutation sites + crash_watch's 10s
poll); accept 10s staleness on background remind / scheduler
delivery — live updates on operator cancel via /api/state path.
client renders a small cyan chip on the row when the count > 0;
tooltip points the operator at the reminders section to view
or cancel.
new DashboardEvent::TombstonesChanged + MetaInputsChanged carry
full snapshots (lists are tiny; snapshot beats diff for race
avoidance). Coordinator-side helpers
emit_tombstones_snapshot + emit_meta_inputs_snapshot fire from
every mutation site: actions::destroy + post_purge_tombstone +
actions::approve (spawn finalise consumes tombstone) +
run_meta_update + auto_update::rebuild_agent (lock bumps).
client adds derived stores + apply* handlers + drops the
post-submit refetch on PURG3 (container row + tombstone row)
and meta-update.
after this commit /api/state is fetched exactly once per page
session (cold load); every other change rides the SSE channel.
drop the /api/state-file/check probe endpoint (which let any
dashboard visitor enumerate filesystem layout by feeding paths)
and the client's optimistic-then-downgrade dance. instead, the
broker forwarder calls scan_validated_paths(body) — same
allow-list helper as the read endpoint — and attaches the
verified file tokens to DashboardEvent::Sent/Delivered as
file_refs: Vec<String>. /dashboard/history backfill does the
same per-row.
client appendLinkified takes a (text, refs) pair, walks
left-to-right linkifying every occurrence of any ref token,
longest-first tie-break. no regex, no probe, no cache, no
queue. when refs is empty/absent the body emits as plain text
(question/answer/reminder rendering — refs for those are a
follow-up).
operator inbox stores file_refs from the sent event so its
renderer gets the same anchors as the message-flow terminal.
regex back to permissive ("looks like a path") — the server is
authoritative on whether each match is a file. anchors render
optimistically, paths queue for batch validation (50ms coalesce),
non-files downgrade to plain text + the sibling <details>
preview is dropped. session-scoped cache (pathValidity Map) so
repeated paths skip the roundtrip.
new endpoint POST /api/state-file/check accepts { paths } and
returns { results: {<path>: bool} }. shares resolve_state_path
helper with the read endpoint so security rules can't drift —
both refuse anything outside the allow-list, anything resolved
outside via symlink, or anything in a per-agent subdir other
than state/. capped at 64 paths/request.
drops the brittle client-side filename heuristic (the .ext-
required rule that missed README/Makefile and still matched bare
dirs without trailing slash). single source of truth.
new sqlite table at /state/hyperhive-turn-stats.sqlite on each
agent's state dir. one row per claude turn captures identity
(model, wake_from, result_kind), timing (started/ended_at,
duration_ms), cost (input/output/cache_read/cache_creation token
counts), behaviour (tool_call_count + per-tool breakdown JSON),
and post-turn snapshot metrics (open_threads_count,
open_reminders_count).
wire additions:
- AgentRequest/ManagerRequest::CountPendingReminders +
Broker::count_pending_reminders_for(agent)
- Bus::observe_stream + take_tool_calls — pumps the existing
stdout stream-json, picks out tool_use blocks, accumulates per
turn. bin loops fold the breakdown into each row.
- TurnStats::open_default + TurnStatRow + record() — best-effort
inserts; failures log + don't block the harness.
both ag3nt and m1nd bins capture started_at + duration via
Instant::elapsed, fetch open-thread + reminder counts from
hive-c0re via the existing socket (post-turn, best-effort), and
record one row at turn_end. record_kind splits ok / failed /
prompt_too_long; failures carry the error message in note.
todo entries for host-side vacuum sweep + reading the table back
into agent/dashboard badges.
new 'qu3u3d r3m1nd3rs' section between approvals and operator
inbox. lists every pending reminder with agent, due-relative
timestamp, body, payload path (path-linkified), and a cancel
button. drives off a new /api/reminders endpoint and a
POST /cancel-reminder/{id} that hard-deletes the row.
failure surface (last_error / attempt_count + retry) deferred —
needs a sqlite migration; tracked in TODO.md.
agents constantly emit pointer strings to /agents/<n>/state/foo.md
since broker bodies cap at 1 KiB. now those tokens linkify in the
message flow, question bodies, answer text, and operator inbox;
clicking expands an inline <details> that lazy-fetches via the
new /api/state-file?path=... endpoint.
endpoint allow-list: per-agent state dirs + shared docs, both
in their container-mount form (/agents/<n>/state, /shared) and
host form (/var/lib/hyperhive/...). 1 MiB read cap; canonicalises
before the prefix check so `..` / symlinks can't escape.
legacy bare `/state/...` is deliberately not matched — ambiguous
from the host's perspective (we'd need to know which agent the
message references to translate). agents should use the qualified
form going forward.
questions pane now shows both operator-targeted threads
(target IS NULL) and agent-to-agent threads (target = some
agent). filter chips above the list: all / @operator / @peer /
per-participant. peer rows get a mauve left rule + a 0V3RR1D3
button that POSTs the same /answer-question endpoint
(OperatorQuestions::answer already permits the operator as
answerer on any target).
wire changes: OperatorQuestions gains pending_all +
recent_answered_all; QuestionAdded + QuestionResolved events
carry target: Option<String>; emit sites drop their
target.is_none() guard. answered-history rows show the
answerer prefix so override answers are auditable at a glance.
new DashboardEvent::ContainerStateChanged + ContainerRemoved
close the last refetch loop on the dashboard. Coordinator's
rescan_containers_and_emit diffs a fresh container_view::build_all
against a cached last_containers map and fires per-row events.
called from actions::approve (post-spawn), actions::destroy,
the lifecycle_action wrapper, auto_update::rebuild_agent, and
the existing 10s crash_watch poll.
ContainerView extracted to its own module so coordinator and
dashboard can both build it. dashboard endpoints flip to 200;
container-lifecycle forms carry data-no-refresh. client drops
the periodic poll entirely — initial cold load + SSE for
everything afterwards. pending overlay reads from the existing
transientsState since the new event payload doesn't carry it.
PURG3 + meta-update keep the post-submit refetch since
tombstones + meta_inputs aren't event-derived yet; tracked in
TODO.md.
startup sweep adds ensure_repo('meta', core_token) after the orgs
so the first push isn't a 404. meta::git_commit now calls
forge::push_meta after every successful commit — token-in-URL
`git push http://core:$token@localhost:3000/core/meta.git` —
gated on the core token file existing (no-op when forge isn't
seeded). push failures log warn, don't bubble up.
no tea needed on the host; git is already on the hive-c0re service
PATH via /run/current-system/sw.
new ensure_core_user_and_token mints a site-admin 'core' user with
its token at /var/lib/hyperhive/forge-core-token (root 0600) —
hive-c0re's own forge identity for pushing the meta repo + driving
the admin API. that token then drives ensure_org for 'core' (meta
repo lives here) and 'agents' (per-agent applied config repos).
both org-create calls are idempotent: HTTP 422/409 treated as
success. failures log but don't abort the rest of the sweep.
curl is shelled out from the host — already on the hive-c0re
service PATH via /run/current-system/sw, no new dep.
manager has /agents bind-mounted too, so /agents/hm1nd/state
resolves there alongside the legacy /state. one canonical path in
the wake message instead of branching on MANAGER_NAME.
manager keeps /state (legacy mount); sub-agents see their state at
/agents/<name>/state. wake message hardcoded /state/ for everyone,
which is wrong for sub-agents post-refactor — they get a path they
can't ls. switch on MANAGER_NAME and format the right path.
without --work-path, forgejo's admin CLI defaults WorkPath to the
binary's directory (RO nix store), can't find custom/conf/app.ini
there, falls back to defaults, and F3 init mkdir-fails inside the
store. systemd unit sets WORK_PATH for the daemon; mirror it here
for every nixos-container-driven 'forgejo admin' invocation.
bumped from (read:user,write:repository,write:issue) to also include
write:user (own profile + create repos under own namespace),
write:organization (share namespaces between agents), write:misc
(hooks/attachments). still excludes admin and package scopes.
new forge module probes the hive-forge nixos-container (no-op when
absent), and ensures every agent + the manager has a forgejo user
named after them with an access token at `<state>/forge-token`
(visible inside the container as `/state/forge-token`).
idempotent: skips user creation when forgejo reports 'already
exists', skips token issuance when the file is present, scopes the
token to read:user,write:repository,write:issue. token-name suffixed
with a clock so re-issuing doesn't collide with a stale name. shells
out via `nixos-container run hive-forge -- runuser -u forgejo --
forgejo admin` (runuser instead of sudo since sudo isn't in the
container by default).
hooks: ensure_all sweeps existing containers at hive-c0re startup
(backgrounded), and the actions.rs spawn task calls ensure_user_for
the new agent right after lifecycle::spawn succeeds. failures log a
warning but don't abort spawn — a missing token is recoverable from
the next startup sweep.
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.