dashboard.rs had the same 12-attempt cap shape as the per-agent
bind_with_retry. Apply the same fix — retry forever with the 2s-capped
backoff, WARN early then INFO once we're clearly stuck on a stale
socket, INFO on success when we did have to retry. Mirrors the
agent change in this PR.
The previous take put a shared NavLink wire type in hive-sh4re and
duplicated the link-building logic across crates. Per @mara on #326:
that doesn't fit the eventual frontend/backend split goal (#273).
The agent backend is the natural source of truth for what links its
own page exposes; hive-c0re just passes the list through to the
dashboard.
* hive-ag3nt/src/web_ui.rs: agent_links now also serves the
config-repo link + reads agent-declared dashboardLinks extras
from {state_dir}/hyperhive-dashboard-links.json. AgentLink gains a
kind enum (Container | Forge | External) so the frontend can build
the right href no matter which surface is rendering. The host
header is no longer used — URLs are paths for Container/Forge,
absolute for External.
* hive-c0re/src/dashboard.rs: new GET /api/agent/{name}/links route,
a same-origin proxy that fetches the agent's /api/state and
forwards just the links field. No shared wire type — hive-c0re
treats the payload as opaque JSON (serde_json::Value). All failure
modes degrade to an empty list so the dashboard still renders.
* hive-c0re/assets/app.js: container card head row gets an async-
populated icon-only nav strip from the proxy. The hardcoded stats
link, the standalone config-repo trigger, and the extras block are
gone. The deployed:<sha> chip stays — the agent harness can't know
its own deployed sha, so this chip is how the operator sees what's
live alongside the agent's (root-only) config link.
* hive-ag3nt/assets/app.js: agent page meta-links rendered via
el() / textContent (DOM build) so agent-declared icon / label / url
strings never reach innerHTML. kind-based href resolution mirrors
the dashboard side.
* docs/web-ui.md: dashboard + per-agent sections updated for the new
architecture.
Closes#262.
Remove the depth-2 cap in walk_meta_inputs so every fetched input
at every depth is surfaced, not just two levels (issue #275). The
uncapped walk needs a guard: a visited-node set makes it a spanning
tree — each fetched node walked once, at its shallowest path — so
shared subtrees don't re-walk and a cycle can't recurse forever.
A two-pass walk (claim a node's direct inputs before descending)
keeps shallow inputs at a shallow path.
Frontend: renderMetaInputs indents each row by its slash-path depth
and shows the leaf segment (full path on hover), plus a select-all /
select-none control so a long input list isn't ticked box by box.
post_meta_update returns 200 immediately and runs the nix flake
update + agent-rebuild ripple in a background task, so the META
INPUTS panel looked idle for the whole multi-minute window (#259).
Track in-flight runs with a Coordinator atomic counter, exposed via
an RAII MetaUpdateGuard held across run_meta_update. Surface it as
the meta_update_running snapshot field plus a MetaUpdateRunning SSE
event (flipped only when the count crosses 0, so concurrent runs
flip the flag once). The panel shows a pulsing in-progress banner
and disables the update button while a run is active.
Follow-up to #188. Two additions to the side-panel file preview:
- Markdown files get a rendered/plain tabbed view (was: always
rendered, no way to see source) — same tab pattern as SVG.
- Raster images (png/jpg/gif/webp/bmp/ico/avif) render as an
<img>. /api/state-file previously from_utf8_lossy-stringified
every file and served text/plain, which corrupts binary; it
now serves image files as raw bytes with their real
content-type (over-cap images are rejected, not truncated —
a clipped binary is corrupt).
buildSvgPanel generalised to buildTabbedPreview, shared by SVG +
markdown. .svg-host/.svg-render renamed .preview-host/.img-preview
since they now back images + md too.
closes#192
The main dashboard had no favicon — PR #145 added them to the
per-agent pages but missed hive-c0re's index. Serve branding/
hyperhive.svg at /favicon.svg and declare it in the index head.
The dashboard represents the whole hive, so it uses the project
mark (per-agent pages keep their own configurable /icon).
closes#173
- MessageEvent and DashboardEvent Sent/Delivered now carry id and in_reply_to
- broker.send() includes last_insert_rowid in the emitted event
- recent_all() and recv_batch() include id and in_reply_to from the DB
- deliver_reminders_batch() tracks per-row rowids within the transaction
- dashboard message flow: reply rows are indented with a border-left and a
clickable '↳ reply' tag that scroll-jumps + briefly highlights the parent
- per-agent inbox: reply messages get a '↳ reply ·' prefix and indent
Closes#26
- forge nix option moves to hyperhive.forge.enable, defaults true;
hive-c0re imports the forge module so it's on by default.
- drop the agent.nix container-row viewer + /api/agent-config; link
to the agent-configs forge repo instead.
- restructure pending approvals into a card (identity header /
what-changed body / decision actions) with a link to the proposal
commit on the forge.
- diff opens in the side panel with a 3-way base toggle: vs applied
(running) / vs last-approved / vs previous proposal, served by the
new /api/approval-diff/{id}?base= endpoint.
clicking a .md / .markdown path reference now opens a marked-rendered
view in the slide-in panel instead of raw text; other files stay raw
in a <pre>. serves the vendored marked bundle at /static/marked.js and
scopes a .md stylesheet to the panel body.
loose-ends question rows get a textarea + send button; the operator
answers as operator by POSTing to the core dashboard's
/answer-question route, not the per-agent socket — keeps the
operator-authority path off the agent's own socket. cross-origin POST
needs a CORS shim on that route for now; drops out once the gateway
makes the page same-origin.
also splits deployment/ops/boundaries/gateway work into TODO-ops.md.
Broker schema gains attempt_count INTEGER + last_error TEXT
columns via idempotent ALTER TABLE migration (pragma-probed so
fresh + existing dbs converge). reminder_scheduler::tick calls
record_reminder_failure on every deliver_reminder error,
bumping the counter + stashing the message. get_due_reminders
filters out rows where attempt_count >= MAX_REMINDER_ATTEMPTS
(5) so the scheduler stops retrying a stuck row until the
operator intervenes.
new POST /retry-reminder/{id} → reset_reminder_failure clears
the counters; next 5s tick re-attempts. cancel-reminder
unchanged (hard-delete).
dashboard renders failed rows with a red left rule, the error
text inline, and a ⚠ N failed badge. ↻ R3TRY button appears
when attempt_count > 0 — sits next to ✗ C4NC3L in a small
actions row below the body.
DashboardEvent::QuestionAdded gains question_refs and
QuestionResolved gains answer_refs — both populated via
scan_validated_paths at emit time, same helper the broker
forwarder uses for Sent/Delivered. cold-load snapshot wraps
each OpQuestion in QuestionView with the same fields computed
once per /api/state.
client threads refs through questionsState rows (pending +
history) and passes them to appendLinkified at every render
site (live pane, history details). path tokens in question and
answer bodies now linkify with the same server-vouched
guarantee broker messages already enjoyed.
new DashboardEvent::TombstonesChanged + MetaInputsChanged carry
full snapshots (lists are tiny; snapshot beats diff for race
avoidance). Coordinator-side helpers
emit_tombstones_snapshot + emit_meta_inputs_snapshot fire from
every mutation site: actions::destroy + post_purge_tombstone +
actions::approve (spawn finalise consumes tombstone) +
run_meta_update + auto_update::rebuild_agent (lock bumps).
client adds derived stores + apply* handlers + drops the
post-submit refetch on PURG3 (container row + tombstone row)
and meta-update.
after this commit /api/state is fetched exactly once per page
session (cold load); every other change rides the SSE channel.
drop the /api/state-file/check probe endpoint (which let any
dashboard visitor enumerate filesystem layout by feeding paths)
and the client's optimistic-then-downgrade dance. instead, the
broker forwarder calls scan_validated_paths(body) — same
allow-list helper as the read endpoint — and attaches the
verified file tokens to DashboardEvent::Sent/Delivered as
file_refs: Vec<String>. /dashboard/history backfill does the
same per-row.
client appendLinkified takes a (text, refs) pair, walks
left-to-right linkifying every occurrence of any ref token,
longest-first tie-break. no regex, no probe, no cache, no
queue. when refs is empty/absent the body emits as plain text
(question/answer/reminder rendering — refs for those are a
follow-up).
operator inbox stores file_refs from the sent event so its
renderer gets the same anchors as the message-flow terminal.
regex back to permissive ("looks like a path") — the server is
authoritative on whether each match is a file. anchors render
optimistically, paths queue for batch validation (50ms coalesce),
non-files downgrade to plain text + the sibling <details>
preview is dropped. session-scoped cache (pathValidity Map) so
repeated paths skip the roundtrip.
new endpoint POST /api/state-file/check accepts { paths } and
returns { results: {<path>: bool} }. shares resolve_state_path
helper with the read endpoint so security rules can't drift —
both refuse anything outside the allow-list, anything resolved
outside via symlink, or anything in a per-agent subdir other
than state/. capped at 64 paths/request.
drops the brittle client-side filename heuristic (the .ext-
required rule that missed README/Makefile and still matched bare
dirs without trailing slash). single source of truth.
new 'qu3u3d r3m1nd3rs' section between approvals and operator
inbox. lists every pending reminder with agent, due-relative
timestamp, body, payload path (path-linkified), and a cancel
button. drives off a new /api/reminders endpoint and a
POST /cancel-reminder/{id} that hard-deletes the row.
failure surface (last_error / attempt_count + retry) deferred —
needs a sqlite migration; tracked in TODO.md.
agents constantly emit pointer strings to /agents/<n>/state/foo.md
since broker bodies cap at 1 KiB. now those tokens linkify in the
message flow, question bodies, answer text, and operator inbox;
clicking expands an inline <details> that lazy-fetches via the
new /api/state-file?path=... endpoint.
endpoint allow-list: per-agent state dirs + shared docs, both
in their container-mount form (/agents/<n>/state, /shared) and
host form (/var/lib/hyperhive/...). 1 MiB read cap; canonicalises
before the prefix check so `..` / symlinks can't escape.
legacy bare `/state/...` is deliberately not matched — ambiguous
from the host's perspective (we'd need to know which agent the
message references to translate). agents should use the qualified
form going forward.
questions pane now shows both operator-targeted threads
(target IS NULL) and agent-to-agent threads (target = some
agent). filter chips above the list: all / @operator / @peer /
per-participant. peer rows get a mauve left rule + a 0V3RR1D3
button that POSTs the same /answer-question endpoint
(OperatorQuestions::answer already permits the operator as
answerer on any target).
wire changes: OperatorQuestions gains pending_all +
recent_answered_all; QuestionAdded + QuestionResolved events
carry target: Option<String>; emit sites drop their
target.is_none() guard. answered-history rows show the
answerer prefix so override answers are auditable at a glance.
new DashboardEvent::ContainerStateChanged + ContainerRemoved
close the last refetch loop on the dashboard. Coordinator's
rescan_containers_and_emit diffs a fresh container_view::build_all
against a cached last_containers map and fires per-row events.
called from actions::approve (post-spawn), actions::destroy,
the lifecycle_action wrapper, auto_update::rebuild_agent, and
the existing 10s crash_watch poll.
ContainerView extracted to its own module so coordinator and
dashboard can both build it. dashboard endpoints flip to 200;
container-lifecycle forms carry data-no-refresh. client drops
the periodic poll entirely — initial cold load + SSE for
everything afterwards. pending overlay reads from the existing
transientsState since the new event payload doesn't carry it.
PURG3 + meta-update keep the post-submit refetch since
tombstones + meta_inputs aren't event-derived yet; tracked in
TODO.md.
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
agents weren't being woken with the 'you were rebuilt — check
/state/ for notes, --continue intact' system message after
several recent rebuild surfaces:
- auto_update::rebuild_agent — used by the dashboard rebuild
button, admin-CLI rebuild via lifecycle_action, the startup
rev-scan, AND the new meta-input update batch loop. kick
moves *into* rebuild_agent's success arm so all four
paths benefit. (the dashboard's lifecycle_action extra
closure was already firing kick — now it's a no-op for the
rebuild path since rebuild_agent does it.)
- actions::run_apply_commit — apply-commit approve flow built
+ tagged deployed/<id> but never kicked. add kick on
success with the more specific 'config update applied' hint.
- server.rs::HostRequest::Rebuild — the admin-CLI direct path
calls lifecycle::rebuild bypassing rebuild_agent. add kick
on success.
dashboard's restart / start lifecycle_action extras still
kick via their own closures since they don't route through
rebuild_agent. stop / kill / destroy intentionally don't
kick — there's nothing to wake.
read_meta_inputs() previously only included direct inputs of
meta's root node — so a manager-added 'inputs.mcp-matrix' in
agent-dmatrix's flake.nix never surfaced in the dashboard
panel even though it's a real fetched input that nix can
update.
now: BFS the flake.lock graph from root to depth 2. emits
one MetaInputView per fetched (non-follows) node, names are
slash-paths from root — 'hyperhive', 'agent-coder',
'agent-dmatrix/mcp-matrix', 'hyperhive/nixpkgs', etc. that's
the same syntax 'nix flake update' accepts for transitive
inputs, so the existing POST /meta-update path needs no
nix-side change.
depth limit of 2 keeps the panel readable — deeper transitives
(nixpkgs's own deps etc.) would explode it; bumping a level-2
entry re-fetches its sub-inputs anyway.
POST /meta-update's 'which agents to rebuild' derivation
updated for the slash names: anything under hyperhive/
fans out to all agents (shared base); 'agent-<n>/...' picks
out the agent name from before the first slash.
read_meta_locked_revs (used by the deployed:<sha> chip per
container) split out into its own straight root-input lookup
since the chip only cares about the agent's own input.
every snapshot source backing /api/state used .unwrap_or_default()
— sqlite errors, broker errors, nixos-container list failures,
operator_questions decode crashes all degraded to empty lists
without a log line. the 'pending question doesn't render'
bug we've been chasing was likely a row-decode panic in
OperatorQuestions::pending() being swallowed this way.
new log_default(what, result) replaces each call site: same
default value on Err but emits target=api_state warn with the
source name + dbg error first. five sources covered:
nixos-container list, approvals.pending,
approvals.recent_resolved, broker.recent_for(operator),
questions.pending. next time the question goes missing the
journal will say which source failed and how.
todo updated — pending-question entry now points at the new
log instead of three suspect paths.
new section 'M3T4 1NPUTS' between approvals and message flow:
one row per input in meta/flake.lock (hyperhive first, then
agent-<n> alphabetically). each row shows the input name, the
first 12 chars of the locked sha, a relative timestamp from
locked.lastModified, and the original.url when available.
checkbox per row; submit button is disabled until at least one
box is checked; submitting confirms then POSTs the selected
names to /meta-update.
backend:
- meta::lock_update(inputs: &[String]) — runs 'nix flake update
<names>' in the meta dir, commits the lock change with a
combined message ('lock update: hyperhive, agent-coder').
preserves the existing META_LOCK serialization. existing
lock_update_for_rebuild / lock_update_hyperhive stay for
their single-input callers.
- POST /meta-update — comma-separated 'inputs' form field
(JS joins checkboxes since axum::Form doesn't natively
decode repeated keys); spawns a background task that runs
the lock update + per-agent rebuild loop. hyperhive
selection fans out to all agents; agent-<n> selection only
rebuilds <n>. each rebuild fires Rebuilt to the manager
exactly like dashboard / admin-CLI / auto-update.
rebuild loop is sequential — auto_update::run too (was
parallel via tokio::spawn). parallel rebuilds collide on
nix-store's sqlite cache ('sqlite db busy, not using cache')
and the meta META_LOCK contention. nix-daemon serializes the
heavy build steps anyway, so this isn't a throughput loss.