The retry was capped at 12 attempts (~20s of exponential backoff
capped at 2s). Two back-to-back nspawn restarts in #324 left the
previous socket holding the port longer than that budget; once the
cap fired, the web-UI task returned an error and silently died for
the rest of the process lifetime — the agent kept running fine
otherwise (MCP, turn loop), but the operator's dashboard click
hit nothing.
Genuine port collisions are preflighted host-side
(lifecycle::{spawn,rebuild}) and surfaced as a port-conflict banner,
so at this layer a persistent AddrInUse always reflects a
recoverable stale socket. Drop the cap, keep retrying forever with
the same 2s-capped backoff. WARN for the first dozen attempts so a
normal restart-race is visible; INFO after that to avoid spamming
the journal during a long stale-socket hold. Logs a one-line INFO
on success when we did have to retry, so post-mortems can find the
attempt count.
Closes#324.
The previous take put a shared NavLink wire type in hive-sh4re and
duplicated the link-building logic across crates. Per @mara on #326:
that doesn't fit the eventual frontend/backend split goal (#273).
The agent backend is the natural source of truth for what links its
own page exposes; hive-c0re just passes the list through to the
dashboard.
* hive-ag3nt/src/web_ui.rs: agent_links now also serves the
config-repo link + reads agent-declared dashboardLinks extras
from {state_dir}/hyperhive-dashboard-links.json. AgentLink gains a
kind enum (Container | Forge | External) so the frontend can build
the right href no matter which surface is rendering. The host
header is no longer used — URLs are paths for Container/Forge,
absolute for External.
* hive-c0re/src/dashboard.rs: new GET /api/agent/{name}/links route,
a same-origin proxy that fetches the agent's /api/state and
forwards just the links field. No shared wire type — hive-c0re
treats the payload as opaque JSON (serde_json::Value). All failure
modes degrade to an empty list so the dashboard still renders.
* hive-c0re/assets/app.js: container card head row gets an async-
populated icon-only nav strip from the proxy. The hardcoded stats
link, the standalone config-repo trigger, and the extras block are
gone. The deployed:<sha> chip stays — the agent harness can't know
its own deployed sha, so this chip is how the operator sees what's
live alongside the agent's (root-only) config link.
* hive-ag3nt/assets/app.js: agent page meta-links rendered via
el() / textContent (DOM build) so agent-declared icon / label / url
strings never reach innerHTML. kind-based href resolution mirrors
the dashboard side.
* docs/web-ui.md: dashboard + per-agent sections updated for the new
architecture.
Closes#262.
Per @mara on #328: the hand-rolled encoder was over-cautious. Swap
for base64 = 0.22 from crates.io — a standard, widely-trusted dep,
no maintenance surface to carry. Drops the 15-line encoder and its
two RFC 4648 unit tests.
The 'core' Forgejo user (hive-c0re's identity for commits in
core/meta + agent-configs/*) was showing the default hash identicon.
Adds a one-shot ensure_core_avatar in the ensure_all bootstrap that
POSTs the branding PNG to the admin avatar API and writes a marker
file (CORE_AVATAR_MARKER) so subsequent startups skip the call
(delete the marker to re-upload). Best-effort: a non-2xx is logged
and swallowed, doesn't gate startup.
PNG bytes baked in via include_bytes! from branding/hyperhive.png.
Base64 is hand-rolled (one small image in one cold path, not worth
a new workspace dep) with RFC 4648 §10 test vectors.
Closes#320.
PR #321 changed the model priority to: HIVE_DEFAULT_MODEL (from nix config)
> persisted runtime choice > compiled-in DEFAULT_MODEL. The README now
clarifies that the nix config takes precedence and runtime overrides are
reset on rebuild.
Remove the depth-2 cap in walk_meta_inputs so every fetched input
at every depth is surfaced, not just two levels (issue #275). The
uncapped walk needs a guard: a visited-node set makes it a spanning
tree — each fetched node walked once, at its shallowest path — so
shared subtrees don't re-walk and a cycle can't recurse forever.
A two-pass walk (claim a node's direct inputs before descending)
keeps shallow inputs at a shallow path.
Frontend: renderMetaInputs indents each row by its slash-path depth
and shows the leaf segment (full path on hover), plus a select-all /
select-none control so a long input list isn't ticked box by box.
the stream-json result event carries modelUsage.<model>.contextWindow
which is the actual per-inference active window the model enforces.
for claude-sonnet-4-6 this is 200k even though the full prompt cache
can hold millions of tokens via accumulated cache reads.
with the nix-configured sonnet = 1000000 the proactive compact watermark
sat at 750k and was never reached. agents grew context until prompt_too_long
at ~170k — reactive compact, no checkpoint turn.
changes:
- bus gains api_context_window field seeded from modelUsage.*.contextWindow
in each turn's result event. authoritative; falls back to env var, then 200k.
- new effective_context_window(bus) helper used by both watermark functions
- compact_watermark (75%) and auto_reset_watermark (50%) call effective_context_window
- context_tokens() docstring clarified: all three token fields (input +
cache_read + cache_creation) count against the per-inference contextWindow
limit. the large cache_read values seen in the result event are cumulative
across all inferences in a turn, not per-inference.
- /api/state context_window_tokens now reflects the calibrated window
closes#129
extract per-agent forge logic from ensure_all() into sync_agent()
so both the startup sweep and rebuild_agent call identical code.
rebuild now runs: ensure_user_for + ensure_config_repo + push_config
+ meta_read_access + ensure_meta_remote — same as the boot sweep.
missing tokens and drift in any forge state are fixed by rebuild,
not just hive reboot.
if forge_after_first_spawn fails transiently on first spawn the
token is missing. rebuild_agent now calls ensure_user_for so
a manual rebuild (or the startup auto-update scan) recovers
the missing token — no full hive reboot needed.
post_meta_update returns 200 immediately and runs the nix flake
update + agent-rebuild ripple in a background task, so the META
INPUTS panel looked idle for the whole multi-minute window (#259).
Track in-flight runs with a Coordinator atomic counter, exposed via
an RAII MetaUpdateGuard held across run_meta_update. Surface it as
the meta_update_running snapshot field plus a MetaUpdateRunning SSE
event (flipped only when the count crosses 0, so concurrent runs
flip the flag once). The panel shows a pulsing in-progress banner
and disables the update button while a run is active.