The retry was capped at 12 attempts (~20s of exponential backoff
capped at 2s). Two back-to-back nspawn restarts in #324 left the
previous socket holding the port longer than that budget; once the
cap fired, the web-UI task returned an error and silently died for
the rest of the process lifetime — the agent kept running fine
otherwise (MCP, turn loop), but the operator's dashboard click
hit nothing.
Genuine port collisions are preflighted host-side
(lifecycle::{spawn,rebuild}) and surfaced as a port-conflict banner,
so at this layer a persistent AddrInUse always reflects a
recoverable stale socket. Drop the cap, keep retrying forever with
the same 2s-capped backoff. WARN for the first dozen attempts so a
normal restart-race is visible; INFO after that to avoid spamming
the journal during a long stale-socket hold. Logs a one-line INFO
on success when we did have to retry, so post-mortems can find the
attempt count.
Closes#324.
The previous take put a shared NavLink wire type in hive-sh4re and
duplicated the link-building logic across crates. Per @mara on #326:
that doesn't fit the eventual frontend/backend split goal (#273).
The agent backend is the natural source of truth for what links its
own page exposes; hive-c0re just passes the list through to the
dashboard.
* hive-ag3nt/src/web_ui.rs: agent_links now also serves the
config-repo link + reads agent-declared dashboardLinks extras
from {state_dir}/hyperhive-dashboard-links.json. AgentLink gains a
kind enum (Container | Forge | External) so the frontend can build
the right href no matter which surface is rendering. The host
header is no longer used — URLs are paths for Container/Forge,
absolute for External.
* hive-c0re/src/dashboard.rs: new GET /api/agent/{name}/links route,
a same-origin proxy that fetches the agent's /api/state and
forwards just the links field. No shared wire type — hive-c0re
treats the payload as opaque JSON (serde_json::Value). All failure
modes degrade to an empty list so the dashboard still renders.
* hive-c0re/assets/app.js: container card head row gets an async-
populated icon-only nav strip from the proxy. The hardcoded stats
link, the standalone config-repo trigger, and the extras block are
gone. The deployed:<sha> chip stays — the agent harness can't know
its own deployed sha, so this chip is how the operator sees what's
live alongside the agent's (root-only) config link.
* hive-ag3nt/assets/app.js: agent page meta-links rendered via
el() / textContent (DOM build) so agent-declared icon / label / url
strings never reach innerHTML. kind-based href resolution mirrors
the dashboard side.
* docs/web-ui.md: dashboard + per-agent sections updated for the new
architecture.
Closes#262.
the stream-json result event carries modelUsage.<model>.contextWindow
which is the actual per-inference active window the model enforces.
for claude-sonnet-4-6 this is 200k even though the full prompt cache
can hold millions of tokens via accumulated cache reads.
with the nix-configured sonnet = 1000000 the proactive compact watermark
sat at 750k and was never reached. agents grew context until prompt_too_long
at ~170k — reactive compact, no checkpoint turn.
changes:
- bus gains api_context_window field seeded from modelUsage.*.contextWindow
in each turn's result event. authoritative; falls back to env var, then 200k.
- new effective_context_window(bus) helper used by both watermark functions
- compact_watermark (75%) and auto_reset_watermark (50%) call effective_context_window
- context_tokens() docstring clarified: all three token fields (input +
cache_read + cache_creation) count against the per-inference contextWindow
limit. the large cache_read values seen in the result event are cumulative
across all inferences in a turn, not per-inference.
- /api/state context_window_tokens now reflects the calibrated window
closes#129
Per review: build the full forge profile URL in the harness instead
of the client. /api/state now returns forge_url: Option<String>
(assembled from the request Host header — resolves against whatever
host the operator reached the page on), replacing the forge_present
bool. The JS just links forge_url when present — no client-side URL
construction.
Add a '⬡ forge ↗' link to the per-agent page's meta row, next to
the stats + screen links. It opens the agent's Forgejo profile
(http://<host>:3000/<label> — the per-agent forge user is named
after the agent) in a new tab.
- web_ui.rs: StateSnapshot gains forge_present, true when the
agent's forge-token file exists in the state dir (same signal
that tells the agent it has a forge account).
- index.html / app.js: hidden link, shown + href-filled when
forge_present, mirroring the existing gui_enabled/screen-link
pattern. Host comes from window.location so it works off
whatever host the page is served from.
closes#185
PR #171 made relayoutCanvas() set canvas.style.width/height
explicitly, but the canvas is a flex item: flex items default to
min-width/min-height: auto, which resolves to the canvas's intrinsic
framebuffer resolution and clamps the JS-set display size right back
up to native — so fit mode still did nothing (#133, "still same
behavior").
Add min-width/min-height: 0 (+ flex: none) on the canvas in fit mode
so the explicit downscaled size actually sticks. Scoped to
#canvas-wrap.fit so non-fit mode keeps native size + scroll.
re #133
The fit toggle relied on canvas.fit { max-width/max-height: 100% }.
The canvas is a flex item, and a flex item's automatic minimum size
(min-width/min-height: auto → the replaced element's intrinsic size)
overrides max-*, so an oversized desktop never shrank — it just got
centred and clipped, looking like the toggle did nothing.
Replace the CSS approach with explicit JS sizing: relayoutCanvas()
computes a scale factor (min of width/height ratios, capped at 1 so
it never upscales) and sets canvas.style.width/height in px,
preserving aspect ratio. Recomputed on fit toggle, window resize,
and framebuffer-size (ServerInit). Pointer mapping is unaffected —
sendPointer already derives scale from getBoundingClientRect().
recv tool-use rows rendered as a bare recv() regardless of args,
hiding whether a turn is parked on a long-poll (wait_seconds) or
draining a burst (max). fmtToolUse now surfaces both. Bash rows
gain a [bg] flag when run_in_background is set.
closes#158
Consumes the GET /icon endpoint from #139:
- Dashboard: each container card shows the agent's icon next to its
name (26px). Loaded from <agent-url>/icon; onerror hides it for a
stopped container whose web server isn't answering.
- Per-agent web UI: the agent's icon next to the page title (40px),
and /icon as the favicon on the index, stats, and screen pages.
/icon always returns an image (configured SVG or the default
hyperhive logo), so no presence check is needed.
Closes#140
Foundation for the per-agent icon feature (#137).
- harness-base.nix: new hyperhive.icon option (nullable path to an
SVG). An agent commits an SVG into its config repo and references
it as ./icon.svg; when set it lands at /etc/hyperhive/icon.svg.
- web_ui.rs: GET /icon serves the configured SVG, falling back to the
bundled hyperhive logo when none is set — so it always returns an
image and consumers can hit it unconditionally.
Closes#139
Adds a 'fit' toolbar toggle on the /screen VNC viewer. When on (the
default), the canvas is CSS-scaled down to fit the browser viewport
preserving aspect ratio — no more scrolling a desktop larger than the
window. When off, the canvas renders at native resolution with scroll.
The choice persists in localStorage.
The canvas's intrinsic resolution is never touched — only its display
size. sendPointer now rescales client coordinates by the canvas
display-vs-intrinsic ratio so clicks stay accurate in fit mode (this
also fixes a latent off-by-scale bug). Toolbar buttons share a .tbtn
class for consistent styling.
Issue #133 (fit toggle now; dynamic desktop resize tracked separately).
CountPendingReminders and ReminderRollup were hardcoded to
MANAGER_AGENT. Both now take agent: Option<String> — None keeps the
current behavior (manager's own), Some(name) returns that agent's
reminder stats. The broker functions already take an agent name, so
this is a thin wire-protocol change. Callers (web UI stats page,
post-turn counts) pass None.
Closes#122
PR #119 changed get_loose_ends to default to the manager's own items
and accept an optional agent arg ("*" = hive-wide, "<name>" = one
agent), but the manager system prompt still described it as
"hive-wide ... no args". docs/turn-loop.md was already updated;
only manager.md was missed.
Two RFB protocol bugs in the /screen client:
1. ServerInit parsed framebuffer-height from byte offset 1 instead of
2. RFB 3.8 ServerInit is width@0-1, height@2-3; u16be(b,1) mixes
the low byte of width with the high byte of height. The wrong
height went into the initial non-incremental FramebufferUpdate-
Request; neatvnc feeds the client's width/height straight into
pixman_region_union_rect un-clipped (src/server.c), so an
out-of-range height corrupts the server damage region and pixman
reports 'Invalid rectangle passed'.
2. FramebufferUpdate handler had a stray drainTo(1) after the 3-byte
padding+nRects header, consuming the first byte of the first
rectangle and desyncing every update.
Closes#128
Two bugs found via the weston journal (issue #92):
1. MD5 rotation index used (j>>2 & 3) for the round number, which
cycles every 4 steps instead of every 16. Verified against RFC 1321
test vectors: md5("") was 7a1dce5b... instead of d41d8cd9... — the
derived AES key was wrong, so the server decrypted the credentials
to garbage. Fixed to j>>4.
2. weston's vnc_handle_auth calls getpwnam(username) and requires
pw_uid == weston's own uid before PAM is consulted. We sent an empty
username, which fails outright ("VNC: wrong user"). weston runs as
root, so send username "root"; the empty password still passes via
pam_permit.so on the weston-remote-access service.
Fixes#92
GetLooseEnds now takes agent: Option<String>:
- None = manager's own loose ends (default; the bug fix)
- Some("*") = hive-wide view (every approval/question/reminder)
- Some("name") = that agent's loose ends
The get_loose_ends MCP tool exposes this as an optional agent arg, so
the manager can still scan the whole swarm on demand. The web UI and
post-turn counts pass None (manager's own).
rfb_apple_dh_client_msg has encrypted_credentials at offset 0 and
public_key as a flexible array at offset 128. We were sending them
in the wrong order (pub_key first), so neatvnc decrypted the wrong
bytes as credentials and sent the wrong bytes as the DH public key,
causing a mismatched shared secret and SecurityResult=1.
Fixes#92