bus carries a one-shot AtomicBool armed by POST
/api/new-session (or the /new-session slash command). next
turn drops --continue, starting a fresh claude session; the
flag clears automatically so subsequent turns resume normal
behavior. /compact still always uses --continue — compacting
a non-existent session is a no-op anyway.
per-agent page grows an ↻ new session button next to the
cancel-turn one (always visible, amber, confirms before
posting since dropping --continue context isn't reversible).
slash-command surface picks up /new-session for parity with
the button. note row emitted on the live feed both at arm-
time and again when the turn actually consumes the flag, so
the operator can confirm it landed.
claude.md flips 'in flight' → 'just landed' for the meta
overhaul + extends the file map with meta.rs and migrate.rs.
docs/approvals.md replaces the in-flight callout with a
proper 'Meta flake' section (two-phase deploy walkthrough,
sync_agents semantics, single-phase variants), updates the
two-repo box diagram to include the /var/lib/hyperhive/meta/
tree and tracks flake.nix in applied, rewrites the
container --flake reference to meta#<name>, replaces the
'Manager view of applied' section with a unified
'/agents + /applied + /meta' inventory listing every useful
git incantation, and explains the in-place no-state-loss
migration that now runs on hive-c0re startup.
docs/persistence.md grows entries for the meta repo + the
.meta-migration-done marker. readme box diagram picks up the
/meta RO bind; approval-flow paragraph rewritten end to end
to describe the meta lock dance.
lifecycle::flake_base deleted — the meta render hardcodes
the manager vs agent-base choice as nix expression.
ContainerView grows deployed_sha (first 12 chars of the rev
that /var/lib/hyperhive/meta/flake.lock currently has locked
for agent-<name>). renderContainers appends a 'deployed:<sha12>'
chip next to the container name + port — title attribute
explains it's the meta-lock sha. degrades gracefully when the
meta repo isn't seeded yet (missing / unparsable lock = empty
map = no chip). new read_meta_locked_revs helper does the JSON
parsing without unwraps.
agent.nix becomes a plain NixOS module function — flake.nix is
fixed boilerplate the manager mustn't edit; meta flake at /meta/
owns the wrapper. proposed repos ship with an 'applied' remote
pre-wired, so 'git fetch applied' / 'git log applied/main' /
'git show applied/refs/tags/deployed/<id>' all just work without
constructing paths by hand. /meta/ exposes the system-wide
deploy log (git log /meta) + flake.lock for cross-agent sha
introspection.
new migrate module runs before auto_update on hive-c0re boot.
four idempotent phases:
1. for every applied/<n>/ whose flake.nix isn't already the
module-only boilerplate, rewrite + commit + relocate
deployed/0 to HEAD so setup_applied's existence check passes
2. for every proposed/<n>/config without an 'applied' remote,
wire it (delegates to setup_proposed which is now
idempotent and adds the remote itself)
3. meta::sync_agents over the current container list — inits
the meta repo on first call, rerender + relock if drifted
4. nixos-container update <c> --flake meta#<name> for every
container, guarded by /var/lib/hyperhive/.meta-migration-done
so phase 4's expensive eval only runs once across restarts
env kill-switch HIVE_SKIP_META_MIGRATION=1 defers the whole
thing. each agent's failure is logged + skipped so one broken
agent doesn't block the rest. runs ahead of ensure_manager so
the manager auto-spawn comes up against meta from the first
attempt.
auto_update::run now calls meta::lock_update_hyperhive once
up-front so the per-agent rebuilds it kicks off rebuild against
the new base. lifecycle::rebuild already drives sync_agents +
lock_update_for_rebuild per agent, so the rev-marker shortcut
keeps its meaning ('we've ack'd this rev for this agent')
without further plumbing. failures of the hyperhive lock bump
log + continue — individual rebuilds will surface concrete
errors if anything's really wrong.
approval-driven deploys now walk the meta flake via
prepare_deploy / finalize_deploy / abort_deploy so a failed
build leaves no commit in meta's deploy log:
1. capture applied/main sha for rollback
2. tag approved/<id> + building/<id>
3. ff applied/main to proposal/<id>, read-tree sync working tree
4. meta::prepare_deploy(name) — nix flake lock --update-input
agent-<n> without committing
5. lifecycle::rebuild_no_meta — container-level only (new
extracted helper; public lifecycle::rebuild still wraps it
with single-phase meta sync + commit for dashboard / auto
_update callers that don't care about rollback)
6a. on success: tag deployed/<id>, meta::finalize_deploy commits
the staged lock with 'deploy <n> deployed/<id> <sha12>'
6b. on failure: tag failed/<id> annotated with the build error,
git_update_ref applied/main back to prev sha, read-tree to
main, meta::abort_deploy git-restores flake.lock
meta's git log now records only successful deploys; failures
+ denials still live in applied as annotated tags.
once nixos-container destroy lands + per-agent state cleanup is
done, rerender the meta flake from the remaining containers so
the destroyed agent's input + nixosConfiguration drop off and
its flake.lock entry vanishes. log + keep going on meta-sync
failure — the destroy already succeeded at the lifecycle level,
so meta drift here is just bookkeeping. new public
lifecycle::agents_for_meta_listing exposes the agent
enumeration for callers outside the module.
rebuild now does sync_agents (idempotent — no-op when the
rendered flake matches disk; recovers from a divergent meta
repo on the side) followed by lock_update_for_rebuild which
relocks just this agent's input and commits the lock change
if any. flake ref for nixos-container update flips from
applied/<n>#default to meta#<name>. new helper
meta::lock_update_for_rebuild is single-phase (no separate
finalize): rebuild has no failure-revert semantics — it always
wants the latest applied/<n>/main. spawn already syncs meta
before container create; rebuild now picks up the meta side
on every manual ↻ R3BU1LD.
after setup_proposed + setup_applied, spawn now syncs the meta
flake (one input + one nixosConfiguration per agent) so
`--flake /var/lib/hyperhive/meta#<name>` resolves before
nixos-container create runs. flake ref switches from
applied/<n>#default to meta#<name>; the wrapper modules
(identity, HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT) now
live in the meta flake's mkAgent. new helper agents_for_meta
builds the AgentSpec list by enumerating containers + optionally
appending a not-yet-present name for the spawn case. spawn
keeps its caller signature; rebuild + auto_update get wired up
in follow-up commits.
setup_proposed now lands a git remote named 'applied' on every
proposed/<n>/config pointing at /applied/<n>/.git — the path as
seen from inside the manager container, where the RO bind in
set_nspawn_flags makes the URL resolve. From the manager:
git fetch applied
git log applied/main
git show applied/refs/tags/deployed/<id>
git diff applied/main HEAD
git rebase applied/main
all work without manually constructing the path each time. The
RO bind blocks push at the kernel level so the remote can only
fetch. Idempotent — also applied to pre-existing proposed repos
(no-op if the remote is already correct, set-url if drifted)
so the startup migration picks up the wiring on existing
agents.
set_nspawn_flags now adds a third manager-only bind alongside
/agents (RW) and /applied (RO): --bind-ro=/var/lib/hyperhive/meta
:/meta. manager can git log /meta to see every deploy across the
swarm and cat /meta/flake.lock to introspect which sha each agent
is currently pinned at. defensive create_dir_all on the host
side so a cold start with no agents (meta repo not yet seeded)
doesn't trip systemd-nspawn's missing-bind-source check before
the migration plants the dir.
leaf module with no runtime callers yet (every public item is
#[allow(dead_code)] until lifecycle / actions / auto_update
rewire to use it). API surface:
- sync_agents — idempotent: render flake.nix for the given
agent set, git-init on first call, nix flake lock, commit if
anything changed.
- prepare_deploy / finalize_deploy / abort_deploy — two-phase
for the request_apply_commit path. prepare runs nix flake
lock --update-input agent-<n> without committing; finalize
commits with a 'deploy <n> deployed/<id> <sha12>' message;
abort git-restores the lock so a failed build leaves no
orphan commit.
- lock_update_hyperhive — one-shot for the auto-update path.
flake.nix template defines mkAgent that pulls each agent's
nixosModules.default from its input and wraps with the
identity / HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT
module — what setup_applied used to generate inline. nix
invocations carry --extra-experimental-features as a belt
in case flakes aren't enabled in nix.conf.
setup_proposed now seeds both agent.nix (a regular NixOS module
function) and flake.nix (boilerplate exporting nixosModules.default
= import ./agent.nix) into the manager-editable proposed repo,
committed together. setup_applied's hyperhive_flake + dashboard
port wrapper generation is deleted entirely — the meta flake at
/var/lib/hyperhive/meta/ now owns the wrapper module. setup_
applied just fetches proposed's main on first spawn and tags
deployed/0; subsequent rebuilds touch nothing in applied that
the manager didn't author. spawn + rebuild keep their old param
list with the now-unused hyperhive_flake + dashboard_port
underscored — call sites get cleaned up after the meta module
lands and consumes them.
scratchpad in claude.md and an in-flight callout at the top of
docs/approvals.md describe the upcoming overhaul so subsequent
commits can cite the design. covers: module-only agent flake
shape, /var/lib/hyperhive/meta/ as a hive-c0re-owned single
repo, applied remote pre-wired in proposed for manager git
plumbing, /meta RO bind for the system-wide deploy log,
auto-migration on hive-c0re startup with HIVE_SKIP_META_MIGRATION
kill-switch.
approval_diff now runs git diff refs/heads/main..refs/tags/
proposal/<id> against the applied repo instead of cobbling a
single-file diff from proposed. consequences: multi-file
proposals show every change, manager amendments in proposed
cannot lie about what'll be deployed, no-op proposals render
an explicit '(proposal matches currently-deployed tree)'.
displayed sha prefers fetched_sha (hive-c0re-vouched) and
falls back to commit_ref only for the brief pre-fetch window.
unified_diff helper + similar dep dropped — git diff is the
source of truth now. dead-code allows on the lifecycle git
helpers + approvals.set_fetched_sha come off since all are
wired up. readme picks up the tag flow + /applied RO mount.
manager prompt: explain that arbitrary files now travel with
the proposal, document the /applied/<n>/.git RO mount and the
tag scheme (git show applied/deployed/<id> etc.), call out
that applied/main only advances on deployed so a failed build
isn't terminal. approvals.md: drop the old per-agent
applied.git phrasing in favour of the single /applied RO
bind, mention both manager binds together. claude.md
scratchpad flips from in-flight to just-landed.
set_nspawn_flags now adds --bind-ro=/var/lib/hyperhive/applied
:/applied for the manager container alongside the existing
/agents RW mount. manager can git-fetch deployed/failed/denied
tags out of /applied/<n>/.git to mirror them into its proposed
clones; the read-only bind means git plumbing inside the
container cannot corrupt the authoritative repos. picked up by
the next rebuild of hm1nd (no spawn-time change needed since
set_nspawn_flags runs on every spawn + rebuild).
apply-commit denials now leave a git object behind: tag
denied/<id> annotated with the operator's note (or empty body
if they didn't supply one) at proposal/<id> inside the applied
repo. rejected configs become first-class git history — git
show denied/<id> in the manager's applied.git mount yields the
tree the operator rejected plus the reason. helper event
carries the tag for parity with deployed/failed. spawn denials
fall through unannotated since they have no proposal commit.
deny becomes async (single git plumbing call); dashboard +
admin-socket callers grow .await.
run_apply_commit walks the approval through the tag state
machine in applied: approved/<id> + building/<id> stamped
before the build, then git read-tree --reset to proposal/<id>
populates the working dir without moving HEAD. on rebuild
success deployed/<id> is planted and refs/heads/main fast-
forwards to the proposal. on failure failed/<id> is annotated
with the build error and the working tree resets back to main
so the agent stays evaluable. helper events Rebuilt +
ApprovalResolved both carry the terminal tag so the manager
can git-show the exact tree (and read the failure note from
an annotated tag) against its read-only applied.git mount.
finish_approval grows a terminal_tag param; spawn path passes
None. lifecycle::apply_commit deleted.
submit_apply_commit (1) queues the approval row, (2) git-fetches
the manager-supplied sha from proposed into applied, pins it as
refs/tags/proposal/<id>, (3) persists the resolved sha on the
row via approvals.set_fetched_sha. from this point on the
proposal is immutable from the manager's perspective: amends
or force-pushes in proposed do not change what hive-c0re will
build. fetch failures mark the row failed and surface the error
to the manager so a phantom pending entry can't linger.
new shape: applied is git-init'd at first spawn, fetches
proposed's initial commit into its main, tags deployed/0 there.
the wrapper flake.nix is regenerated on every spawn/rebuild
but no longer tracked — apply churn vanishes, manager-authored
files in the proposal flow now survive untouched. setup_applied
gains an Option<&Path> for proposed (None on rebuild paths
that just refresh the flake). pre-overhaul applied dirs are
detected via the missing deployed/0 tag and bail loudly with
the destroy --purge migration hint. apply_commit is stubbed
with a clear error until the tag-driven approve flow lands.
new plumbing for the upcoming flow: git_fetch_to_tag (pulls a
sha from proposed into applied and pins it as a tag in one
shot), git_rev_parse (normalises shas + reads back tag
targets), git_tag / git_tag_annotated (lightweight vs body-
carrying for failed/denied), git_read_tree_reset (replace
working tree without moving HEAD — lets main stay on last
known-good across an in-flight build), git_update_ref (ff main
on deploy). annotated tag bodies go via stdin to avoid escape
games. all dead-code-allowed; callers land in subsequent
commits.
new column fetched_sha records the canonical sha hive-c0re
plans to fetch from the proposed repo into applied at submit
time. distinct from commit_ref (manager-supplied, may be
amended out from under the queue). set_fetched_sha is unused
until manager_server wires the fetch step next commit.
approval grows fetched_sha (canonical hive-c0re-vouched sha,
distinct from manager-supplied commit_ref). helperevent
{approvalresolved,spawned,rebuilt} grow optional sha + tag so
the manager can git-show the exact tree it's hearing about
(against the upcoming /agents/<n>/applied.git RO mount) and
know which terminal tag landed. all serde-defaulted; existing
construction sites pass none until the tag-driven flow lands.
scratchpad in claude.md marks this as in-flight; docs/approvals.md
gets the new tag state machine (proposal/approved/building/deployed/
failed/denied) and the manager applied.git read-only mount. todo
picks up the unprivileged-containers git-identity caveat and a web
ui for config repos as a downstream follow-up.
readme: manager mcp surface picks up update; operator-surface
recap mentions /model + last-turn + model chip + the three
collapsibles (inbox / journald / agent.nix).
web-ui.md: details-restore-key story under shape; port-conflict
banner mention on containers; agent.nix viewer alongside journald;
notifications use per-event tags + console.debug log on
block/show; deny endpoint takes note=<reason>; data-prompt /
data-prompt-field generalisation noted.
conventions.md: data-prompt and snapshot/restoreOpenDetails added
to the async-forms section.
persistence.md: operator_questions row picks up deadline_at (ttl)
column with a migration note.
todo.md: new 'Bugs' section captures the manager-question
not-rendering issue with three suspect paths to chase.
claude.md scratchpad rewritten as a clean handoff for the
compaction + the upcoming config-git overhaul. flags the
two-repo (proposed/ + applied/) split as the thing to
reconsider.
manager is fixed at 8000, sub-agents are 8100-8999, so collisions
are strictly between two sub-agents hashing to the same value.
the colliding container's harness restart-loops on AddrInUse —
which the user just hit on :8945. previously the only sign was a
buried journalctl warn line.
now surfaced two ways:
- lifecycle::spawn / rebuild preflight: walks the live container
list, computes each agent's hashed port, refuses with
'port N already taken by <other> — rename one of them' if any
running sub-agent shares the new agent's port. so the operator
sees an actionable error in the dashboard's transient pill /
approve-result instead of waiting for the harness to die.
- /api/state grows a port_conflicts: [{port, agents: [...]}]
array; dashboard renders a pulsing red banner above the
containers list listing each cluster. matches the questions
panel pulse so it's hard to miss.
clicking DENY on the dashboard now prompts for an optional reason
('reason for denying (optional, sent to manager):'). the value
rides along as a hidden 'note' form field; backend chain:
POST /deny/{id} { note }
→ actions::deny(coord, id, Some(note))
→ Approvals::mark_denied writes it to the row
→ HelperEvent::ApprovalResolved { ..., note: Some("...") }
manager already had note: Option<String> on the event, just never
populated for denials before. host admin socket (hive-c0re deny)
still passes None.
generalized the prompt-on-submit pattern: any form with a
data-prompt attribute pops a window.prompt() before the POST and
stashes the answer in a hidden input named by data-prompt-field
(default 'note'). reusable for future opt-in note fields.
new GET /api/agent-config/{name} returns the contents of
/var/lib/hyperhive/applied/<name>/agent.nix — the file the
container actually builds against. validated against the live
container list to avoid arbitrary filesystem reads.
frontend mirrors the journald viewer: collapsed <details> on each
container row, lazy-fetches on expand, refresh button re-fetches.
restore-keyed (agent-config:<name>) so it survives the dashboard
heartbeat refresh.
read-only — mutating the applied config goes through the existing
request_apply_commit + operator approval flow.
crash_watch grows two more state-axes alongside running/stopped:
- logged-in (claude session dir populated for the agent)
- up-to-date (recorded flake rev matches current)
per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn /
NeedsUpdate. seed-on-first-tick semantics retained — nothing fires
on harness boot for agents that were already in their state. only
needs_update fires the 'stale appeared' direction; the resolved
direction is already covered by Rebuilt.
new mcp__hyperhive__update(name) on the manager surface: idempotent
rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding)
so the dashboard shows the spinner. login intentionally has NO tool
— it's interactive OAuth, only the operator can complete it.
prompts + approvals doc + turn-loop doc updated. todo grows a
'show per-agent applied config in dashboard' entry (separate
follow-up).
generalises the focus-preservation pattern to expanded details
sections (journald viewer was collapsing on every 5s refresh; same
issue for approval diff blocks). before re-render we snapshot
which <details data-restore-key=...> are open; after render we
re-apply. setting .open = true programmatically also fires the
toggle event, so journald's lazy-fetch listener re-runs cleanly.
tagged: journal:<container>, approval-diff:<id>. anything else
that should survive a refresh just needs a stable data-restore-key
attribute.
send was truncating to 80 chars in the tool_use row, hiding
anything past the first sentence. now renders as a collapsed
<details> like Write/Edit — summary still shows the recipient +
headline (so the operator can scan), expanding reveals the full
body unchanged.
recv side was already covered: the wake prompt shows the full
incoming body, and explicit recv() tool_result rows expand to
the full text via the existing collapsed-results path.
bug: all notifications used tag='hyperhive', so each new fire
replaced the previous — operator only ever saw one at a time and
might miss the fact that a second arrived. now per-event tags
(hyperhive:approval:<id>, hyperhive❓<id>,
hyperhive:msg:<at>:<rand>) so distinct events stack in the OS
notification center.
dropped the bogus icon (was pointing at dashboard.css) — some
browsers refuse to display a notification with an invalid icon.
added console.debug at every block point (not supported, permission
not granted, muted) and a 'shown' log on success, so the operator
can see in the browser console exactly why a notification didn't
fire.
note for the operator: most browsers also suppress notifications
while the originating tab is FOCUSED. that's a browser-level
decision, not ours.
revert the earlier 'operator must set allowUnfree' move:
per-agent containers evaluate their own nixpkgs and the operator's
host-level allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to
set on the operator side'.
docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator,
/api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section,
state row (badge + model chip + last-turn chip + cancel button),
per-agent inbox, /model slash, /cancel-question + journald
endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime
override via /model); recv(wait_seconds) up to 180s with the
rationale; ask_operator gains ttl_seconds; new TurnState section;
kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe);
bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model
shipped, dropped from the list.
every refreshState tick does root.innerHTML = '' across the managed
sections, which destroys any focused input. detect the case before
re-rendering: if document.activeElement is an INPUT / TEXTAREA /
SELECT inside one of the managed sections, skip this tick and try
again in 2s. eventually the operator blurs and the refresh lands.
managed section ids: containers / tombstones / questions / inbox /
approvals. msgflow + message-flow SSE rows don't have inputs so
they're not affected.
operator's call: probing-forward + state-file machinery is more
brittle than the bug it tried to fix. revert to the original
deterministic FNV-1a hash mod 900. collisions are real but rare;
operator resolves by renaming (different name → different hash) and
rebuilding. no per-agent port file, no scan, no migration path,
nothing to drift out of sync with the running container.
existing port files on disk are silently ignored — operator
rebuilds affected agents to regenerate flakes from the deterministic
hash.
previous attempt was wrong: the legacy branch returned port_hash
unconditionally, so two legacies hashing to the same port both
wrote that port and the collision persisted (test still trying to
bind coder's port).
new rule: always probe forward from port_hash, with scan_taken_ports
parameterised by include_implicit_hashes:
- legacy migration (applied dir exists, no port file): pass false.
scan only counts other agents' port files. first-queried legacy
claims its hash; subsequent colliders see the first's port file
and probe forward. we don't know which legacy originally won
the bind race, so first-write-wins; the loser was already
crash-looping anyway and gets a fresh port to rebuild to.
- fresh spawn (no applied dir): pass true. counts port files AND
implicit hashes for not-yet-migrated legacies, so a new spawn
doesn't race with an unmigrated peer.
migration note for affected users: agents whose port file says
something other than their hashed port may have been corrupted by
the previous fix. Hit ↻ R3BU1LD on the offender to regenerate the
flake (uses the current port file) and the container will bind
the right port on restart.
three signals fire OS notifications:
- new approval lands in the queue (per id, via /api/state delta)
- new ask_operator question queued (per id)
- broker message sent to operator (live via SSE)
first /api/state render after page load seeds the 'seen' sets
without firing — only items that arrive while the page is open
count. controls in a row under the banner: 🔔 enable
notifications (calls requestPermission, hides on grant), 🔕 mute /
🔔 unmute toggle (localStorage-backed so operator can silence
without revoking the permission), inline status text when blocked
or unsupported.
notification tag='hyperhive' collapses rapid bursts; onclick
focuses the dashboard tab. requires secure context (HTTPS or
localhost) — on other origins the API is unavailable and the
controls hide themselves.
todo: entry dropped.
pure frontend — Notification API + existing /api/state and
/messages/stream signals. Caveats: secure-context requirement
(HTTPS or localhost), per-browser permission grant. Includes a
sketch of the implementation: request-permission button, count
deltas on refreshState, SSE hook on operator-bound sends,
localStorage 'muted' toggle.
model persistence: /model <name> now writes to /state/hyperhive-model
(in-container), Bus::new reads it on init. operator override survives
harness restart and container rebuild; gone on --purge like every
other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE
for tests. failure to persist is a warn, not fatal — runtime override
still applies, just won't survive a restart.
unfree opt-in: drop the auto-allowUnfreePredicate from
harness-base.nix and the claude-unstable overlay. operator now has to
set nixpkgs.config.allowUnfree (or a predicate listing claude-code)
in their own host config. silent unfree bypass was sketchy; this is
honest. readme + gotchas updated to spell out the snippet.
todo: drops model-persistence + container-crash + journald (all
shipped); adds per-agent send allow-list (constrain who an agent can
message).
new hive_c0re::crash_watch task polls every 10s, builds the set of
currently-running containers, and on running→stopped transitions
checks the transient snapshot: if no Stopping / Restarting /
Destroying / Rebuilding flag is set, the container exited
unexpectedly and we fire HelperEvent::ContainerCrash into the
manager's inbox so it can react (typically: start it again).
first poll is a seeding pass — no events on harness startup. dbus
subscription would be lower-latency but polling is honest and
debuggable, and a 10s delay on crash detection is fine for our
scale.
manager prompt + approvals doc updated to advertise the new
event variant. todo drops the entry (and the journald-viewer
entry that already shipped).
- runtime model override: Bus::{model,set_model} + POST /api/model
(form-encoded {model: name}). turn.rs reads bus.model() per turn
so a flip lands on the next claude invocation. /api/state grows
a model field; agent page shows a 'model · <name>' chip in the
state row. '/model <name>' slash command POSTs to the endpoint
and refreshes state.
- port regression fix: agent_web_port no longer probes forward for
*existing* agents (the previous fix shifted ports for any agent
without a port file, including legacy ones whose container was
already bound to the bare hashed port — dashboard rendered the
new port, container was still on the old one, conn errors). new
rule: port file exists → use it; absent + applied flake present
→ legacy, persist port_hash without probing; absent + no applied
flake → fresh spawn, probe forward.
- SO_REUSEADDR on both the dashboard and per-agent web UI binds
via tokio::net::TcpSocket. operator hit 12 retries failing on
manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly
without a new dep; retry still covers the genuine
process-still-alive overlap.
todo: drops the model-override entry (shipped); adds two new
items — model persistence (optional, future), and custom
per-agent MCP tools (groundwork for moving bitburner-agent into
hyperhive).
recv-with-timeout is strictly better than a fixed sleep because it
wakes instantly on incoming messages. drop the half-written nap MCP
tool, raise the recv wait_seconds cap from 60s to 180s on both
agent and manager sockets.
prompts updated: agent.md + manager.md now spell out the pattern —
when there's nothing else useful to do, call recv with
wait_seconds=180 to park the turn; do NOT use Bash sleep for the
same purpose. todo drops the nap entry and the napping-state-badge
follow-up; both replaced by 'just use a long recv'.
AgentRequest::Recv and ManagerRequest::Recv grow an optional
wait_seconds field (default None → 30s, capped at 60s server-side).
agent_server / manager_server clamp via recv_timeout(). MCP tool
schemas advertise the param so claude can pick its own poll window
— useful when an agent wants to throttle wakes without entering a
distinct nap state.
both harness loops still pass None, keeping the existing 30s
default behaviour for system-level Recvs.
new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus
with set_state + state_snapshot. the turn loops in hive-ag3nt and
hive-m1nd flip Thinking before drive_turn and Idle after; the
web_ui's /api/compact handler flips Compacting around compact_session.
per-agent /api/state grows turn_state + turn_state_since (unix
seconds). frontend prefers the server-reported state over the
client-derived one — setStateAbs takes the absolute since-time so
the 'last turn' chip reads the actual server-side duration instead
of the client's perceived gap between SSE events. SSE turn_start /
turn_end still drive state instantly between renders; /api/state
re-anchors on each turn_end refresh.
new compacting state gets its own purple badge with pulse
animation (mirrors thinking's amber). napping will slot in the
same way once the nap tool lands.
new GET /api/journal/{name}?unit=&lines= shells out journalctl -M
<container> -b --no-pager --output=short-iso --lines=<N> (cap 5000).
optional unit filter, restricted to hive-ag3nt.service /
hive-m1nd.service so the shell-out can't be coerced into reading
unrelated units. validates the container name against the live list
before invoking journalctl.
frontend renders a collapsed '↳ logs · <container>' details block
on each container row. expanding triggers a lazy fetch; refresh
button re-fetches; unit dropdown switches between the harness
service (default) and the full machine journal. output sits in a
24em-tall monospace pre, auto-scrolled to the bottom on fresh
fetch.
hive-c0re's systemd unit already runs as root, so journalctl has
the access it needs.
operator hit 'coder' and 'test' colliding on the same hashed port —
fnv-1a mod 900 has ~0.1% collision probability per pair and clearly
that's not enough. agent_web_port goes stateful:
- per-agent port persisted to /var/lib/hyperhive/agents/<name>/port
- on first call, look up the file; if absent, hash, then probe
forward through the allocated range skipping any port other
agents already claim, then write the chosen value back
- subsequent calls return the persisted port (sticky)
other agents' ports come from their port file if present, else the
fallback is the hashed value — that handles existing deployments
without forcing a rebuild-all just to migrate. rebuilding the
colliding agent re-runs agent_web_port, sees its peer's implicit
hash port as taken, picks the next free slot, persists.
range exhaustion (very unlikely — 900 slots) logs a warning and
returns the hash; the bind-with-retry on the harness will surface
the failure honestly rather than silently looping.