docs sync + revert auto-unfree removal
revert the earlier 'operator must set allowUnfree' move: per-agent
containers evaluate their own nixpkgs and the operator's host-level
allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to set
on the operator side'.

docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator, /api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section, state row (badge + model chip + last-turn chip + cancel button), per-agent inbox, /model slash, /cancel-question + journald endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime override via /model); recv(wait_seconds) up to 180s with the rationale; ask_operator gains ttl_seconds; new TurnState section; kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe); bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model shipped, dropped from the list.
parent d275b50177
commit 62d1a74929
10 changed files with 239 additions and 95 deletions
@@ -54,14 +54,13 @@ socket without needing a clean reinstall.
 ## `claude-code` is unfree
 
 The flake pins it to **nixpkgs-unstable** via
-`overlays.claude-unstable` (stable lags too far). The overlay
-imports unstable inheriting the user's `nixpkgs.config`, so the
-operator must opt in by setting `allowUnfree = true` (or an
-`allowUnfreePredicate` that whitelists `claude-code`) on their host
-config. hyperhive deliberately does NOT auto-allow — silent unfree
-bypass would be sketchy, and the error message is clear enough that
-the operator can fix it once and forget about it. Same on the
-per-agent containers (they inherit through the same nixpkgs).
+`overlays.claude-unstable` (stable lags too far). The overlay sets
+`config.allowUnfreePredicate` on its unstable import to whitelist
+`claude-code` specifically — scoped, only this one package.
+`harness-base.nix` does the same at the container level because
+each per-agent `nixosConfiguration` evaluates its own nixpkgs
+instance and the operator's host-level `allowUnfree` does **not**
+propagate in. Operators don't need to set anything on their side.
 
 ## Claude credentials are per-agent
 
@@ -79,6 +78,28 @@ across `destroy`/recreate (`--purge` wipes them).
 writes its events log here (`/state/hyperhive-events.sqlite`).
 Survives `destroy`/recreate alongside the claude dir.
 
+## Web UI ports collide on hash
+
+Sub-agent web UI ports are deterministic FNV-1a of the agent name
+modulo 900 (range 8100..8999). With ~30 agents the birthday-paradox
+collision rate gets meaningful; at 2–3 agents you can still get
+unlucky. Operator resolves a collision by renaming the offending
+agent (different hash → different port) and rebuilding. No state
+file, no probing, no port-allocation drift — the value is
+reproducible from just the name. Manager is fixed at 8000;
+dashboard at `cfg.dashboardPort` (default 7000).
+
+## Restart races on TCP bind
+
+Both the dashboard and per-agent web UI use `tokio::net::TcpSocket`
+with `SO_REUSEADDR` plus a retry-on-`AddrInUse` loop (12 tries,
+exponential backoff capped at 2s, ~22s total). REUSEADDR handles
+the `TIME_WAIT` case from a clean previous exit; retry covers the
+genuine "previous process is still alive during a systemd restart
+overlap" case. REUSEADDR does **not** allow two simultaneous
+`LISTEN` sockets on the same port (that would be `SO_REUSEPORT`,
+which we don't use) — exclusivity is preserved.
+
 ## Orphan approvals
 
 If state dirs are wiped out from under a pending approval (test
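
The port scheme described under "Web UI ports collide on hash" can be sketched roughly as follows. This is an illustration, not the project's code: the function names (`fnv1a_64`, `agent_port`, `collision_probability`) are hypothetical, the 64-bit FNV-1a width is an assumption (the text only says "FNV-1a"), and the base of 8100 is inferred from the stated range. The collision estimate additionally assumes the hash spreads names uniformly over the 900 slots.

```rust
/// FNV-1a over raw bytes. The 64-bit variant is an assumption;
/// the doc text does not say which width is used.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b); // xor first, then multiply (the "1a" order)
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Map an agent name to a port in 8100..=8999 (hash modulo 900).
/// BASE is inferred from the documented range, not taken from the code.
fn agent_port(name: &str) -> u16 {
    const BASE: u16 = 8100;
    const SLOTS: u64 = 900;
    BASE + (fnv1a_64(name.as_bytes()) % SLOTS) as u16
}

/// Birthday-paradox estimate: probability that at least two of `n`
/// agents land on the same port, assuming a uniform hash.
fn collision_probability(n: u32) -> f64 {
    let mut p_all_unique = 1.0;
    for i in 0..n {
        p_all_unique *= 1.0 - f64::from(i) / 900.0;
    }
    1.0 - p_all_unique
}

fn main() {
    // Example names are made up for illustration.
    for name in ["alice", "bob", "carol"] {
        println!("{name} -> {}", agent_port(name));
    }
    for n in [3, 10, 30] {
        println!("{n} agents: collision probability {:.3}", collision_probability(n));
    }
}
```

Because the port is a pure function of the name, renaming an agent is the only input that can change it, which is why the documented fix for a collision is a rename and rebuild. Under the uniform assumption the estimate comes out close to 39% for 30 agents and well under 1% for 3.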
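
The retry shape described under "Restart races on TCP bind" might look something like the sketch below. To keep it dependency-free it swaps in `std::net::TcpListener` for the `tokio::net::TcpSocket` the text names; as far as I know, std already sets `SO_REUSEADDR` on Unix when binding a listener, so no explicit socket option appears here. The 12 tries and the 2s cap come from the text. The 500ms initial delay and the names `backoff_delay` and `bind_with_retry` are assumptions made for illustration.

```rust
use std::io::{self, ErrorKind};
use std::net::{SocketAddr, TcpListener};
use std::thread;
use std::time::Duration;

const MAX_TRIES: u32 = 12; // from the doc text
const BACKOFF_CAP: Duration = Duration::from_secs(2); // from the doc text
const INITIAL_DELAY: Duration = Duration::from_millis(500); // assumed, not from the source

/// Delay to sleep after failed attempt `attempt` (0-based):
/// doubles each time and is capped at BACKOFF_CAP.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    INITIAL_DELAY.saturating_mul(factor).min(BACKOFF_CAP)
}

/// Bind `addr`, retrying only on AddrInUse. Any other error, or
/// AddrInUse on the final try, is returned to the caller.
fn bind_with_retry(addr: SocketAddr) -> io::Result<TcpListener> {
    let mut attempt = 0;
    loop {
        match TcpListener::bind(addr) {
            Ok(listener) => return Ok(listener),
            Err(err) if err.kind() == ErrorKind::AddrInUse && attempt + 1 < MAX_TRIES => {
                // Previous process may still hold the port during a
                // restart overlap: wait, then try again.
                thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Port 0 asks the OS for any free port, so this returns immediately.
    let addr: SocketAddr = "127.0.0.1:0".parse().expect("valid socket address");
    match bind_with_retry(addr) {
        Ok(listener) => println!("bound: {:?}", listener.local_addr()),
        Err(err) => eprintln!("bind failed: {err}"),
    }
}
```

Retrying only on `AddrInUse` keeps real misconfiguration (permission denied, bad address) failing fast, and since no `SO_REUSEPORT` is involved, at most one process ever ends up listening on the port.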