müde 62d1a74929 docs sync + revert auto-unfree removal

revert the earlier 'operator must set allowUnfree' move:
per-agent containers evaluate their own nixpkgs and the operator's
host-level allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to
set on the operator side'.

docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator,
  /api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section,
  state row (badge + model chip + last-turn chip + cancel button),
  per-agent inbox, /model slash, /cancel-question + journald
  endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime
  override via /model); recv(wait_seconds) up to 180s with the
  rationale; ask_operator gains ttl_seconds; new TurnState section;
  kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe);
  bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model
  shipped, dropped from the list.

2026-05-15 21:26:13 +02:00

6.8 KiB

Raw Blame History

Approvals + manager + helper events

The approval queue is hyperhive's pivot: nothing that changes the shape of an agent (its config, whether it exists) happens without an operator click. The manager (hm1nd) is the policy gate in front of that queue; helper events are how it stays informed about what happens after a decision lands.

End-to-end approval flow

Manager edits /agents/<name>/config/agent.nix (bind-mounted from the host's per-agent proposed repo) and commits.
Manager submits the commit sha via request_apply_commit(agent, commit_ref).
Operator sees the diff on the dashboard, clicks ◆ APPR0VE (or hive-c0re approve <id> on the CLI).
hive-c0re reads the file at that sha from proposed, applies into applied, commits there, runs nixos-container update.
HelperEvent::ApprovalResolved lands in the manager's inbox.

Spawn approvals follow the same shape but skip the commit-diff step — the operator just sees the name. On approve, hive-c0re creates the container in a background task while the dashboard shows a spinner.

Two repos per agent

/var/lib/hyperhive/agents/<name>/config/    proposed
└── agent.nix                               # the only file the
                                            # manager can change
                                            # (initial commit by
                                            # hive-c0re on first
                                            # spawn, never touched
                                            # again).

/var/lib/hyperhive/applied/<name>/          applied — hive-c0re-only
├── flake.nix                               # auto-generated
└── agent.nix                               # overwritten by approve
                                            # from the proposed commit

The container's --flake ref is <applied_dir>#default. The flake extends hyperhive.nixosConfigurations.{agent-base|manager} with ./agent.nix plus an inline module setting programs.git.config.user (committer identity = the agent's name) and systemd.services.<harness>.environment (HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT).

Manager (`hm1nd`) is hive-c0re-managed

The manager container runs through the same lifecycle as sub-agents. On hive-c0re serve startup, if hm1nd is missing, hive-c0re creates it. The manager's flake lives at /var/lib/hyperhive/applied/hm1nd/; its proposed config at /var/lib/hyperhive/agents/hm1nd/config/. Manager can edit its own agent.nix (visible inside the container at /agents/hm1nd/config/) and submit request_apply_commit("hm1nd", <sha>) for operator approval.

Differences from sub-agents:

flake.nix extends hyperhive.nixosConfigurations.manager (vs agent-base).
Container name is hm1nd (no h- prefix).
Fixed web UI port (MANAGER_PORT = 8000).
set_nspawn_flags adds an extra bind: /var/lib/hyperhive/agents → /agents (RW), so the manager can edit per-agent proposed repos.
First-deploy spawn bypasses the approval queue (manager is required infrastructure).
Per-agent socket lives at /run/hyperhive/manager/, owned by manager_server::start.

Migration note (for older hosts): drop any containers.hm1nd = { ... } block from your host NixOS config. hyperhive creates and updates the manager itself.

Manager policy

From hive-ag3nt/prompts/manager.md: the manager does NOT rubber-stamp sub-agent config requests. It verifies (role match, package legitimacy, cheaper alternative, blast radius) before committing and calling request_apply_commit.

For ambiguous cases or anything that needs human signal, the manager calls ask_operator(question, options?, multi?, ttl_seconds?) — queues the question on the dashboard and returns the id immediately. The operator's answer arrives later as HelperEvent::OperatorAnswered in the manager inbox. Storage is hive-c0re::operator_questions (sqlite); the answer flow is:

POST /answer-question/{id}
  → OperatorQuestions::answer
  → notify_manager(OperatorAnswered { id, question, answer })

Two more paths resolve a pending question with a sentinel answer:

POST /cancel-question/{id} (✗ CANC3L button on the dashboard) resolves with [cancelled]. The manager sees a terminal state and can fall back.
ttl_seconds deadline: a tokio watchdog spawned at submit time fires answer(id, "[expired]") once the ttl runs out. Already- resolved races no-op. The dashboard surfaces a ⏳ MM:SS chip on each pending question with a deadline.

Helper events to the manager

Coordinator::notify_manager(&HelperEvent) enqueues an inbox message from sender system with the event JSON in the body. The manager harness no longer short-circuits these — they drive a regular claude turn so the manager can react. Variants (hive_sh4re::HelperEvent):

ApprovalResolved { id, agent, commit_ref, status, note } — fired by actions::approve + actions::deny whenever an approval transitions to its terminal state.
Spawned { agent, ok, note } — actions::approve (Spawn-kind)
- admin HostRequest::Spawn.
Rebuilt { agent, ok, note } — auto_update::rebuild_agent (covers startup scan + manual /rebuild from dashboard) + actions::approve (ApplyCommit).
Killed { agent } — admin HostRequest::Kill + dashboard /kill + manager Kill MCP tool.
Destroyed { agent } — actions::destroy.
ContainerCrash { agent, note } — crash_watch: a previously- running container went away with no operator-initiated transient state (Stopping / Restarting / Destroying / Rebuilding). Manager can start it again or escalate.
OperatorAnswered { id, question, answer } — dashboard /answer-question/{id} after the operator submits the answer form.

To add a new event: new HelperEvent variant + call sites + update prompts/manager.md so the manager knows the new shape.

Auto-update on startup

hive-c0re serve runs auto_update::run in a background task right after opening the coordinator. It enumerates managed containers and rebuilds any whose recorded hyperhive rev differs from the current one — sub-agents and manager go through the same lifecycle::rebuild path.

"Rev" = canonical filesystem path of cfg.hyperhiveFlake. Marker file: /var/lib/hyperhive/applied/.<name>.hyperhive-rev. If the flake input has no canonical path (e.g. a github: URL), auto-update is a no-op — rebuild manually.

The dashboard surfaces pending updates per agent: a clickable "needs update ↻" badge appears whenever the marker differs from current rev. The badge POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent path so manual triggers and the startup scan can't drift. When at least one container is stale, a top-level ↻ UPD4TE 4LL button appears that loops over every stale container.

6.8 KiB Raw Blame History