Commit graph

306 commits

Author SHA1 Message Date
müde
3e3c27ac48 hive-forge: disable F3 (federation) — defaults to RO nix-store path
forgejo's F3 federation subsystem resolves its data dir relative to
the binary, which under nixos lands at /run/current-system/sw/bin/data/f3
(read-only nix store) and fatals the daemon at boot. we don't
federate; turn it off.
2026-05-17 00:03:41 +02:00
müde
4a06615c5c fix /state paths: sub-agents use /agents/<name>/state, not /state
sub-agent containers post-refactor bind their state at
/agents/<name>/state (manager keeps the legacy /state — see
lifecycle.rs:751). agent.md still said /state/forge-token; corrected
to /agents/{label}/state/forge-token (template-substituted at
boot). tea-login systemd unit now walks both candidates so the same
harness module works for the manager and sub-agents.
2026-05-16 23:37:49 +02:00
müde
9fc7cae132 prompts: tell agents + manager about the code forge; todo: shared docs repo
system prompts now describe the hyperhive Forgejo at localhost:3000,
the per-agent user, the pre-configured tea CLI, and the REST API
fallback with /state/forge-token. todo gains the shared docs/skills
RO-repo follow-up (org-shared + per-agent read membership).
2026-05-16 23:36:05 +02:00
müde
787c058c71 harness: install tea + auto-login from /state/forge-token
agents get `pkgs.tea` (gitea/forgejo CLI) and a tea-login oneshot
that runs `tea login add --url <hyperhive.forge.url> --token
$(cat /state/forge-token)` before the harness starts. idempotent:
exits 0 when the token file is absent (hive-forge not on) or when
~/.config/tea/config.yml already exists. new
`hyperhive.forge.url` option (default http://localhost:3000) so
operators can point at a non-default forge port. claude can now
shell out to `tea repos create`, `tea pulls create`, etc.
2026-05-16 23:35:28 +02:00
müde
dccbd99b0c forge: broaden token scopes for repo create / PRs / orgs / misc
bumped from (read:user,write:repository,write:issue) to also include
write:user (own profile + create repos under own namespace),
write:organization (share namespaces between agents), write:misc
(hooks/attachments). still excludes admin and package scopes.
2026-05-16 20:58:20 +02:00
müde
480d646f69 forge: auto-create a user + token per agent on spawn / startup
new forge module probes the hive-forge nixos-container (no-op when
absent), and ensures every agent + the manager has a forgejo user
named after them with an access token at `<state>/forge-token`
(visible inside the container as `/state/forge-token`).

idempotent: skips user creation when forgejo reports 'already
exists', skips token issuance when the file is present, scopes the
token to read:user,write:repository,write:issue. token-name suffixed
with a clock so re-issuing doesn't collide with a stale name. shells
out via `nixos-container run hive-forge -- runuser -u forgejo --
forgejo admin` (runuser instead of sudo since sudo isn't in the
container by default).

hooks: ensure_all sweeps existing containers at hive-c0re startup
(backgrounded), and the actions.rs spawn task calls ensure_user_for
the new agent right after lifecycle::spawn succeeds. failures log a
warning but don't abort spawn — a missing token is recoverable from
the next startup sweep.
2026-05-16 20:55:13 +02:00
müde
6e9c67dd94 hive-forge: wrap forgejo in a nixos-container
avoids fighting an operator-side `services.forgejo` over the
singleton module options. container shares host netns
(`privateNetwork = false`) so agents still dial the forge via
plain `localhost:<httpPort>` and the host firewall is the only
layer that matters. container name is `hive-forge` (no `h-`
prefix) so hive-c0re's lifecycle scanner ignores it — operator
manages it with the standard `nixos-container` CLI. state lives
at `/var/lib/nixos-containers/hive-forge/var/lib/forgejo/` and
survives restarts.
2026-05-16 20:52:36 +02:00
müde
c2d176ed13 add hive-forge module: private forgejo for agents
new `services.hive-forge.enable` (off by default) wraps
`services.forgejo` with hyperhive-friendly defaults: sqlite (no
extra db service), built-in ssh on 2222 so it doesn't fight the
host's openssh, http on 3000 (outside hyperhive's 7000/8000/8100-8999
ranges), registration off (operator seeds agent users), private
repos by default. exported as `nixosModules.hive-forge` — operator
imports it on the host alongside hive-c0re. container-side wiring
(MCP tools or a bind-mounted token) is deferred; containers already
share the host netns so they can reach http://localhost:3000 today.
2026-05-16 20:50:36 +02:00
damocles
824acee134 include agent label in turn failure notification body 2026-05-16 20:45:19 +02:00
damocles
1023acf69f add get_logs tool to manager mcp surface 2026-05-16 20:45:19 +02:00
damocles
fca480b86e add turn lock to prevent /compact racing with in-flight turns 2026-05-16 20:45:19 +02:00
damocles
25508d7399 fix manager loop: pending wake + move sleep into Empty arm only 2026-05-16 20:45:19 +02:00
müde
f2a0dc4107 re-apply TodoWrite removal + deny list (lost in subsequent merge) 2026-05-16 19:47:55 +02:00
müde
313121a6e9 fix: transient state leak via RAII guard
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
2026-05-16 19:47:52 +02:00
damocles
1a36c38a54 fix broadcast send for manager, deduplicate into coordinator.broadcast_send 2026-05-16 19:31:53 +02:00
damocles
f3739d2b8e update plugin marketplaces before install at harness boot 2026-05-16 18:51:06 +02:00
damocles
dc53615686 fix stale /state refs in agent and manager prompts 2026-05-16 18:50:15 +02:00
müde
772fdd8320 forward plugin install failures to manager from sub-agents
install_configured now takes an optional notify recipient. on a
non-zero or spawn-failed 'claude plugin install', sub-agents send
the spec + stderr to manager via the hyperhive socket; manager
passes None so it doesn't message itself. boot still proceeds either
way — notification is best-effort.
2026-05-16 17:24:04 +02:00
müde
3e040d5b16 agent: forward unhandled turn failures to manager
run_claude now keeps a 20-line stderr ring buffer and bails with it
inline (was just 'exit <status>'). agent serve loop, on Failed (not
PromptTooLong — that's already absorbed by drive_turn's compaction
retry), sends the error body to manager via the normal hyperhive
send. swallows transport errors — failure is already in journald
and the events sqlite. manager-only harness (hive-m1nd) is unchanged
so it doesn't try to notify itself.
2026-05-16 16:04:35 +02:00
müde
7ec658851a back out bypassPermissions: claude refuses it under root uid
claude-code rejects --dangerously-skip-permissions / defaultMode=
bypassPermissions when running as root, which all hyperhive
containers do. revert to the previous explicit allow-list plumbing
(per-flavor list spliced into permissions.allow + --tools enable
list), keep TodoWrite out of the built-in allow set, and keep the
deny list (TodoWrite, WebFetch, WebSearch, Task) as belt-and-braces
in case anything sneaks past the allow gate.
2026-05-16 15:58:41 +02:00
müde
36c7f3d1c7 mirror claude stderr to tracing so journald captures it
bus-only note made post-mortems require the web UI / events sqlite;
now stderr lines also land in 'journalctl -M <container> -b' alongside
the existing LiveEvent::Note for the dashboard.
2026-05-16 15:30:03 +02:00
müde
7d33da3727 retry hive socket up to 5x over 60s, surface retry count to claude
socket client now retries connect/IO failures with 2-4-8-16-30s
backoffs (60s total budget). transparent for non-tool callers via
request(); tool handlers go through request_retried() which also
returns the retry count, then annotate_retries() appends a one-line
note to the tool result so claude knows the slow round-trip was a
c0re flicker, not a content failure — avoids burning tokens on an
LLM-level retry.
2026-05-16 15:28:18 +02:00
damocles
4a8a668348 feat: add optional description to request_apply_commit and request_spawn 2026-05-16 15:18:32 +02:00
damocles
a6d1464071 refactor: per-agent state paths (/agents/{label}/state), centralize in paths.rs 2026-05-16 15:18:32 +02:00
damocles
a82009cf8c docs: update agent prompt to reference /agents/{label}/state and /agents/{label}/claude 2026-05-16 15:18:19 +02:00
damocles
ecaa178199 refactor: compute per-agent mount points for /agents/<name>/ structure 2026-05-16 15:18:19 +02:00
müde
6dd17864ac auto-install claude plugins at harness boot
new hyperhive.claudePlugins NixOS option (list of strings) rendered
to /etc/hyperhive/claude-plugins.json. both hive-ag3nt and hive-m1nd
shell out 'claude plugin install <spec>' for each entry once at
startup before the turn loop opens. failures log a warning but don't
abort boot.
2026-05-16 15:17:34 +02:00
müde
8e7405db13 bypass-mode perms + deny list, drop allow-list plumbing
claude-settings.json now sets permissions.defaultMode=bypassPermissions
with a small deny list (WebFetch, WebSearch, Task, TodoWrite). The
per-flavor allow list and --tools / --allowedTools CLI flags are gone
— anything not denied auto-approves. mcp.rs loses ALLOWED_BUILTIN_TOOLS,
builtin_tools_arg, allow_list, allowed_mcp_tools. The extraMcpServers
allowedTools field is parsed for back-compat but no longer wired
anywhere; restrict via permissions.deny instead.
2026-05-16 15:17:30 +02:00
damocles
3d2a7ffec7 fix: auto-wake after turn if pending messages exist, don't block on recv 2026-05-16 13:50:11 +02:00
damocles
d99e0812d0 fix: move sleep to only occur when recv returns empty, avoid message delivery delay 2026-05-16 13:48:04 +02:00
damocles
4ceae6cf67 todo: add bug - pending message wake-up issue 2026-05-16 13:43:41 +02:00
damocles
ab0df71068 docs: update system prompts to document /shared directory 2026-05-16 13:43:05 +02:00
damocles
37e56af6ba add /shared mount: new shared directory accessible to all agents 2026-05-16 13:42:41 +02:00
damocles
3642ae1a61 todo: add dashboard ui for pending reminders 2026-05-16 13:40:15 +02:00
damocles
abcf7a0c41 implement broadcast messaging: send to '*' reaches all agents with hint 2026-05-16 13:16:13 +02:00
damocles
a57e500f48 todo: add multi-agent restart coordination item 2026-05-16 13:14:17 +02:00
damocles
3b8cdc7e20 todo: add broadcast messaging feature 2026-05-16 13:07:31 +02:00
damocles
22cea88c7e remove unused broker/coordinator methods 2026-05-16 13:02:53 +02:00
damocles
24eec69418 fix reminder tool issues: error on time overflow, optimize scheduler query 2026-05-16 13:00:56 +02:00
damocles
bc27113967 docs: add hyperhive feature TODOs 2026-05-16 12:52:08 +02:00
damocles
f38510930a reminder: add background scheduler loop - checks & delivers due reminders every 5s 2026-05-16 12:49:59 +02:00
damocles
4fc9c02934 reminder: add sqlite storage + broker methods + dispatch 2026-05-16 12:49:59 +02:00
damocles
7e9fd8e978 agent: add Remind request + ReminderTiming enum (stub implementation) 2026-05-16 12:49:59 +02:00
damocles
862bc1de44 Revert "agent: add Wake command - co-process self-wake via agent socket"
This reverts commit 68a9b8575b1647643c87bd753767acabf96528c3.
2026-05-16 12:49:59 +02:00
damocles
f0e87f0bc5 agent: add Wake command - co-process self-wake via agent socket 2026-05-16 12:49:59 +02:00
damocles
286da8980e Revert "mcp: wire extra server allowedTools into --allowedTools arg"
This reverts commit e0b18ff3c2ec5a7f771ab9a1a247ff4a24a8c475.
2026-05-16 12:49:59 +02:00
damocles
caa495aeda mcp: wire extra server allowedTools into --allowedTools arg 2026-05-16 12:49:59 +02:00
müde
d06b598c56 kick_agent on every rebuild + apply path
agents weren't being woken with the 'you were rebuilt — check
/state/ for notes, --continue intact' system message after
several recent rebuild surfaces:

- auto_update::rebuild_agent — used by the dashboard rebuild
  button, admin-CLI rebuild via lifecycle_action, the startup
  rev-scan, AND the new meta-input update batch loop. kick
  moves *into* rebuild_agent's success arm so all four
  paths benefit. (the dashboard's lifecycle_action extra
  closure was already firing kick — now it's a no-op for the
  rebuild path since rebuild_agent does it.)
- actions::run_apply_commit — apply-commit approve flow built
  + tagged deployed/<id> but never kicked. add kick on
  success with the more specific 'config update applied' hint.
- server.rs::HostRequest::Rebuild — the admin-CLI direct path
  calls lifecycle::rebuild bypassing rebuild_agent. add kick
  on success.

dashboard's restart / start lifecycle_action extras still
kick via their own closures since they don't route through
rebuild_agent. stop / kill / destroy intentionally don't
kick — there's nothing to wake.
2026-05-16 04:20:01 +02:00
müde
78aa830430 meta inputs panel: walk transitive inputs, slash-path names
read_meta_inputs() previously only included direct inputs of
meta's root node — so a manager-added 'inputs.mcp-matrix' in
agent-dmatrix's flake.nix never surfaced in the dashboard
panel even though it's a real fetched input that nix can
update.

now: BFS the flake.lock graph from root to depth 2. emits
one MetaInputView per fetched (non-follows) node, names are
slash-paths from root — 'hyperhive', 'agent-coder',
'agent-dmatrix/mcp-matrix', 'hyperhive/nixpkgs', etc. that's
the same syntax 'nix flake update' accepts for transitive
inputs, so the existing POST /meta-update path needs no
nix-side change.

depth limit of 2 keeps the panel readable — deeper transitives
(nixpkgs's own deps etc.) would explode it; bumping a level-2
entry re-fetches its sub-inputs anyway.

POST /meta-update's 'which agents to rebuild' derivation
updated for the slash names: anything under hyperhive/
fans out to all agents (shared base); 'agent-<n>/...' picks
out the agent name from before the first slash.

read_meta_locked_revs (used by the deployed:<sha> chip per
container) split out into its own straight root-input lookup
since the chip only cares about the agent's own input.
2026-05-16 04:12:04 +02:00
müde
67e4242b9f per-agent send allow-list via hyperhive.allowedRecipients
new NixOS option in harness-base.nix:
  hyperhive.allowedRecipients = [ 'alice' 'manager' ];  # whitelist
  hyperhive.allowedRecipients = [ ];                    # default = unrestricted

module writes the list as JSON to /etc/hyperhive/send-allow
.json at activation. AgentServer::send reads the file before
issuing the broker request; if the list is non-empty and
`to` isn't on it, the tool returns a claude-readable refusal
string without touching the broker. the manager is always
implicitly permitted regardless of the list — otherwise a
misconfigured allow-list could strand a sub-agent without an
escalation path.

enforcement is in the in-container MCP server (not on the
host's per-agent socket) because the agent's nix config is the
trust boundary anyway — the operator audits agent.nix at
deploy time, the activation-time /etc/hyperhive/send-allow
.json is r/o under /nix/store, so the agent can't tamper at
runtime without going through a new approval.

agent prompt mentions the option + tells claude to route
through the manager when refused. retires the matching TODO
under Permissions / policy.
2026-05-16 03:59:28 +02:00