a failed tea-login oneshot used to abort `nixos-container update`
(switch-to-configuration exits 4), which blocked every rebuild
whether the agent needed tea or not. drop `set -e`, exit 0 on
every failure path (mkdir, tea login add, missing forge). also fix
the unit description, which hardcoded /state (manager-only); sub-agents
have /agents/<name>/state.
startup sweep adds ensure_repo('meta', core_token) after the orgs
so the first push isn't a 404. meta::git_commit now calls
forge::push_meta after every successful commit — token-in-URL
`git push http://core:$token@localhost:3000/core/meta.git` —
gated on the core token file existing (no-op when forge isn't
seeded). push failures log warn, don't bubble up.
no tea needed on the host; git is already on the hive-c0re service
PATH via /run/current-system/sw.
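a minimal sketch of the push path, assuming push_meta takes the meta repo dir and the token-file path and shells out to git with std::process::Command. the signature and the Ok(false)/Ok(true) return convention are hypothetical; only the URL shape and the token-file gate come from the message above.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical push_meta: push the meta repo to the hive forge as `core`.
/// Ok(false): token file absent, forge not seeded, nothing done.
/// Ok(true):  push succeeded.
/// Err(msg):  push failed; the caller logs at warn and does not bubble it up.
pub fn push_meta(repo_dir: &Path, token_file: &Path) -> Result<bool, String> {
    // Gate on the core token file; a missing file means the forge isn't seeded.
    let token = match std::fs::read_to_string(token_file) {
        Ok(t) => t.trim().to_string(),
        Err(_) => return Ok(false),
    };
    // Token-in-URL remote, as in the commit message.
    let url = format!("http://core:{token}@localhost:3000/core/meta.git");
    let out = Command::new("git")
        .arg("-C")
        .arg(repo_dir)
        .arg("push")
        .arg(url.as_str())
        .output()
        .map_err(|e| format!("spawning git: {e}"))?;
    if out.status.success() {
        Ok(true)
    } else {
        Err(String::from_utf8_lossy(&out.stderr).into_owned())
    }
}
```

at the call site the error is only logged, e.g. `if let Err(e) = push_meta(dir, tok) { eprintln!("warn: meta push failed: {e}") }`, so a flaky forge never fails the commit itself.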
new ensure_core_user_and_token mints a site-admin 'core' user with
its token at /var/lib/hyperhive/forge-core-token (root 0600) —
hive-c0re's own forge identity for pushing the meta repo + driving
the admin API. that token then drives ensure_org for 'core' (meta
repo lives here) and 'agents' (per-agent applied config repos).
both org-create calls are idempotent: HTTP 422/409 treated as
success. failures log but don't abort the rest of the sweep.
the host shells out to curl, which is already on the hive-c0re
service PATH via /run/current-system/sw, so no new dep.
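a sketch of the idempotent org sweep, assuming the curl call's HTTP status has already been captured (for example with curl's `-w '%{http_code}'`). `OrgOutcome`, `classify_org_create` and the closure-based `ensure_orgs` are hypothetical names, not the real hive-c0re API.

```rust
/// How one org-create call went, judged by the HTTP status curl reported.
#[derive(Debug, PartialEq)]
pub enum OrgOutcome {
    Created,
    AlreadyExists,
    Failed(u16),
}

pub fn classify_org_create(status: u16) -> OrgOutcome {
    match status {
        200..=299 => OrgOutcome::Created,
        // 422 / 409: the org is already there; treat as idempotent success.
        409 | 422 => OrgOutcome::AlreadyExists,
        other => OrgOutcome::Failed(other),
    }
}

/// Ensure both orgs exist. `create` performs the request and returns its
/// status. A failure is logged and the sweep moves on to the next org.
/// Returns the orgs that could not be ensured.
pub fn ensure_orgs(mut create: impl FnMut(&str) -> u16) -> Vec<&'static str> {
    let mut failed = Vec::new();
    for org in ["core", "agents"] {
        if let OrgOutcome::Failed(code) = classify_org_create(create(org)) {
            eprintln!("warn: ensure_org({org}) failed: HTTP {code}");
            failed.push(org);
        }
    }
    failed
}
```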
manager has /agents bind-mounted too, so /agents/hm1nd/state
resolves there alongside the legacy /state. one canonical path in
the wake message instead of branching on MANAGER_NAME.
manager keeps /state (legacy mount); sub-agents see their state at
/agents/<name>/state. wake message hardcoded /state/ for everyone,
which is wrong for sub-agents post-refactor — they get a path they
can't ls. switch on MANAGER_NAME and format the right path.
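the two wake-message commits above, side by side as a hypothetical sketch: the earlier one branched on MANAGER_NAME, the later one uses a single canonical path because /agents is bind-mounted into the manager as well. function names are illustrative.

```rust
/// Earlier variant: the manager keeps the legacy /state mount,
/// sub-agents get /agents/<name>/state.
pub fn state_path_branching(agent: &str, manager_name: &str) -> String {
    if agent == manager_name {
        "/state".to_string()
    } else {
        format!("/agents/{agent}/state")
    }
}

/// Later variant: /agents is mounted in the manager too, so one path
/// works for every agent and no branch is needed.
pub fn state_path(agent: &str) -> String {
    format!("/agents/{agent}/state")
}
```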
so every agent has the official Anthropic marketplace registered
out of the box and plugin specs like 'foo@claude-plugins-official'
resolve without per-agent.nix wiring. operators add more entries
(community marketplace, etc.) or override to [] to opt out.
new `hyperhive.claudeMarketplaces` option (list of strings — URL,
path, or github:owner/repo). harness boot adds each via
`claude plugin marketplace add` before updating + installing the
configured plugins, so specs like `foo@some-marketplace` resolve
on a fresh container. idempotent: 'already exists' stderr is
treated as success.
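a sketch of the boot-time marketplace step, assuming the harness shells out with std::process::Command and that the CLI's duplicate-registration message contains the literal text 'already exists' (the exact wording is an assumption). `add_marketplaces` and `marketplace_add_ok` are hypothetical names.

```rust
use std::process::Command;

/// Registration counts as done if the CLI exited 0, or if it failed only
/// because the marketplace is already registered.
pub fn marketplace_add_ok(exit_ok: bool, stderr: &str) -> bool {
    exit_ok || stderr.contains("already exists")
}

/// Register each configured marketplace; runs before plugins are updated
/// and installed. Other failures are logged and boot continues (assumed,
/// mirroring how plugin-install failures are handled).
pub fn add_marketplaces(sources: &[&str]) {
    for &src in sources {
        match Command::new("claude")
            .args(["plugin", "marketplace", "add", src])
            .output()
        {
            Ok(out) => {
                let stderr = String::from_utf8_lossy(&out.stderr);
                if !marketplace_add_ok(out.status.success(), &stderr) {
                    eprintln!("warn: marketplace add {src}: {}", stderr.trim());
                }
            }
            Err(e) => eprintln!("warn: could not run claude for {src}: {e}"),
        }
    }
}
```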
nixpkgs's services.forgejo defaults to forgejo-lts (11.0.13 today);
LTS lags far enough behind that any prior non-LTS run against the
same state dir leaves the DB at a migration the LTS binary can't
read ('database newer than binary, refusing to start'). default to
the latest release line and let operators opt down to LTS by
overriding services.hive-forge.package.
home-manager / nix-managed git configs symlink the file into the
read-only nix store, so `git config --global` errors out. catch the failure and
print the equivalent home-manager snippet instead of aborting — the
tea + netrc steps still want to run.
forge-create-token.sh mints an access token for an existing user
(prints to stdout — forgejo only shows it once). forge-login.sh
configures the operator's shell: git config --global user.name /
user.email, ~/.netrc entry for HTTP clones, and `tea login add`
when tea is on PATH. takes the token interactively (hidden input)
so it doesn't land in shell history.
without --work-path, forgejo's admin CLI defaults WorkPath to the
binary's directory (RO nix store), can't find custom/conf/app.ini
there, falls back to defaults, and F3 init mkdir-fails inside the
store. systemd unit sets WORK_PATH for the daemon; mirror it here
for every nixos-container-driven 'forgejo admin' invocation.
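a hypothetical argv builder for these calls, showing where --work-path sits relative to the runuser hop. it assumes the container's WORK_PATH is /var/lib/forgejo (the state dir implied by the [F3] PATH below) and that --work-path is accepted as a global flag ahead of the `admin` subcommand; both are worth checking against `forgejo --help` on the deployed version.

```rust
/// Build the arguments for `nixos-container` that run `forgejo admin ...`
/// inside hive-forge as the forgejo user. runuser, not sudo (sudo isn't in
/// the container); explicit --work-path so forgejo finds custom/conf/app.ini
/// instead of falling back to the read-only store dir next to the binary.
pub fn forgejo_admin_argv(work_path: &str, admin_args: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = [
        "run", "hive-forge", "--",
        "runuser", "-u", "forgejo", "--",
        "forgejo", "--work-path", work_path, "admin",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    argv.extend(admin_args.iter().map(|s| s.to_string()));
    argv
}
```

the result would be handed to `std::process::Command::new("nixos-container").args(&argv)`.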
forgejo's F3 init resolves the data dir before checking ENABLED, so
`forgejo admin user create` still died with a fatal error on the
read-only nix-store default. set [F3] PATH = /var/lib/forgejo/data/f3
alongside ENABLED = false.
forgejo's F3 federation subsystem resolves its data dir relative to
the binary, which under nixos lands at /run/current-system/sw/bin/data/f3
(read-only nix store) and fatals the daemon at boot. we don't
federate; turn it off.
sub-agent containers post-refactor bind their state at
/agents/<name>/state (manager keeps the legacy /state — see
lifecycle.rs:751). agent.md still said /state/forge-token; corrected
to /agents/{label}/state/forge-token (template-substituted at
boot). tea-login systemd unit now walks both candidates so the same
harness module works for the manager and sub-agents.
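the candidate walk, sketched in Rust for illustration (the real unit is a systemd oneshot script). the lookup order, sub-agent path first and then the manager's legacy /state, is an assumption; the message only says both candidates are tried.

```rust
use std::path::PathBuf;

/// Where the forge token may live, in lookup order (order assumed).
pub fn token_candidates(label: &str) -> Vec<PathBuf> {
    vec![
        PathBuf::from(format!("/agents/{label}/state/forge-token")),
        PathBuf::from("/state/forge-token"),
    ]
}

/// First candidate that exists, or None (forge not enabled: exit 0).
pub fn first_existing(candidates: &[PathBuf]) -> Option<&PathBuf> {
    candidates.iter().find(|p| p.exists())
}
```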
system prompts now describe the hyperhive Forgejo at localhost:3000,
the per-agent user, the pre-configured tea CLI, and the REST API
fallback with /state/forge-token. todo gains the shared docs/skills
RO-repo follow-up (org-shared + per-agent read membership).
agents get `pkgs.tea` (gitea/forgejo CLI) and a tea-login oneshot
that runs `tea login add --url <hyperhive.forge.url> --token
$(cat /state/forge-token)` before the harness starts. idempotent:
exits 0 when the token file is absent (hive-forge not on) or when
~/.config/tea/config.yml already exists. new
`hyperhive.forge.url` option (default http://localhost:3000) so
operators can point at a non-default forge port. claude can now
shell out to `tea repos create`, `tea pulls create`, etc.
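the oneshot's decision logic as a hypothetical Rust sketch (the real unit is a shell script): check for an existing tea config, then for the token file, and only then plan a `tea login add`. per the later change at the top of this log, every path, including a failed login, should end in exit 0.

```rust
use std::path::Path;

/// What the tea-login oneshot should do on this boot.
#[derive(Debug, PartialEq)]
pub enum TeaLogin {
    /// ~/.config/tea/config.yml already exists: nothing to do, exit 0.
    SkipAlreadyConfigured,
    /// No token file (hive-forge not enabled): nothing to do, exit 0.
    SkipNoToken,
    /// Run `tea login add --url <url> --token <token>`.
    Login { url: String, token: String },
}

pub fn plan_tea_login(token_file: &Path, tea_config: &Path, forge_url: &str) -> TeaLogin {
    if tea_config.exists() {
        return TeaLogin::SkipAlreadyConfigured;
    }
    match std::fs::read_to_string(token_file) {
        Ok(t) => TeaLogin::Login {
            url: forge_url.to_string(),
            token: t.trim().to_string(),
        },
        Err(_) => TeaLogin::SkipNoToken,
    }
}
```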
agent token scopes bumped from (read:user,write:repository,write:issue)
to also include
write:user (own profile + create repos under own namespace),
write:organization (share namespaces between agents), write:misc
(hooks/attachments). still excludes admin and package scopes.
new forge module probes the hive-forge nixos-container (no-op when
absent), and ensures every agent + the manager has a forgejo user
named after them with an access token at `<state>/forge-token`
(visible inside the container as `/state/forge-token`).
idempotent: skips user creation when forgejo reports 'already
exists', skips token issuance when the file is present, scopes the
token to read:user,write:repository,write:issue. the token name is
suffixed with a timestamp so re-issuing doesn't collide with a stale
name. shells
out via `nixos-container run hive-forge -- runuser -u forgejo --
forgejo admin` (runuser instead of sudo since sudo isn't in the
container by default).
hooks: ensure_all sweeps existing containers at hive-c0re startup
(backgrounded), and the actions.rs spawn task calls ensure_user_for
the new agent right after lifecycle::spawn succeeds. failures log a
warning but don't abort spawn — a missing token is recoverable from
the next startup sweep.
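a sketch of the two idempotence checks plus the token-name suffix. the `generate-access-token` flags (`--username`, `--token-name`, `--scopes`, `--raw`) follow the Gitea/Forgejo admin CLI and should be checked against the deployed forgejo version; the `hyperhive-` prefix and all function names are hypothetical. scopes are a parameter, so the later widening above is just a longer list.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// `forgejo admin user create` failing with "already exists" still counts
/// as success: the user is there either way. (Exact wording assumed.)
pub fn user_create_ok(exit_ok: bool, output: &str) -> bool {
    exit_ok || output.contains("already exists")
}

/// Token name suffixed with Unix seconds so a re-issue never reuses the
/// name of a stale token still registered in forgejo.
pub fn token_name(agent: &str, now: SystemTime) -> String {
    let secs = now.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
    format!("hyperhive-{agent}-{secs}")
}

/// Arguments after `forgejo admin` for minting the token. The caller only
/// gets here when `<state>/forge-token` does not exist yet, and writes the
/// printed token there (forgejo shows it only once).
pub fn generate_token_args(agent: &str, scopes: &[&str], now: SystemTime) -> Vec<String> {
    vec![
        "user".to_string(),
        "generate-access-token".to_string(),
        "--username".to_string(),
        agent.to_string(),
        "--token-name".to_string(),
        token_name(agent, now),
        "--scopes".to_string(),
        scopes.join(","),
        "--raw".to_string(),
    ]
}
```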
avoids fighting an operator-side `services.forgejo` over the
singleton module options. container shares host netns
(`privateNetwork = false`) so agents still dial the forge via
plain `localhost:<httpPort>` and the host firewall is the only
layer that matters. container name is `hive-forge` (no `h-`
prefix) so hive-c0re's lifecycle scanner ignores it — operator
manages it with the standard `nixos-container` CLI. state lives
at `/var/lib/nixos-containers/hive-forge/var/lib/forgejo/` and
survives restarts.
new `services.hive-forge.enable` (off by default) wraps
`services.forgejo` with hyperhive-friendly defaults: sqlite (no
extra db service), built-in ssh on 2222 so it doesn't fight the
host's openssh, http on 3000 (outside hyperhive's 7000/8000/8100-8999
ranges), registration off (operator seeds agent users), private
repos by default. exported as `nixosModules.hive-forge` — operator
imports it on the host alongside hive-c0re. container-side wiring
(MCP tools or a bind-mounted token) is deferred; containers already
share the host netns so they can reach http://localhost:3000 today.
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
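a minimal sketch of the guard pattern, assuming the transient map lives behind a std Mutex inside Coordinator; apart from transient_guard, TransientGuard, set_transient and clear_transient, the field and method names are hypothetical. Drop runs on normal scope exit, early `?` returns and panic unwinding, and an aborted task drops its future (and the guard with it), which covers the paths the bare pairs leaked on.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

#[derive(Default)]
pub struct Coordinator {
    // agent name -> transient label shown on the dashboard
    transients: Mutex<HashMap<String, String>>,
}

impl Coordinator {
    fn map(&self) -> MutexGuard<'_, HashMap<String, String>> {
        // Recover from poisoning so a Drop during unwinding can't panic again.
        self.transients.lock().unwrap_or_else(|e| e.into_inner())
    }

    pub fn set_transient(&self, agent: &str, label: &str) {
        self.map().insert(agent.to_string(), label.to_string());
    }

    pub fn clear_transient(&self, agent: &str) {
        self.map().remove(agent);
    }

    pub fn transient(&self, agent: &str) -> Option<String> {
        self.map().get(agent).cloned()
    }

    /// Set the transient now; it is cleared when the returned guard drops.
    pub fn transient_guard(self: &Arc<Self>, agent: &str, label: &str) -> TransientGuard {
        self.set_transient(agent, label);
        TransientGuard { coord: Arc::clone(self), agent: agent.to_string() }
    }
}

/// Clears its agent's transient whenever it is dropped.
pub struct TransientGuard {
    coord: Arc<Coordinator>,
    agent: String,
}

impl Drop for TransientGuard {
    fn drop(&mut self) {
        self.coord.clear_transient(&self.agent);
    }
}
```

callers must bind the guard to a name (`let _guard = coord.transient_guard(...)`): `let _ = ...` drops it at the end of the statement and clears the transient immediately.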
install_configured now takes an optional notify recipient. on a
non-zero or spawn-failed 'claude plugin install', sub-agents send
the spec + stderr to manager via the hyperhive socket; manager
passes None so it doesn't message itself. boot still proceeds either
way — notification is best-effort.
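a hypothetical shape for install_configured with the notify hook. `install` stands in for running `claude plugin install <spec>` and `send` for the hyperhive socket send; both are injected as closures (an illustration choice, not the real signature) so the sketch runs without either. sub-agents would pass `Some(manager)`, the manager passes None.

```rust
/// Try each plugin spec; on failure, log it and, if a recipient is given,
/// forward the spec and stderr. Boot proceeds regardless.
/// Returns the number of failed installs.
pub fn install_configured(
    specs: &[&str],
    notify: Option<&str>,
    mut install: impl FnMut(&str) -> Result<(), String>,
    mut send: impl FnMut(&str, &str),
) -> usize {
    let mut failures = 0;
    for &spec in specs {
        if let Err(stderr) = install(spec) {
            failures += 1;
            eprintln!("warn: claude plugin install {spec} failed: {stderr}");
            // Best-effort: no retry, no error if the message can't be sent.
            if let Some(to) = notify {
                send(to, &format!("plugin install failed: {spec}\n{stderr}"));
            }
        }
    }
    failures
}
```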
run_claude now keeps a 20-line stderr ring buffer and bails with it
inline (was just 'exit <status>'). agent serve loop, on Failed (not
PromptTooLong — that's already absorbed by drive_turn's compaction
retry), sends the error body to manager via the normal hyperhive
send. swallows transport errors — failure is already in journald
and the events sqlite. manager-only harness (hive-m1nd) is unchanged
so it doesn't try to notify itself.
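a sketch of the 20-line stderr tail using a VecDeque as the ring buffer. tail_lines and failure_message are hypothetical names; in the real harness the reader would be the child's piped stderr, drained concurrently with stdout so neither pipe fills up and stalls claude.

```rust
use std::collections::VecDeque;
use std::io::{BufRead, BufReader, Read};

/// Keep only the last `cap` lines read from `reader` (run_claude uses 20).
pub fn tail_lines<R: Read>(reader: R, cap: usize) -> VecDeque<String> {
    let mut ring = VecDeque::with_capacity(cap);
    if cap == 0 {
        return ring;
    }
    for line in BufReader::new(reader).lines() {
        // Stop at the first I/O or UTF-8 error; the tail so far is still useful.
        let Ok(line) = line else { break };
        if ring.len() == cap {
            ring.pop_front(); // evict the oldest line
        }
        ring.push_back(line);
    }
    ring
}

/// Error body carrying the stderr tail inline instead of a bare status.
pub fn failure_message(status: &str, ring: &VecDeque<String>) -> String {
    let tail: Vec<&str> = ring.iter().map(String::as_str).collect();
    format!("claude failed ({status}):\n{}", tail.join("\n"))
}
```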
claude-code rejects --dangerously-skip-permissions / defaultMode=
bypassPermissions when running as root, which all hyperhive
containers do. revert to the previous explicit allow-list plumbing
(per-flavor list spliced into permissions.allow + --tools enable
list), keep TodoWrite out of the built-in allow set, and keep the
deny list (TodoWrite, WebFetch, WebSearch, Task) as belt-and-braces
in case anything sneaks past the allow gate.
stderr used to surface only as a bus note, so post-mortems required
the web UI or the events sqlite; now stderr lines also land in
'journalctl -M <container> -b' alongside the existing LiveEvent::Note
for the dashboard.
socket client now retries connect/IO failures with 2-4-8-16-30s
backoffs (60s total budget). transparent for non-tool callers via
request(); tool handlers go through request_retried() which also
returns the retry count, then annotate_retries() appends a one-line
note to the tool result so claude knows the slow round-trip was a
c0re flicker, not a content failure — avoids burning tokens on an
LLM-level retry.
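a sketch of the retry loop with the 2-4-8-16-30 s schedule: one initial attempt plus up to five retries, 60 s of sleep in the worst case. the sleep function is injected so the schedule can be tested without waiting (in the real client it would be std::thread::sleep or tokio::time::sleep); the signatures and the annotation wording are hypothetical.

```rust
use std::time::Duration;

/// Backoff schedule: 2, 4, 8, 16, 30 s, i.e. 60 s of waiting in the worst case.
pub const BACKOFFS: [Duration; 5] = [
    Duration::from_secs(2),
    Duration::from_secs(4),
    Duration::from_secs(8),
    Duration::from_secs(16),
    Duration::from_secs(30),
];

/// Hypothetical request_retried: call `attempt` until it succeeds or the
/// schedule runs out, sleeping between tries via `sleep`. Returns the value
/// and the number of retries it took. `attempt` should only return Err for
/// connect/IO failures; anything else belongs in the Ok payload.
pub fn request_retried<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<(T, u32), E> {
    let mut retries: u32 = 0;
    loop {
        match attempt() {
            Ok(v) => return Ok((v, retries)),
            Err(e) => {
                // Out of schedule: surface the last transport error.
                let Some(&delay) = BACKOFFS.get(retries as usize) else {
                    return Err(e);
                };
                sleep(delay);
                retries += 1;
            }
        }
    }
}

/// Append a one-line note so claude sees the slow round-trip was a c0re
/// flicker rather than a content failure. No note when nothing was retried.
pub fn annotate_retries(result: String, retries: u32) -> String {
    if retries == 0 {
        result
    } else {
        format!("{result}\n[hyperhive: socket retried {retries}x before succeeding; c0re flicker, not a content failure]")
    }
}
```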
new hyperhive.claudePlugins NixOS option (list of strings) rendered
to /etc/hyperhive/claude-plugins.json. both hive-ag3nt and hive-m1nd
shell out 'claude plugin install <spec>' for each entry once at
startup before the turn loop opens. failures log a warning but don't
abort boot.