phase 8 step 1: per-agent claude creds bind + destroy keeps state

This commit is contained in:
müde 2026-05-15 12:39:22 +02:00
parent 0fc287c768
commit a42fdb3a5c
9 changed files with 158 additions and 24 deletions

View file

@ -140,10 +140,13 @@ docs/damocles-migration.md options for moving damocles onto hyperhive
(stable lags too far). The overlay imports unstable with its own
`allowUnfreePredicate` so the access inside the overlay doesn't itself trip.
- **Claude credentials are stateful and per-container.** No `ANTHROPIC_API_KEY`
env var path. For now: `nixos-container root-login h-<name>``claude`
(interactive) → log in once. The harness falls back to echo replies when
`claude --print` fails. Future: bind-mount a shared `~/.claude` dir from the
host so creds survive container destroy/recreate.
env var path. Today's stopgap: `nixos-container root-login h-<name>`
`claude` (interactive) → log in once. The harness falls back to echo
replies when `claude --print` fails. **Phase 8** moves this to a per-agent
persistent dir at `/var/lib/hyperhive/agents/<name>/claude/` bind-mounted
into the container, with the interactive login driven from the agent's web
UI. Sharing one `~/.claude` across agents is NOT viable — OAuth refresh
tokens rotate, so any sibling refresh invalidates all the others.
- **Echo guard.** `hive-ag3nt serve` skips auto-reply when the incoming body
starts with `"echo: "`. Prevents ping-pong loops when both sides fall back
to echo. Real conversations between claude-backed agents *will* runaway —
@ -217,6 +220,34 @@ already.
`set_nspawn_flags` so sub-agent web UI ports are reachable on the host
- `HYPERHIVE_GIT` env var (absolute path) bypasses PATH ambiguity
## Phase 8 — real claude in containers + login UX (in progress)
See PLAN.md → "Phase 8" for the full design. Summary:
- **Per-agent persistent creds dir.** Bind
`/var/lib/hyperhive/agents/<name>/claude/``/root/.claude` (RW) in
`set_nspawn_flags`. One OAuth lineage per agent; refresh rotations stay
contained to that agent.
- **State dirs persist by default.** `destroy` keeps
`/var/lib/hyperhive/agents/<name>/` unless the operator passes an explicit
wipe flag. Recreating an agent of the same name reuses prior creds.
- **First spawn is approval-gated.** New agent names go through the same
approval queue as config edits. Dashboard shows a spinner during
`nixos-container create` + `update` + `start`.
- **"needs login" partial-run state.** No valid session in `~/.claude/`
harness binds the web UI but does NOT start the turn loop. Dashboard
surfaces this state per-agent.
- **Login from the per-agent web UI.** Spawn `claude /login` with plain
stdio pipes (no PTY initially), surface the OAuth URL from stdout on the
page, accept the resulting code via a paste field, write it to the process
stdin. On success, harness transitions out of "needs login" and enters the
turn loop. If pipes turn out to be insufficient (claude refuses without a
TTY, raw-mode input, ANSI-only output) we redo the backend with a PTY.
Implementation order: bind-mount/dir creation → approval-gated spawn +
spinner → "needs login" partial run → PTY login endpoint. The login UI has
nowhere to live until the partial-run mode exists, so don't ship it earlier.
## Approval flow
End-to-end: manager edits per-agent `proposed` repo → commits → submits commit