phase 8 step 1: per-agent claude creds bind + destroy keeps state

2026-05-15 12:39:22 +02:00 · 2026-05-15 12:39:22 +02:00 · a42fdb3a5c
commit a42fdb3a5c
parent 0fc287c768
9 changed files with 158 additions and 24 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -140,10 +140,13 @@ docs/damocles-migration.md   options for moving damocles onto hyperhive
  (stable lags too far). The overlay imports unstable with its own
  `allowUnfreePredicate` so the access inside the overlay doesn't itself trip.
 - **Claude credentials are stateful and per-container.** No `ANTHROPIC_API_KEY`
-  env var path. For now: `nixos-container root-login h-<name>` → `claude`
-  (interactive) → log in once. The harness falls back to echo replies when
-  `claude --print` fails. Future: bind-mount a shared `~/.claude` dir from the
-  host so creds survive container destroy/recreate.
+  env var path. Today's stopgap: `nixos-container root-login h-<name>` →
+  `claude` (interactive) → log in once. The harness falls back to echo
+  replies when `claude --print` fails. **Phase 8** moves this to a per-agent
+  persistent dir at `/var/lib/hyperhive/agents/<name>/claude/` bind-mounted
+  into the container, with the interactive login driven from the agent's web
+  UI. Sharing one `~/.claude` across agents is NOT viable — OAuth refresh
+  tokens rotate, so any sibling refresh invalidates all the others.
 - **Echo guard.** `hive-ag3nt serve` skips auto-reply when the incoming body
  starts with `"echo: "`. Prevents ping-pong loops when both sides fall back
  to echo. Real conversations between claude-backed agents *will* runaway —
@ -217,6 +220,34 @@ already.
    `set_nspawn_flags` so sub-agent web UI ports are reachable on the host
  - `HYPERHIVE_GIT` env var (absolute path) bypasses PATH ambiguity

+## Phase 8 — real claude in containers + login UX (in progress)
+
+See PLAN.md → "Phase 8" for the full design. Summary:
+
+- **Per-agent persistent creds dir.** Bind
+  `/var/lib/hyperhive/agents/<name>/claude/` → `/root/.claude` (RW) in
+  `set_nspawn_flags`. One OAuth lineage per agent; refresh rotations stay
+  contained to that agent.
+- **State dirs persist by default.** `destroy` keeps
+  `/var/lib/hyperhive/agents/<name>/` unless the operator passes an explicit
+  wipe flag. Recreating an agent of the same name reuses prior creds.
+- **First spawn is approval-gated.** New agent names go through the same
+  approval queue as config edits. Dashboard shows a spinner during
+  `nixos-container create` + `update` + `start`.
+- **"needs login" partial-run state.** No valid session in `~/.claude/` →
+  harness binds the web UI but does NOT start the turn loop. Dashboard
+  surfaces this state per-agent.
+- **Login from the per-agent web UI.** Spawn `claude /login` with plain
+  stdio pipes (no PTY initially), surface the OAuth URL from stdout on the
+  page, accept the resulting code via a paste field, write it to the process
+  stdin. On success, harness transitions out of "needs login" and enters the
+  turn loop. If pipes turn out to be insufficient (claude refuses without a
+  TTY, raw-mode input, ANSI-only output) we redo the backend with a PTY.
+
+Implementation order: bind-mount/dir creation → approval-gated spawn +
+spinner → "needs login" partial run → PTY login endpoint. The login UI has
+nowhere to live until the partial-run mode exists, so don't ship it earlier.
+
 ## Approval flow

 End-to-end: manager edits per-agent `proposed` repo → commits → submits commit