phase 8 step 1: per-agent claude creds bind + destroy keeps state
This commit is contained in:
parent
0fc287c768
commit
a42fdb3a5c
9 changed files with 158 additions and 24 deletions
57
PLAN.md
57
PLAN.md
|
|
@ -99,7 +99,13 @@ A multi-Claude-Code-agent setup on a single host:
|
|||
|
||||
**Manager concurrency = event loop.** `hive-m1nd` pulls from a heterogeneous `next_event` stream: inbound agent messages, replies to sync sends, lifecycle events from `hive-c0re` (crash, OOM, approval-resolved), and dashboard signals. One queue, claude turn per event.
|
||||
|
||||
**Anthropic credentials.** Shared key on host, bind-mounted into every container. No per-agent keys in v1.
|
||||
**Anthropic credentials.** ~~Shared key on host~~ — revised in Phase 8.
|
||||
Per-agent persistent `~/.claude/` dir bind-mounted from
|
||||
`/var/lib/hyperhive/agents/<name>/claude/`. OAuth refresh tokens rotate, so
|
||||
sharing across agents is a non-starter (any sibling refresh invalidates all
|
||||
the others). One interactive login per agent, ever; creds survive
|
||||
`destroy`/recreate by default. Login flow runs from the per-agent web UI
|
||||
(see Phase 8).
|
||||
|
||||
**Workdir bootstrap.** Each agent's `state/` starts empty. Initial-task message tells the agent what to clone/set up. Manager can drop big artefacts into `state/` directly (it has RW) and pass the path as a message reference.
|
||||
|
||||
|
|
@ -230,6 +236,55 @@ The original open-decisions list, with what we picked:
|
|||
subcommand (`serve` / `spawn` / `kill` / `rebuild` / `list` / `pending` /
|
||||
`approve` / `deny`).
|
||||
|
||||
### ⏳ Phase 8 — real claude in containers + login UX
|
||||
|
||||
Until this lands the harness falls back to the echo path; we've never run an
|
||||
end-to-end turn with a real model in a real container.
|
||||
|
||||
**Credential model.** Per-agent persistent dir at
|
||||
`/var/lib/hyperhive/agents/<name>/claude/` bind-mounted RW to `/root/.claude`
|
||||
inside the container. *Not* shared across agents: OAuth refresh tokens rotate,
|
||||
and sharing one dir means the first refresh by any sibling invalidates all the
|
||||
others. Each agent owns its own token lineage from first login onward.
|
||||
|
||||
**State-dir persistence.** Agent state dirs (including the claude creds dir)
|
||||
persist across `destroy`/recreate by default. The `destroy` verb only purges
|
||||
state when given an explicit "wipe" flag from the operator — recreating an
|
||||
agent of the same name reuses prior creds with no re-login.
|
||||
|
||||
**First-deploy approval.** Spawning a brand-new agent name goes through the
|
||||
existing approval queue (same path as config edits). The dashboard shows a
|
||||
spinner while `nixos-container create` + `update` + `start` run.
|
||||
|
||||
**"needs login" agent state.** If the bound `~/.claude/` has no valid session,
|
||||
the harness boots in a partial mode: per-agent web UI is up, but the turn
|
||||
loop does NOT start. Dashboard surfaces the state per-agent so the operator
|
||||
knows where to click.
|
||||
|
||||
**Login over the per-agent web UI.** No more `nixos-container root-login` for
|
||||
the common case. The agent's web UI exposes a "log in" action that:
|
||||
1. Spawns `claude /login` (or equivalent) inside the container with plain
|
||||
stdio pipes — no PTY unless we discover we need one.
|
||||
2. Reads the OAuth URL from the process stdout and shows it on the page.
|
||||
3. Provides a paste field for the resulting code; writes it to the process
|
||||
stdin.
|
||||
4. On success, transitions out of "needs login" and starts the turn loop.
|
||||
|
||||
If `claude` turns out to require a TTY (refuses on `!isatty()`, uses raw-mode
|
||||
input, or only renders the URL with ANSI styling), redo the backend with a
|
||||
PTY (e.g. `portable-pty`). Don't pre-build for that — start simple.
|
||||
|
||||
**Sequence.** Ship in this order — don't do (4) before (3) or there's nowhere
|
||||
for the login UI to live: (1) bind-mount + per-agent dir creation in
|
||||
`lifecycle::set_nspawn_flags`, (2) approval-gated first spawn + dashboard
|
||||
spinner, (3) harness "needs login" partial-run mode, (4) PTY-backed login
|
||||
endpoint on the per-agent UI.
|
||||
|
||||
**Exit:** spawn a new agent from the dashboard → approve → wait for spinner
|
||||
→ click "log in" on the agent's page → complete OAuth in the browser →
|
||||
paste code → agent enters the turn loop and replies to a T4LK message via
|
||||
real `claude --print`.
|
||||
|
||||
## Polish backlog (not phased)
|
||||
|
||||
See CLAUDE.md → "Polish backlog" for the live list. Highlights: operator
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue