apply_commit handles first-time spawns, request_spawn deprecated

damocles 2026-05-22 09:20:50 +02:00
parent 6974634326
commit 66f1568e8f
6 changed files with 166 additions and 34 deletions


@ -203,8 +203,9 @@ read them à la carte.
`actions.rs` and `dashboard.rs::lifecycle_action`, so the two `actions.rs` and `dashboard.rs::lifecycle_action`, so the two
surfaces never drift. surfaces never drift.
- **Two-step spawn:** `request_init_config` → edit `agent.nix` - **Two-step spawn:** `request_init_config` → edit `agent.nix`
→ `request_spawn`. Never submit a Spawn approval without first → `request_apply_commit`. The first apply_commit creates the
reviewing the config template. container; subsequent ones rebuild it. `request_spawn` still
works but is deprecated.
- **Rate-limit sentinel:** `{state_dir}/hyperhive-rate-limited` - **Rate-limit sentinel:** `{state_dir}/hyperhive-rate-limited`
is written by the harness on 429 and cleared on retry. is written by the harness on 429 and cleared on retry.
`ContainerView.rate_limited` reads it for the dashboard badge. `ContainerView.rate_limited` reads it for the dashboard badge.
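The updated two-step spawn note comes down to one decision at approval time: whether the agent's container already exists. The following is a minimal sketch of that decision, not the hive-c0re implementation; `ApplyOutcome` and `apply_outcome` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical illustration type; not part of hive-c0re.
#[derive(Debug, PartialEq, Eq)]
pub enum ApplyOutcome {
    /// No container existed yet: the approved commit creates it.
    Spawned,
    /// A container already existed: the approved commit rebuilds it.
    Rebuilt,
}

/// Pick the outcome of an approved apply_commit from whether the
/// agent's container already exists (hypothetical helper).
pub fn apply_outcome(container_exists: bool) -> ApplyOutcome {
    if container_exists {
        ApplyOutcome::Rebuilt
    } else {
        ApplyOutcome::Spawned
    }
}
```

In the diff itself the same distinction is carried by the `is_first_spawn` flag, which selects between the `Spawned` and `Rebuilt` helper events.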


@ -4,15 +4,14 @@ Tools (hyperhive surface):
- `mcp__hyperhive__recv(wait_seconds?, max?)` — drain inbox messages. Without `wait_seconds` (or with `0`) it returns immediately — a cheap inbox peek you can drop between actions. To **wait** when you have nothing else to do, call with a long wait (e.g. `wait_seconds: 180`, the max) — you'll wake instantly on new work, otherwise return after the timeout. Use that instead of ending the turn or sleeping in a Bash command. `max` (default 1, cap 32) drains several queued messages in one call. - `mcp__hyperhive__recv(wait_seconds?, max?)` — drain inbox messages. Without `wait_seconds` (or with `0`) it returns immediately — a cheap inbox peek you can drop between actions. To **wait** when you have nothing else to do, call with a long wait (e.g. `wait_seconds: 180`, the max) — you'll wake instantly on new work, otherwise return after the timeout. Use that instead of ending the turn or sleeping in a Bash command. `max` (default 1, cap 32) drains several queued messages in one call.
- `mcp__hyperhive__send(to, body)` — message an agent (by name), another peer, or the operator (`operator` surfaces in the dashboard). Use `to: "*"` to broadcast to all agents (they receive a hint that it's a broadcast and may not need action). - `mcp__hyperhive__send(to, body)` — message an agent (by name), another peer, or the operator (`operator` surfaces in the dashboard). Use `to: "*"` to broadcast to all agents (they receive a hint that it's a broadcast and may not need action).
- `mcp__hyperhive__request_init_config(name, description?)` — **step 1 of a two-step spawn.** Queues an `InitConfig` approval (≤9 char name). On operator approve, hive-c0re seeds the proposed config repo at `/agents/<name>/config/` with a default `agent.nix` template and delivers a `config_ready` system event to your inbox. You then review, edit, and commit `agent.nix` before calling `request_spawn`. - `mcp__hyperhive__request_init_config(name, description?)` — **step 1 of spawning a new agent.** Queues an `InitConfig` approval (≤9 char name). On operator approve, hive-c0re seeds the proposed config repo at `/agents/<name>/config/` with a default `agent.nix` template and delivers a `config_ready` system event to your inbox. You then review, edit, and commit `agent.nix` before calling `request_apply_commit`.
- `mcp__hyperhive__request_spawn(name, description?)` — **step 2 of a two-step spawn.** Queues a Spawn approval; requires the proposed config repo to already exist (from a prior approved `request_init_config`). On operator approve, hive-c0re creates the container. Pass an optional `description` for the dashboard card. - `mcp__hyperhive__request_apply_commit(agent, commit_ref, description?)` — **step 2 of spawning a new agent, and the only step for config changes.** Submit a commit sha from the agent's proposed config repo for operator approval. For a new agent this creates the container; for an existing agent it rebuilds with the new config. At submit time hive-c0re pins the commit as `proposal/<id>` — your proposed branch can continue moving freely without affecting what the operator will build.
- `mcp__hyperhive__kill(name)` — graceful stop on a sub-agent. No approval required. - `mcp__hyperhive__kill(name)` — graceful stop on a sub-agent. No approval required.
- `mcp__hyperhive__start(name)` — start a stopped sub-agent. No approval required. - `mcp__hyperhive__start(name)` — start a stopped sub-agent. No approval required.
- `mcp__hyperhive__restart(name)` — stop + start a sub-agent. No approval required. - `mcp__hyperhive__restart(name)` — stop + start a sub-agent. No approval required.
- `mcp__hyperhive__update(name)` — rebuild a sub-agent (re-applies the current hyperhive flake + agent.nix, restarts the container). No approval required — idempotent. Use when you receive a `needs_update` system event. - `mcp__hyperhive__update(name)` — rebuild a sub-agent (re-applies the current hyperhive flake + agent.nix, restarts the container). No approval required — idempotent. Use when you receive a `needs_update` system event.
- `mcp__hyperhive__request_update_meta_inputs(inputs?, description?)` — queue an approval for the operator to run `nix flake update [inputs...]` on the meta flake. Pass specific input names (e.g. `["bitburner-agent"]`) or omit / pass `[]` for all inputs. Returns immediately; lock update runs on operator approval. Does NOT trigger rebuilds — call `update(name)` on affected agents after approval resolves. - `mcp__hyperhive__request_update_meta_inputs(inputs?, description?)` — queue an approval for the operator to run `nix flake update [inputs...]` on the meta flake. Pass specific input names (e.g. `["bitburner-agent"]`) or omit / pass `[]` for all inputs. Returns immediately; lock update runs on operator approval. Does NOT trigger rebuilds — call `update(name)` on affected agents after approval resolves.
- `mcp__hyperhive__get_logs(agent, lines?)` — fetch recent journal lines for a sub-agent container. Use to diagnose MCP-server registration failures, startup crashes, or harness issues you can't see from inside. Pass the plain logical agent name; `lines` defaults to 50 (capped at 500). - `mcp__hyperhive__get_logs(agent, lines?)` — fetch recent journal lines for a sub-agent container. Use to diagnose MCP-server registration failures, startup crashes, or harness issues you can't see from inside. Pass the plain logical agent name; `lines` defaults to 50 (capped at 500).
- `mcp__hyperhive__request_apply_commit(agent, commit_ref, description?)` — submit a config change for any agent (`hm1nd` for self) for operator approval. Pass an optional `description` and it appears on the dashboard approval card so the operator knows what changed without opening the diff. At submit time hive-c0re fetches your commit into the agent's applied repo and pins it as `proposal/<id>`; from that moment your proposed-side commit can be amended or force-pushed freely without changing what the operator will build.
- `mcp__hyperhive__ask(question, options?, multi?, ttl_seconds?, to?)` — surface a structured question to the operator (default, or `to: "operator"`) OR a sub-agent (`to: "<agent-name>"`). Returns immediately with a question id; the answer arrives later as a system `question_answered { id, question, answer, answerer }` event in your inbox. Options are advisory: the dashboard always lets the operator type a free-text answer in addition. Set `multi: true` to render options as checkboxes (operator can pick multiple); the answer comes back as `, `-separated. Set `ttl_seconds` to auto-cancel after a deadline (capped at 6h server-side) — on expiry the answer is `[expired]` and `answerer` is `"ttl-watchdog"`. Do not poll inside the same turn — finish the current work and react when the event lands. - `mcp__hyperhive__ask(question, options?, multi?, ttl_seconds?, to?)` — surface a structured question to the operator (default, or `to: "operator"`) OR a sub-agent (`to: "<agent-name>"`). Returns immediately with a question id; the answer arrives later as a system `question_answered { id, question, answer, answerer }` event in your inbox. Options are advisory: the dashboard always lets the operator type a free-text answer in addition. Set `multi: true` to render options as checkboxes (operator can pick multiple); the answer comes back as `, `-separated. Set `ttl_seconds` to auto-cancel after a deadline (capped at 6h server-side) — on expiry the answer is `[expired]` and `answerer` is `"ttl-watchdog"`. Do not poll inside the same turn — finish the current work and react when the event lands.
- `mcp__hyperhive__answer(id, answer)` — answer a question that was routed to YOU (a sub-agent did `ask(to: "manager", ...)`). The triggering event in your inbox is `question_asked { id, asker, question, options, multi }`. The answer surfaces in the asker's inbox as a `question_answered` event. - `mcp__hyperhive__answer(id, answer)` — answer a question that was routed to YOU (a sub-agent did `ask(to: "manager", ...)`). The triggering event in your inbox is `question_asked { id, asker, question, options, multi }`. The answer surfaces in the asker's inbox as a `question_answered` event.
- `mcp__hyperhive__get_loose_ends(agent?)` — loose ends. Omit `agent` for your own: pending approvals you submitted + unanswered questions where you are asker/target + your own pending reminders. Pass `agent: "*"` for a hive-wide sweep — every pending approval, unanswered question, and reminder across the swarm — to find stalled threads (sub-agent A asked B something three days ago and B never answered) before they rot. Pass `agent: "<name>"` to inspect one agent's threads. Cheap server-side query. - `mcp__hyperhive__get_loose_ends(agent?)` — loose ends. Omit `agent` for your own: pending approvals you submitted + unanswered questions where you are asker/target + your own pending reminders. Pass `agent: "*"` for a hive-wide sweep — every pending approval, unanswered question, and reminder across the swarm — to find stalled threads (sub-agent A asked B something three days ago and B never answered) before they rot. Pass `agent: "<name>"` to inspect one agent's threads. Cheap server-side query.
@ -20,7 +19,7 @@ Tools (hyperhive surface):
- `mcp__hyperhive__remind(message, delay_seconds? | at_unix_timestamp?, file_path?)` — schedule a message to land in your own inbox at a future time (sender shows as `reminder`). Set exactly one of `delay_seconds` (relative) or `at_unix_timestamp` (absolute). Good for deadline follow-ups — "check whether agent X answered the question I relayed". Large payloads auto-spill to a file under `/state/reminders/`; pass `file_path` to control the destination. - `mcp__hyperhive__remind(message, delay_seconds? | at_unix_timestamp?, file_path?)` — schedule a message to land in your own inbox at a future time (sender shows as `reminder`). Set exactly one of `delay_seconds` (relative) or `at_unix_timestamp` (absolute). Good for deadline follow-ups — "check whether agent X answered the question I relayed". Large payloads auto-spill to a file under `/state/reminders/`; pass `file_path` to control the destination.
- `mcp__hyperhive__whoami()` — self-introspection: canonical name (`manager`), role, current hyperhive rev. No args. Useful for boot announcements and cross-agent attribution that won't drift across config reloads. - `mcp__hyperhive__whoami()` — self-introspection: canonical name (`manager`), role, current hyperhive rev. No args. Useful for boot announcements and cross-agent attribution that won't drift across config reloads.
Approval boundary: lifecycle ops on *existing* sub-agents (`kill`, `start`, `restart`) are at your discretion — no operator approval. *Creating* a new agent (`request_spawn`) and *changing* any agent's config (`request_apply_commit`) still go through the approval queue. The operator only signs off on changes; you run the day-to-day. Approval boundary: lifecycle ops on *existing* sub-agents (`kill`, `start`, `restart`) are at your discretion — no operator approval. *Creating* a new agent (two-step: `request_init_config` + `request_apply_commit`) and *changing* any agent's config (`request_apply_commit`) both go through the approval queue. The operator only signs off on changes; you run the day-to-day.
Your own editable config lives at `/agents/hm1nd/config/`; every sub-agent's lives at `/agents/<name>/config/`. `agent.nix` is a plain NixOS module function — `{ config, pkgs, lib, flakeInputs, ... }: { ... }`. Add packages, services, imports, sibling `.nix` files; the whole committed tree gets deployed together. Your own editable config lives at `/agents/hm1nd/config/`; every sub-agent's lives at `/agents/<name>/config/`. `agent.nix` is a plain NixOS module function — `{ config, pkgs, lib, flakeInputs, ... }: { ... }`. Add packages, services, imports, sibling `.nix` files; the whole committed tree gets deployed together.
@ -74,7 +73,7 @@ Two ways to talk to the operator: `send(to: "operator", ...)` for fire-and-forge
Messages from sender `system` are hyperhive helper events (JSON body, `event` field discriminates): `approval_resolved`, `config_ready`, `spawned`, `rebuilt`, `killed`, `destroyed`, `container_crash`, `needs_login`, `logged_in`, `needs_update`, `question_asked`, `question_answered`. Use these to react to lifecycle changes: Messages from sender `system` are hyperhive helper events (JSON body, `event` field discriminates): `approval_resolved`, `config_ready`, `spawned`, `rebuilt`, `killed`, `destroyed`, `container_crash`, `needs_login`, `logged_in`, `needs_update`, `question_asked`, `question_answered`. Use these to react to lifecycle changes:
- `config_ready` — the proposed config repo for a new agent was just seeded (post-`InitConfig` approval). Review and edit `/agents/<agent>/config/agent.nix`, commit your changes, then call `request_spawn` to proceed to the container-creation approval. - `config_ready` — the proposed config repo for a new agent was just seeded (post-`InitConfig` approval). Review and edit `/agents/<agent>/config/agent.nix`, commit your changes, then call `request_apply_commit` with the commit sha — this will create the container on approval (first spawn) and rebuild on every subsequent deploy.
- `needs_login` — agent has no claude session yet. You can't help directly (login is interactive OAuth on the operator side); flag the operator if it's been long. - `needs_login` — agent has no claude session yet. You can't help directly (login is interactive OAuth on the operator side); flag the operator if it's been long.
- `logged_in` — agent just completed login; first useful turn is imminent. Good time to brief them on what to do. - `logged_in` — agent just completed login; first useful turn is imminent. Good time to brief them on what to do.
- `needs_update` — agent's flake rev is stale. Call `update(name)` to rebuild — it's idempotent and doesn't need approval. - `needs_update` — agent's flake rev is stale. Call `update(name)` to rebuild — it's idempotent and doesn't need approval.
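The event reactions listed above can be read as a small dispatch table. The sketch below is one possible way to model it, assuming hypothetical `Event` and `NextStep` enums that cover only four of the listed events; it does not reproduce the real `HelperEvent` type or its JSON encoding.

```rust
/// Hypothetical subset of the system events named in the prompt text.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    ConfigReady,
    NeedsLogin,
    LoggedIn,
    NeedsUpdate,
}

/// Hypothetical summary of what the manager does next.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// Edit and commit agent.nix, then call request_apply_commit.
    EditCommitThenApplyCommit,
    /// Login is operator-side; flag the operator if it takes long.
    FlagOperatorIfStalled,
    /// The agent is about to take its first turn; brief it.
    BriefAgent,
    /// Call update(name); idempotent, no approval needed.
    CallUpdate,
}

/// Map the `event` discriminator string to the modelled subset.
pub fn parse_event(name: &str) -> Option<Event> {
    match name {
        "config_ready" => Some(Event::ConfigReady),
        "needs_login" => Some(Event::NeedsLogin),
        "logged_in" => Some(Event::LoggedIn),
        "needs_update" => Some(Event::NeedsUpdate),
        _ => None, // events outside this sketch
    }
}

/// Choose the reaction described in the prompt text for each event.
pub fn next_step(event: &Event) -> NextStep {
    match event {
        Event::ConfigReady => NextStep::EditCommitThenApplyCommit,
        Event::NeedsLogin => NextStep::FlagOperatorIfStalled,
        Event::LoggedIn => NextStep::BriefAgent,
        Event::NeedsUpdate => NextStep::CallUpdate,
    }
}
```

Returning `None` for unmodelled names keeps the sketch honest about its coverage: the prompt lists more events than are handled here.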


@ -41,7 +41,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
match approval.kind { match approval.kind {
ApprovalKind::ApplyCommit => { ApprovalKind::ApplyCommit => {
let (result, terminal_tag) = run_apply_commit( let (result, terminal_tag, is_first_spawn) = run_apply_commit(
&coord, &coord,
&approval, &approval,
&agent_dir, &agent_dir,
@ -55,7 +55,21 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
if let Err(e) = crate::forge::push_config(&approval.agent).await { if let Err(e) = crate::forge::push_config(&approval.agent).await {
tracing::warn!(agent = %approval.agent, error = ?e, "forge: push_config after apply failed"); tracing::warn!(agent = %approval.agent, error = ?e, "forge: push_config after apply failed");
} }
finish_approval(&coord, &approval, result, terminal_tag) if is_first_spawn && result.is_ok() {
// First-spawn bookkeeping: create the per-agent forge user
// and mirror the applied repo into agent-configs/<n>.
if let Err(e) = crate::forge::ensure_user_for(&approval.agent).await {
tracing::warn!(agent = %approval.agent, error = ?e, "forge: ensure_user after first spawn failed");
}
if let Err(e) = crate::forge::ensure_config_repo(&approval.agent).await {
tracing::warn!(agent = %approval.agent, error = ?e, "forge: ensure_config_repo after first spawn failed");
}
// New container row appeared — rescan so the dashboard
// reflects the post-spawn state without a manual refetch.
coord.rescan_containers_and_emit().await;
crate::dashboard::emit_tombstones_snapshot(&coord).await;
}
finish_approval(&coord, &approval, result, terminal_tag, is_first_spawn)
} }
ApprovalKind::InitConfig => { ApprovalKind::InitConfig => {
// Seed the proposed config repo. Runs synchronously — it's just // Seed the proposed config repo. Runs synchronously — it's just
@ -67,7 +81,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
Ok(()) Ok(())
} }
.await; .await;
finish_approval(&coord, &approval, result, None) finish_approval(&coord, &approval, result, None, false)
} }
ApprovalKind::UpdateMetaInputs => { ApprovalKind::UpdateMetaInputs => {
// Decode the inputs from the commit_ref field (stored as JSON // Decode the inputs from the commit_ref field (stored as JSON
@ -117,7 +131,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
tracing::warn!(agent = %agent_bg, error = ?e, "forge: push_config after spawn failed"); tracing::warn!(agent = %agent_bg, error = ?e, "forge: push_config after spawn failed");
} }
} }
if let Err(e) = finish_approval(&coord_bg, &approval_bg, result, None) { if let Err(e) = finish_approval(&coord_bg, &approval_bg, result, None, false) {
tracing::warn!(agent = %agent_bg, error = ?e, "spawn approval failed"); tracing::warn!(agent = %agent_bg, error = ?e, "spawn approval failed");
} }
// New container row appeared (or didn't, on failure // New container row appeared (or didn't, on failure
@ -139,6 +153,7 @@ fn finish_approval(
approval: &hive_sh4re::Approval, approval: &hive_sh4re::Approval,
result: Result<()>, result: Result<()>,
terminal_tag: Option<String>, terminal_tag: Option<String>,
is_first_spawn: bool,
) -> Result<()> { ) -> Result<()> {
let (status, note, ok) = match &result { let (status, note, ok) = match &result {
Ok(()) => (ApprovalStatus::Approved, None, true), Ok(()) => (ApprovalStatus::Approved, None, true),
@ -201,6 +216,14 @@ fn finish_approval(
note, note,
sha: approval.fetched_sha.clone(), sha: approval.fetched_sha.clone(),
}), }),
ApprovalKind::ApplyCommit if is_first_spawn => {
coord.notify_manager(&HelperEvent::Spawned {
agent: approval.agent.clone(),
ok,
note,
sha: approval.fetched_sha.clone(),
});
}
ApprovalKind::ApplyCommit => coord.notify_manager(&HelperEvent::Rebuilt { ApprovalKind::ApplyCommit => coord.notify_manager(&HelperEvent::Rebuilt {
agent: approval.agent.clone(), agent: approval.agent.clone(),
ok, ok,
@ -232,10 +255,14 @@ async fn run_apply_commit(
applied_dir: &std::path::Path, applied_dir: &std::path::Path,
claude_dir: &std::path::Path, claude_dir: &std::path::Path,
notes_dir: &std::path::Path, notes_dir: &std::path::Path,
) -> (Result<()>, Option<String>) { ) -> (Result<()>, Option<String>, bool) {
let id = approval.id; let id = approval.id;
let proposal_ref = format!("refs/tags/proposal/{id}"); let proposal_ref = format!("refs/tags/proposal/{id}");
// Detect first spawn before we touch anything so we can branch on it
// throughout this function.
let is_first_spawn = !lifecycle::container_exists(&approval.agent).await;
// Defensive: submit-time should have planted proposal/<id>, but if // Defensive: submit-time should have planted proposal/<id>, but if
// the row was migrated from an older schema or the tag got pruned // the row was migrated from an older schema or the tag got pruned
// we fail early with a clear note rather than building a stale // we fail early with a clear note rather than building a stale
@ -246,6 +273,7 @@ async fn run_apply_commit(
"missing proposal tag {proposal_ref}: {e:#}" "missing proposal tag {proposal_ref}: {e:#}"
)), )),
None, None,
is_first_spawn,
); );
} }
@ -253,16 +281,30 @@ async fn run_apply_commit(
// (and the meta lock indirectly) back if the build fails. // (and the meta lock indirectly) back if the build fails.
let prev_main_sha = match lifecycle::git_rev_parse(applied_dir, "refs/heads/main").await { let prev_main_sha = match lifecycle::git_rev_parse(applied_dir, "refs/heads/main").await {
Ok(s) => s, Ok(s) => s,
Err(e) => return (Err(anyhow::anyhow!("read applied/main: {e:#}")), None), Err(e) => {
return (
Err(anyhow::anyhow!("read applied/main: {e:#}")),
None,
is_first_spawn,
)
}
}; };
if let Err(e) = lifecycle::git_tag(applied_dir, &format!("approved/{id}"), &proposal_ref).await if let Err(e) = lifecycle::git_tag(applied_dir, &format!("approved/{id}"), &proposal_ref).await
{ {
return (Err(anyhow::anyhow!("plant approved/{id}: {e:#}")), None); return (
Err(anyhow::anyhow!("plant approved/{id}: {e:#}")),
None,
is_first_spawn,
);
} }
if let Err(e) = lifecycle::git_tag(applied_dir, &format!("building/{id}"), &proposal_ref).await if let Err(e) = lifecycle::git_tag(applied_dir, &format!("building/{id}"), &proposal_ref).await
{ {
return (Err(anyhow::anyhow!("plant building/{id}: {e:#}")), None); return (
Err(anyhow::anyhow!("plant building/{id}: {e:#}")),
None,
is_first_spawn,
);
} }
// Fast-forward applied/main to proposal/<id> + sync the working // Fast-forward applied/main to proposal/<id> + sync the working
@ -274,23 +316,70 @@ async fn run_apply_commit(
return ( return (
Err(anyhow::anyhow!("ff main to {proposal_ref}: {e:#}")), Err(anyhow::anyhow!("ff main to {proposal_ref}: {e:#}")),
None, None,
is_first_spawn,
); );
} }
if let Err(e) = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await { if let Err(e) = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await {
// main is ahead; working tree didn't sync. Roll main back to // main is ahead; working tree didn't sync. Roll main back to
// keep the two consistent before bailing. // keep the two consistent before bailing.
let _ = lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha).await; let _ = lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha).await;
return (Err(anyhow::anyhow!("read-tree to main: {e:#}")), None); return (
Err(anyhow::anyhow!("read-tree to main: {e:#}")),
None,
is_first_spawn,
);
}
// First spawn: sync_agents must add this agent to the meta flake
// before prepare_deploy can update its input lock (which won't
// exist yet if this is the agent's first deploy).
if is_first_spawn {
let agents = match lifecycle::agents_for_meta_listing_with(&approval.agent).await {
Ok(a) => a,
Err(e) => {
let _ =
lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha)
.await;
let _ = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await;
return (
Err(anyhow::anyhow!("agents_for_meta_listing_with: {e:#}")),
None,
is_first_spawn,
);
}
};
if let Err(e) = crate::meta::sync_agents(
&coord.hyperhive_flake,
coord.dashboard_port,
&coord.operator_pronouns,
&coord.context_window_tokens,
&agents,
)
.await
{
let _ =
lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha).await;
let _ = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await;
return (
Err(anyhow::anyhow!("meta sync_agents for first spawn: {e:#}")),
None,
is_first_spawn,
);
}
} }
// Phase 1 of the meta two-phase deploy: relock without committing. // Phase 1 of the meta two-phase deploy: relock without committing.
if let Err(e) = crate::meta::prepare_deploy(&approval.agent).await { if let Err(e) = crate::meta::prepare_deploy(&approval.agent).await {
let _ = lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha).await; let _ = lifecycle::git_update_ref(applied_dir, "refs/heads/main", &prev_main_sha).await;
let _ = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await; let _ = lifecycle::git_read_tree_reset(applied_dir, "refs/heads/main").await;
return (Err(anyhow::anyhow!("meta prepare_deploy: {e:#}")), None); return (
Err(anyhow::anyhow!("meta prepare_deploy: {e:#}")),
None,
is_first_spawn,
);
} }
// Container-level rebuild against meta#<name>. // Container-level rebuild (or first-time create) against meta#<name>.
let build_result = lifecycle::rebuild_no_meta( let build_result = lifecycle::rebuild_no_meta(
&approval.agent, &approval.agent,
agent_dir, agent_dir,
@ -324,7 +413,7 @@ async fn run_apply_commit(
// proposal, agent picks up where it left off with the // proposal, agent picks up where it left off with the
// new env / packages. // new env / packages.
coord.kick_agent(&approval.agent, "config update applied"); coord.kick_agent(&approval.agent, "config update applied");
(Ok(()), Some(tag)) (Ok(()), Some(tag), is_first_spawn)
} }
Err(e) => { Err(e) => {
let tag = format!("failed/{id}"); let tag = format!("failed/{id}");
@ -350,7 +439,7 @@ async fn run_apply_commit(
tracing::warn!(agent = %approval.agent, %id, error = ?ae, "meta abort_deploy failed"); tracing::warn!(agent = %approval.agent, %id, error = ?ae, "meta abort_deploy failed");
} }
let _ = coord; let _ = coord;
(Err(e), Some(tag)) (Err(e), Some(tag), is_first_spawn)
} }
} }
} }
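The order of operations in `run_apply_commit` matters most on the first-spawn path, where `sync_agents` has to run before `prepare_deploy`. The sketch below only lists the steps in the order they appear in the hunks above; `apply_commit_steps` is a hypothetical helper, and the step labels are descriptive strings rather than real function names.

```rust
/// Ordered steps for an approved apply_commit, as read from the diff.
/// Hypothetical helper for illustration; the labels are informal.
pub fn apply_commit_steps(is_first_spawn: bool) -> Vec<&'static str> {
    let mut steps = vec![
        "check proposal/<id> tag exists",
        "read applied/main (kept for rollback)",
        "tag approved/<id>",
        "tag building/<id>",
        "fast-forward main to proposal/<id>",
        "sync working tree to main",
    ];
    if is_first_spawn {
        // The new agent must be registered in the meta flake before
        // its input lock can be updated.
        steps.push("meta sync_agents (register new agent)");
    }
    steps.push("meta prepare_deploy");
    steps.push("rebuild_no_meta (create or update container)");
    steps
}
```

Every failure after the fast-forward rolls `main` back to the saved sha in the real code; the sketch leaves rollback out to keep the ordering visible.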


@ -206,6 +206,14 @@ async fn agents_after_spawn(name: &str) -> Result<Vec<crate::meta::AgentSpec>> {
agents_for_meta(Some(name)).await agents_for_meta(Some(name)).await
} }
/// Like `agents_for_meta_listing` but with an extra agent added (for a
/// container that doesn't exist yet). Used by the first-spawn path in
/// `actions::run_apply_commit` to register the new agent in meta before
/// `prepare_deploy` tries to update its input lock.
pub async fn agents_for_meta_listing_with(extra: &str) -> Result<Vec<crate::meta::AgentSpec>> {
agents_for_meta(Some(extra)).await
}
/// Public enumeration of currently-existing agents (whatever /// Public enumeration of currently-existing agents (whatever
/// `nixos-container list` says), sorted, no extras. For callers /// `nixos-container list` says), sorted, no extras. For callers
/// outside this module that need to reseed meta after lifecycle /// outside this module that need to reseed meta after lifecycle
@ -214,6 +222,19 @@ pub async fn agents_for_meta_listing() -> Result<Vec<crate::meta::AgentSpec>> {
agents_for_meta(None).await agents_for_meta(None).await
} }
/// True when the named container already exists (appears in
/// `nixos-container list`). Used by the apply-commit path to decide
/// between first-spawn (`nixos-container create`) and normal rebuild
/// (`nixos-container update`).
pub async fn container_exists(name: &str) -> bool {
let container = container_name(name);
list()
.await
.unwrap_or_default()
.iter()
.any(|c| c == &container)
}
pub async fn kill(name: &str) -> Result<()> { pub async fn kill(name: &str) -> Result<()> {
validate(name)?; validate(name)?;
let container = container_name(name); let container = container_name(name);
@ -314,13 +335,24 @@ pub async fn rebuild_no_meta(
ensure_state_dir(notes_dir)?; ensure_state_dir(notes_dir)?;
let container = container_name(name); let container = container_name(name);
let flake_ref = format!("{}#{name}", crate::meta::meta_dir().display()); let flake_ref = format!("{}#{name}", crate::meta::meta_dir().display());
if container_exists(name).await {
// Existing container: update nspawn flags, then rebuild + restart
// so any bind-mount / networking changes take effect.
set_nspawn_flags(&container, agent_dir, claude_dir, notes_dir)?; set_nspawn_flags(&container, agent_dir, claude_dir, notes_dir)?;
set_resource_limits(&container)?; set_resource_limits(&container)?;
systemd_daemon_reload().await?; systemd_daemon_reload().await?;
run(&["update", &container, "--flake", &flake_ref]).await?; run(&["update", &container, "--flake", &flake_ref]).await?;
// Restart so any nspawn-level changes (bind mounts, networking, etc.) apply.
run(&["stop", &container]).await?; run(&["stop", &container]).await?;
run(&["start", &container]).await run(&["start", &container]).await
} else {
// First spawn: create the container first (which writes the nspawn
// conf file), then overwrite with our flags and start.
run(&["create", &container, "--flake", &flake_ref]).await?;
set_nspawn_flags(&container, agent_dir, claude_dir, notes_dir)?;
set_resource_limits(&container)?;
systemd_daemon_reload().await?;
run(&["start", &container]).await
}
} }
pub async fn list() -> Result<Vec<String>> { pub async fn list() -> Result<Vec<String>> {
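The branch added to `rebuild_no_meta` issues a different subcommand sequence depending on whether the container exists. As a rough sketch, the two sequences can be written as data; `container_commands` and `argv` are hypothetical helpers, the example names in the usage note are made up, and the nspawn-flag and resource-limit writes from the real function are only noted in comments.

```rust
/// Build an owned argument vector (hypothetical helper).
fn argv(args: &[&str]) -> Vec<String> {
    args.iter().map(|s| s.to_string()).collect()
}

/// Subcommand sequences taken from the diff above, expressed as data.
/// Hypothetical helper; it does not execute anything.
pub fn container_commands(exists: bool, container: &str, flake_ref: &str) -> Vec<Vec<String>> {
    if exists {
        // Existing container: flags and limits are written first,
        // then update, then a stop/start cycle so they take effect.
        vec![
            argv(&["update", container, "--flake", flake_ref]),
            argv(&["stop", container]),
            argv(&["start", container]),
        ]
    } else {
        // First spawn: create writes the nspawn conf file; flags and
        // limits are written after it and before start.
        vec![
            argv(&["create", container, "--flake", flake_ref]),
            argv(&["start", container]),
        ]
    }
}
```

For example, `container_commands(false, "h-demo", "/meta#demo")` yields a `create` followed by a `start`, with no `stop` in between because nothing is running yet.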


@ -561,10 +561,13 @@ async fn submit_apply_commit(
); );
} }
if !applied_dir.join(".git").exists() { if !applied_dir.join(".git").exists() {
anyhow::bail!( // First deploy: seed the applied repo from proposed so we can plant
"applied repo at {} is uninitialised — spawn the agent first", // the proposal/<id> tag below. The applied repo starts at the
applied_dir.display() // template commit (deployed/0); run_apply_commit will fast-forward
); // main to the manager's commit on approval and create the container.
lifecycle::setup_applied(&applied_dir, Some(&proposed_dir), agent)
.await
.context("seed applied repo for first spawn")?;
} }
let id = coord let id = coord
.approvals .approvals
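The submit-time change above replaces a hard failure with a seeding step when the applied repo has no `.git` directory yet. A minimal sketch of just that check follows, assuming hypothetical names `AppliedRepoState` and `applied_repo_state`; the actual seeding is done by `lifecycle::setup_applied` and is not modelled here.

```rust
/// Hypothetical illustration type; not part of hive-c0re.
#[derive(Debug, PartialEq, Eq)]
pub enum AppliedRepoState {
    /// `.git` exists: the proposal tag can be planted directly.
    Initialised,
    /// No `.git` yet: this is a first deploy and the repo must be
    /// seeded from the proposed repo before tagging.
    NeedsSeed,
}

/// Classify the applied repo directory the same way the diff does,
/// by probing for a `.git` entry (hypothetical helper).
pub fn applied_repo_state(applied_dir: &std::path::Path) -> AppliedRepoState {
    if applied_dir.join(".git").exists() {
        AppliedRepoState::Initialised
    } else {
        AppliedRepoState::NeedsSeed
    }
}
```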


@ -590,7 +590,9 @@ pub enum HelperEvent {
}, },
/// A new agent's proposed config repo was initialised (post-`InitConfig` /// A new agent's proposed config repo was initialised (post-`InitConfig`
/// approval). The manager can now edit `/agents/<agent>/config/agent.nix`, /// approval). The manager can now edit `/agents/<agent>/config/agent.nix`,
/// commit the changes, and submit a `RequestSpawn` to create the container. /// commit the changes, and submit a `RequestApplyCommit` — which will
/// create the container on first deploy, then rebuild on every subsequent
/// deploy.
ConfigReady { agent: String }, ConfigReady { agent: String },
/// A sub-agent's container was stopped (the systemd unit is down; /// A sub-agent's container was stopped (the systemd unit is down;
/// persistent state is unchanged). /// persistent state is unchanged).
@ -691,9 +693,11 @@ pub enum ManagerRequest {
/// approval for the operator to review. On approval hive-c0re seeds /// approval for the operator to review. On approval hive-c0re seeds
/// `/agents/<name>/config/` with the default `agent.nix` template, /// `/agents/<name>/config/` with the default `agent.nix` template,
/// giving the manager RW access so it can customise the config and /// giving the manager RW access so it can customise the config and
/// commit changes before calling `request_spawn`. Fails if a proposed /// commit changes. After the `ConfigReady` event arrives, edit
/// `agent.nix`, commit, and call `request_apply_commit` — which
/// creates the container on the first deploy. Fails if a proposed
/// repo for this name already exists (use `request_apply_commit` to /// repo for this name already exists (use `request_apply_commit` to
/// update an existing agent's config). Must precede `request_spawn`. /// update an existing agent's config).
RequestInitConfig { RequestInitConfig {
name: String, name: String,
/// Optional description shown on the dashboard approval card. /// Optional description shown on the dashboard approval card.
@ -704,6 +708,10 @@ pub enum ManagerRequest {
/// creates and starts the container. Requires a prior approved /// creates and starts the container. Requires a prior approved
/// `request_init_config` so the manager can customise `agent.nix` first. /// `request_init_config` so the manager can customise `agent.nix` first.
/// Fails if the proposed config repo for this name does not exist yet. /// Fails if the proposed config repo for this name does not exist yet.
///
/// Deprecated: prefer `request_apply_commit` after `config_ready` —
/// it pins the exact commit sha, handles both first-time spawns and
/// subsequent rebuilds, and always reflects the committed config.
RequestSpawn { RequestSpawn {
name: String, name: String,
/// Optional description shown on the dashboard approval card. /// Optional description shown on the dashboard approval card.