per-agent /state dir for durable notes; manager sees them via /agents

müde 2026-05-15 18:00:08 +02:00
parent 7be64c5e66
commit ff8f8c7c56
10 changed files with 66 additions and 10 deletions

@ -115,6 +115,10 @@ nix/
viable — OAuth refresh tokens rotate, so any sibling refresh invalidates
all the others. Login flow runs from the per-agent web UI; creds persist
across `destroy`/recreate.
- **Persistent notes dir per agent.** `/var/lib/hyperhive/agents/<name>/state/`
bind-mounts to `/state` (RW). System prompts tell agents to keep durable
knowledge here (`/state/notes.md`, anything else under `/state/`).
Survives destroy/recreate alongside the claude dir.
- **Orphan approvals.** If state dirs are wiped out from under a pending
approval (test scripts, manual `rm -rf`), the dashboard's next render
marks them `failed` with note `"agent state dir missing"` so they fall out

@ -24,7 +24,8 @@ host (NixOS, runs hive-c0re.service)
│ └── sockets /run/hyperhive/{host,manager,agents/<n>}/mcp.sock
└── nixos-containers (each bind-mounts its socket dir → /run/hive,
-│ its credentials dir → /root/.claude;
+│ credentials dir → /root/.claude,
+│ durable notes dir → /state;
│ manager additionally gets /agents RW)
├── hm1nd hive-m1nd serve : claude turn loop +

TODO.md

@ -36,11 +36,12 @@ Pick anything from here when relevant. Cross-cutting design notes live in
## Loop substance
-- **Notes / state persistence.** Per-agent `notes.md` for durable scratch
-memory across turns. Compaction-on-overflow runs a separate short-lived
-claude session (à la bitburner-agent). The `--continue` session already
-gives short-term memory, but notes give cross-session durable knowledge
-that isn't lost on a `/compact` boundary.
+- **Notes compaction.** `/state/` is bind-mounted persistently and agents
+are told (in the system prompt) to keep `/state/notes.md` for durable
+knowledge — but we don't currently nudge them to compact when notes
+grow. Bitburner-agent's pattern: a short-lived secondary claude session
+that takes the existing notes + a "compact this" prompt and rewrites
+them in place. Add when the notes start bloating.
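
One possible trigger for the compaction nudge described in the TODO entry, sketched under loud assumptions: the function name `notes_need_compaction` and the 32 KiB threshold are hypothetical and not part of this commit. Only the `/state/notes.md` path comes from the source. The sketch checks the file size and treats a missing notes file as "nothing to compact".

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical size limit; the commit does not define any threshold.
pub const NOTES_COMPACT_THRESHOLD_BYTES: u64 = 32 * 1024;

/// Returns Ok(true) when the notes file is strictly larger than `threshold`.
/// A missing file counts as "no compaction needed"; other I/O errors propagate.
pub fn notes_need_compaction(notes_path: &Path, threshold: u64) -> io::Result<bool> {
    match fs::metadata(notes_path) {
        Ok(meta) => Ok(meta.len() > threshold),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```

A caller on the host side might run this against `<agent notes dir>/notes.md` before each turn and, when it returns `true`, start the short-lived secondary session the TODO item mentions.
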
## Lifecycle / reliability

@ -7,4 +7,6 @@ Tools (hyperhive surface):
Need new packages, env vars, or other NixOS config for yourself? You can't edit your own config directly — message the manager (recipient `manager`) describing what you need. The manager edits `/agents/{label}/config/agent.nix` on your behalf, commits, and submits an approval that the operator can accept on the dashboard; on approve hive-c0re rebuilds your container with the new config.
Durable knowledge: write to `/state/notes.md` (free-form) or any other path under `/state/`. That directory is bind-mounted from the host and persists across container destroy/recreate — claude's `--continue` session only carries short-term context, but `/state/` is forever. Read it back at the start of relevant turns to remember things across resets.
When your inbox has a message, handle it and stop. Don't narrate intent — act.

@ -12,4 +12,9 @@ Your own editable config lives at `/agents/hm1nd/config/agent.nix`; every sub-ag
Messages from sender `system` are hyperhive helper events (JSON body, `event` field discriminates): `approval_resolved`, `spawned`, `rebuilt`, `killed`, `destroyed`. Use these to react to lifecycle changes — e.g. greet a freshly-spawned agent, retry a failed rebuild, or note the change to the operator.
Durable knowledge:
- Your own: `/state/notes.md` (free-form) or anything else under `/state/`. Bind-mounted from the host — survives destroy/recreate. Claude's `--continue` session only carries short-term context; `/state/` is forever. Good place for a roster of active sub-agents, ongoing initiatives, decisions you've made.
- Sub-agents': every sub-agent has its own `/state/` too. From your container that's `/agents/<name>/state/` (your `/agents` mount is RW), so you can read what they've recorded and write notes for them when you need to leave a heads-up or task list.
When your inbox has a message, handle it and stop. Don't narrate intent — act.

@ -37,6 +37,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
let proposed_dir = Coordinator::agent_proposed_dir(&approval.agent);
let applied_dir = Coordinator::agent_applied_dir(&approval.agent);
let claude_dir = Coordinator::agent_claude_dir(&approval.agent);
let notes_dir = Coordinator::agent_notes_dir(&approval.agent);
match approval.kind {
ApprovalKind::ApplyCommit => {
@ -48,6 +49,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
&agent_dir,
&applied_dir,
&claude_dir,
&notes_dir,
coord.dashboard_port,
)
.await
@ -70,6 +72,7 @@ pub async fn approve(coord: Arc<Coordinator>, id: i64) -> Result<()> {
&proposed_dir,
&applied_dir,
&claude_dir,
&notes_dir,
coord_bg.dashboard_port,
)
.await;

@ -58,12 +58,14 @@ pub async fn rebuild_agent(coord: &Arc<Coordinator>, name: &str, current_rev: &s
.with_context(|| format!("ensure_runtime {name}"))?;
let applied_dir = Coordinator::agent_applied_dir(name);
let claude_dir = Coordinator::agent_claude_dir(name);
let notes_dir = Coordinator::agent_notes_dir(name);
let result = lifecycle::rebuild(
name,
&coord.hyperhive_flake,
&agent_dir,
&applied_dir,
&claude_dir,
&notes_dir,
coord.dashboard_port,
)
.await;
@ -122,6 +124,7 @@ pub async fn ensure_manager(coord: &Arc<Coordinator>) -> Result<()> {
let proposed = Coordinator::agent_proposed_dir(MANAGER_NAME);
let applied = Coordinator::agent_applied_dir(MANAGER_NAME);
let claude_dir = Coordinator::agent_claude_dir(MANAGER_NAME);
let notes_dir = Coordinator::agent_notes_dir(MANAGER_NAME);
lifecycle::spawn(
MANAGER_NAME,
&coord.hyperhive_flake,
@ -129,6 +132,7 @@ pub async fn ensure_manager(coord: &Arc<Coordinator>) -> Result<()> {
&proposed,
&applied,
&claude_dir,
&notes_dir,
coord.dashboard_port,
)
.await?;

@ -178,6 +178,14 @@ impl Coordinator {
Self::agent_state_root(name).join("claude")
}
/// Per-agent durable knowledge dir. Bind-mounted RW into the agent
/// container at `/state`. Survives destroy/recreate alongside the
/// claude dir. Agents are told (via the system prompt) to write
/// long-lived notes / scratch state here.
pub fn agent_notes_dir(name: &str) -> PathBuf {
Self::agent_state_root(name).join("state")
}
/// Authoritative applied config repo. Hive-c0re-only.
pub fn agent_applied_dir(name: &str) -> PathBuf {
PathBuf::from(format!("{APPLIED_STATE_ROOT}/{name}"))
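
To make the three views of the notes dir concrete, here is a minimal sketch. It assumes `agent_state_root` resolves to `/var/lib/hyperhive/agents/<name>` (consistent with the README entry, though the real body is not shown in this diff) and that the manager's `/agents` mount exposes that same root. `agent_view_of_notes` and `manager_view_of_notes` are hypothetical helpers for illustration only.

```rust
use std::path::PathBuf;

// Assumed host root; matches the path quoted in the README hunk.
const HOST_AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";
const CONTAINER_NOTES_MOUNT: &str = "/state";
const CONTAINER_MANAGER_AGENTS_MOUNT: &str = "/agents";

// Assumption: the per-agent state root is <HOST_AGENTS_ROOT>/<name>.
fn agent_state_root(name: &str) -> PathBuf {
    PathBuf::from(HOST_AGENTS_ROOT).join(name)
}

// Host-side path of the durable notes dir (same shape as the commit's helper).
fn agent_notes_dir(name: &str) -> PathBuf {
    agent_state_root(name).join("state")
}

// Where an agent sees its own notes dir inside its container.
fn agent_view_of_notes() -> PathBuf {
    PathBuf::from(CONTAINER_NOTES_MOUNT)
}

// Where the manager sees a sub-agent's notes dir (hypothetical helper).
fn manager_view_of_notes(name: &str) -> PathBuf {
    PathBuf::from(CONTAINER_MANAGER_AGENTS_MOUNT)
        .join(name)
        .join("state")
}
```

For an agent named `alice`, all three paths would point at the same directory on disk: the host path, `/state` inside alice's container, and `/agents/alice/state` inside the manager's container.
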

@ -26,6 +26,11 @@ pub const CONTAINER_RUNTIME_MOUNT: &str = "/run/hive";
/// Persistent across destroy/recreate so OAuth login survives.
pub const CONTAINER_CLAUDE_MOUNT: &str = "/root/.claude";
/// Mount point of the per-agent durable knowledge dir inside the container.
/// Agents are told (system prompt) to keep `notes.md` and any other scratch
/// state here; persists across destroy/recreate.
pub const CONTAINER_NOTES_MOUNT: &str = "/state";
const GIT_NAME: &str = "hive-c0re";
const GIT_EMAIL: &str = "hive-c0re@hyperhive";
@ -98,6 +103,7 @@ fn validate(name: &str) -> Result<()> {
Ok(())
}
#[allow(clippy::too_many_arguments)]
pub async fn spawn(
name: &str,
hyperhive_flake: &str,
@ -105,16 +111,18 @@ pub async fn spawn(
proposed_dir: &Path,
applied_dir: &Path,
claude_dir: &Path,
+notes_dir: &Path,
dashboard_port: u16,
) -> Result<()> {
validate(name)?;
setup_proposed(proposed_dir, name).await?;
setup_applied(applied_dir, name, hyperhive_flake, dashboard_port).await?;
ensure_claude_dir(claude_dir)?;
+ensure_state_dir(notes_dir)?;
let container = container_name(name);
let flake_ref = format!("{}#default", applied_dir.display());
run(&["create", &container, "--flake", &flake_ref]).await?;
-set_nspawn_flags(&container, agent_dir, claude_dir)?;
+set_nspawn_flags(&container, agent_dir, claude_dir, notes_dir)?;
set_resource_limits(&container)?;
systemd_daemon_reload().await?;
run(&["start", &container]).await
@ -176,14 +184,16 @@ pub async fn rebuild(
agent_dir: &Path,
applied_dir: &Path,
claude_dir: &Path,
+notes_dir: &Path,
dashboard_port: u16,
) -> Result<()> {
validate(name)?;
setup_applied(applied_dir, name, hyperhive_flake, dashboard_port).await?;
ensure_claude_dir(claude_dir)?;
+ensure_state_dir(notes_dir)?;
let container = container_name(name);
let flake_ref = format!("{}#default", applied_dir.display());
-set_nspawn_flags(&container, agent_dir, claude_dir)?;
+set_nspawn_flags(&container, agent_dir, claude_dir, notes_dir)?;
set_resource_limits(&container)?;
systemd_daemon_reload().await?;
run(&["update", &container, "--flake", &flake_ref]).await?;
@ -349,6 +359,14 @@ fn ensure_claude_dir(claude_dir: &Path) -> Result<()> {
Ok(())
}
fn ensure_state_dir(notes_dir: &Path) -> Result<()> {
if !notes_dir.exists() {
std::fs::create_dir_all(notes_dir)
.with_context(|| format!("create {}", notes_dir.display()))?;
}
Ok(())
}
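
`std::fs::create_dir_all` already returns `Ok(())` when the target directory exists, so the `exists()` pre-check in `ensure_state_dir` is optional. A minimal sketch of the same helper without the check follows. It uses `std::io::Result` instead of the `with_context` error wrapping in the real code, purely to keep the example dependency-free, and `ensure_dir` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create `dir` and any missing parents. Calling it again on an existing
/// directory is a no-op, so no `exists()` check is needed beforehand.
fn ensure_dir(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir).map_err(|e| {
        // Keep the original error kind but add the path for easier debugging.
        io::Error::new(e.kind(), format!("create {}: {e}", dir.display()))
    })
}
```

Dropping the pre-check also removes the small window between `exists()` and `create_dir_all` in which another process could create or remove the directory.
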
fn initial_agent_nix(name: &str) -> String {
format!(
"{{ ... }}:\n{{\n # Per-agent overrides for {name}. The manager edits this\n # file (and commits) to customise the agent's NixOS config.\n}}\n",
@ -458,13 +476,19 @@ pub const CONTAINER_MANAGER_AGENTS_MOUNT: &str = "/agents";
/// here so lifecycle stays usable as a leaf module).
const HOST_AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";
-fn set_nspawn_flags(container: &str, runtime_dir: &Path, claude_dir: &Path) -> Result<()> {
+fn set_nspawn_flags(
+container: &str,
+runtime_dir: &Path,
+claude_dir: &Path,
+notes_dir: &Path,
+) -> Result<()> {
let path = format!("/etc/nixos-containers/{container}.conf");
let original = std::fs::read_to_string(&path).with_context(|| format!("read {path}"))?;
let mut binds = format!(
-"--bind={runtime}:{CONTAINER_RUNTIME_MOUNT} --bind={claude}:{CONTAINER_CLAUDE_MOUNT}",
+"--bind={runtime}:{CONTAINER_RUNTIME_MOUNT} --bind={claude}:{CONTAINER_CLAUDE_MOUNT} --bind={notes}:{CONTAINER_NOTES_MOUNT}",
runtime = runtime_dir.display(),
claude = claude_dir.display(),
+notes = notes_dir.display(),
);
if container == MANAGER_NAME {
// Manager edits sub-agent proposed/ repos and its own. RW so it can
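
The bind string built in `set_nspawn_flags` could also be assembled from a table of mounts, which keeps the next addition to a one-line change. This is a sketch, not the commit's code: `bind_flags` is a hypothetical helper, the three mount constants are copied from this diff, and the manager-only `/agents` bind is left out.

```rust
use std::path::Path;

const CONTAINER_RUNTIME_MOUNT: &str = "/run/hive";
const CONTAINER_CLAUDE_MOUNT: &str = "/root/.claude";
const CONTAINER_NOTES_MOUNT: &str = "/state";

/// Build the space-separated `--bind=<host>:<container>` flags for one agent.
fn bind_flags(runtime_dir: &Path, claude_dir: &Path, notes_dir: &Path) -> String {
    // (host path, mount point inside the container), in flag order.
    let mounts = [
        (runtime_dir, CONTAINER_RUNTIME_MOUNT),
        (claude_dir, CONTAINER_CLAUDE_MOUNT),
        (notes_dir, CONTAINER_NOTES_MOUNT),
    ];
    mounts
        .iter()
        .map(|(host, mount)| format!("--bind={}:{}", host.display(), mount))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With an example agent `alice`, the notes entry would come out as `--bind=/var/lib/hyperhive/agents/alice/state:/state`, appended after the runtime and claude binds.
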

@ -65,6 +65,7 @@ async fn dispatch(req: &HostRequest, coord: Arc<Coordinator>) -> HostResponse {
let proposed_dir = Coordinator::agent_proposed_dir(name);
let applied_dir = Coordinator::agent_applied_dir(name);
let claude_dir = Coordinator::agent_claude_dir(name);
let notes_dir = Coordinator::agent_notes_dir(name);
match lifecycle::spawn(
name,
&coord.hyperhive_flake,
@ -72,6 +73,7 @@ async fn dispatch(req: &HostRequest, coord: Arc<Coordinator>) -> HostResponse {
&proposed_dir,
&applied_dir,
&claude_dir,
&notes_dir,
coord.dashboard_port,
)
.await
@ -122,12 +124,14 @@ async fn dispatch(req: &HostRequest, coord: Arc<Coordinator>) -> HostResponse {
let agent_dir = coord.ensure_runtime(name)?;
let applied_dir = Coordinator::agent_applied_dir(name);
let claude_dir = Coordinator::agent_claude_dir(name);
let notes_dir = Coordinator::agent_notes_dir(name);
lifecycle::rebuild(
name,
&coord.hyperhive_flake,
&agent_dir,
&applied_dir,
&claude_dir,
&notes_dir,
coord.dashboard_port,
)
.await?;