diff --git a/TODO-ops.md b/TODO-ops.md
new file mode 100644
index 0000000..148c9bf
--- /dev/null
+++ b/TODO-ops.md
@@ -0,0 +1,119 @@
+# Hyperhive — deployment, ops & boundaries
+
+Tracking the deployment-shape + operational-hardening work:
+container network isolation, the unifying gateway, the
+operator-vs-agent trust boundary, and process privilege
+separation.
+
+These items interlock. Today "the operator surface" and "the
+agent surface" are a *convention*, not a boundary — nothing
+stops a container from curling the core daemon on
+`localhost:`, or another agent's web UI. The gateway,
+network isolation, and privsep together turn that convention
+into an enforced boundary. Sequencing matters; see the order at
+the bottom.
+
+## The boundary we're building toward
+
+Two principals, two paths:
+
+- **Operator** — reaches every UI (the dashboard + every
+  per-agent page) through the gateway, on one origin.
+  Operator-authority actions (approve / deny, answer-as-operator,
+  lifecycle POSTs) are served by the core daemon and only
+  reachable via the gateway.
+- **Agent** — speaks only for itself, only over its per-agent
+  unix socket. The socket's identity *is* the agent (see
+  `docs/conventions.md`, "identity = socket"). An agent must not
+  be able to reach the core daemon's HTTP surface, another
+  agent's socket, or another agent's web UI.
+
+Design rule that falls out of this: **operator-authority
+actions never get a per-agent-socket entry point.** They live on
+the core backend. Worked example — answering an
+operator-targeted question is a `POST /answer-question/{id}` on
+the core dashboard, *never* an `AgentRequest` variant. If it
+were a per-agent-socket request, an agent could `curl` its own
+socket and spoof an operator answer. The per-agent web UI POSTs
+cross-origin to the core for these (see the inline-answer
+feature — the loose-ends section on each agent page).
+
+## Workstreams
+
+### 1. Container network isolation
+
+Today containers share the host network namespace, so a
+container can reach `localhost:`, the dashboard, and
+every other agent's web port. **Until this changes, nothing
+below is actually enforced** — the operator/agent split is on
+the honour system.
+
+- Give each container a private veth / bridge with no route to
+  the host's loopback-bound services.
+- The per-agent unix socket stays the only host-bound channel
+  (it already is the intended one).
+- Open question: the per-agent web UI still needs to be
+  reachable *by the operator's browser* — that is what the
+  gateway is for (below). The container itself should not be
+  able to reach the gateway or the core daemon.
+
+### 2. Unifying gateway / reverse proxy
+
+(Moved here from TODO.md "Dashboard".)
+
+Today every agent's web UI is reached at
+`:/`, so operators juggle a port list.
+Stand up nginx (or similar) terminating one domain that fans
+requests to `/agent//...` out to each container's web
+port, and `/` to the main dashboard. Touches: a NixOS module on
+the host, the dashboard's per-agent link rendering, and the
+per-agent web server's base-path handling (currently assumes
+root). Lets bookmarks survive port reshuffles and unblocks
+per-agent stats links being relative URLs instead of hard-coded
+ports.
+
+Boundary payoff: once the dashboard and the per-agent pages are
+same-origin behind the gateway, the cross-origin CORS shim on
+`POST /answer-question/{id}` (added with the inline-answer
+feature) can be deleted — the per-agent page's POST becomes a
+plain same-origin request. Grep for `with_cors` /
+`Access-Control-Allow-Origin` in `hive-c0re/src/dashboard.rs`
+and remove it when this lands.
+
+The gateway is also the natural home for auth, if/when the
+operator surface ever needs it.
+
+### 3. Privsep the core daemon from the web UI
+
+(Moved here from TODO.md "Security".)
+
+hive-c0re runs as root (it has to — `nixos-container` create /
+start / destroy, the meta git repo, every per-agent bind
+mount). The HTTP server lives in the same process, so every
+read-endpoint (`/api/state-file`, `/api/journal/{name}`,
+`/api/agent-config/{name}`) is one allow-list bug away from
+serving arbitrary host files. Split it: keep the privileged
+daemon doing lifecycle + git + ipc, run the web UI as an
+unprivileged user that talks to the daemon over a unix socket
+with a narrow request surface (`ReadAgentStateFile { agent,
+rel_path }` etc.). The unprivileged process can't read
+`/etc/shadow` even if every check in `get_state_file` is
+bypassed — it doesn't have the bits. Container-lifecycle POSTs
+(`/restart`, `/destroy`, etc.) become forwarded RPCs the
+privileged side authorises on its terms.
+
+Cheaper once the harness/state split lands (see TODO.md "Split
+harness-internal state from agent-visible state") — the
+unprivileged web server then only needs read access to
+`/agents//state/`, not `/agents//harness/`.
+
+## Suggested sequencing
+
+1. **Gateway** first — pure ergonomics win, unblocks
+   same-origin, no behavioural risk.
+2. **Network isolation** next — the step that makes the
+   operator/agent boundary *real*. Everything before it is
+   honour-system.
+3. **Privsep** last — defence in depth on the core process
+   itself; valuable independent of the other two, but the
+   biggest refactor.
diff --git a/TODO.md b/TODO.md
index 5d2549b..b142df9 100644
--- a/TODO.md
+++ b/TODO.md
@@ -5,6 +5,10 @@
 > for the operator is not. Use that as a hint when picking up items,
 > not a hard rule.
 
+**Deployment / ops / boundaries:** the unifying gateway, container
+network isolation, the operator-vs-agent trust boundary, and process
+privsep are tracked separately in [`TODO-ops.md`](TODO-ops.md).
+
 ## Architecture / Features
 
 - Shared space for all agents to access documents/files without manager routing
@@ -23,13 +27,8 @@
 
 ## Dashboard
 
-- **Unified URL scheme via reverse proxy**: today every agent's web UI is reached at `:/`, so operators juggle a port list. Stand up nginx (or similar) terminating one domain that fans requests to `/agent//...` out to each container's web port, and to `/` for the main dashboard. Touches: a NixOS module on the host, the dashboard's per-agent link rendering, and the per-agent web server's base-path handling (currently assumes root). Lets bookmarks survive port reshuffles and unblocks per-agent stats links being relative URLs instead of hard-coded ports.
 - **Delivered-reminder rollup on the per-agent stats page**: surface attempt / success / failure counts for reminders this agent fired (in the existing `/stats` page). Needs an `AgentRequest::ReminderRollup { since_secs }` / matching `ManagerRequest::ReminderRollup` RPC so the agent can pull the counts from the host's broker DB (the reminders table is host-owned; agent state doesn't have them). Deferred from the initial stats page so the first cut stays self-contained to data the agent already owns.
 
-## Security
-
-- **Privsep the dashboard from the privileged daemon**: hive-c0re runs as root (it has to — `nixos-container` create / start / destroy, the meta git repo, every per-agent bind mount). The HTTP server lives in the same process, so every read-endpoint (`/api/state-file`, `/api/journal/{name}`, `/api/agent-config/{name}`) is one allow-list bug away from serving arbitrary host files. Split the architecture: keep the privileged daemon doing lifecycle + git + ipc, run the web UI as an unprivileged user that talks to the daemon over a unix socket with a narrow request surface (`ReadAgentStateFile { agent, rel_path }` etc.). The unprivileged process can't read `/etc/shadow` even if every check in `get_state_file` is bypassed — it doesn't have the bits. Container-lifecycle POSTs (`/restart`, `/destroy`, etc.) become forwarded RPCs the privileged side authorises on its terms.
-
 ## Harness Ergonomics (agent-side wishlist)
 
 Filed by damocles, who actually lives in this thing. Loosely ranked by
diff --git a/hive-ag3nt/assets/agent.css b/hive-ag3nt/assets/agent.css
index d8db0df..86f9f1a 100644
--- a/hive-ag3nt/assets/agent.css
+++ b/hive-ag3nt/assets/agent.css
@@ -151,6 +151,42 @@ pre.diff {
 .agent-inbox .inbox-sep { color: var(--muted); }
 .agent-inbox .inbox-body { color: var(--fg); white-space: pre-wrap; word-break: break-word; }
 
+.agent-inbox .answer-form {
+  grid-column: 1 / -1;
+  display: flex;
+  gap: 0.4em;
+  align-items: flex-start;
+  margin-top: 0.25em;
+}
+.agent-inbox .answer-form textarea {
+  flex: 1;
+  font-family: inherit;
+  font-size: inherit;
+  background: var(--bg);
+  color: var(--fg);
+  border: 1px solid var(--border);
+  border-radius: 3px;
+  padding: 0.3em;
+  resize: vertical;
+}
+.agent-inbox .answer-form button {
+  font-family: inherit;
+  font-size: inherit;
+  background: var(--bg-elev);
+  color: var(--fg);
+  border: 1px solid var(--border);
+  border-radius: 3px;
+  padding: 0.3em 0.7em;
+  cursor: pointer;
+  white-space: nowrap;
+}
+.agent-inbox .answer-form button:hover:not(:disabled) {
+  border-color: var(--purple);
+  color: var(--purple);
+}
+.agent-inbox .answer-form button:disabled { opacity: 0.5; cursor: default; }
+.agent-inbox .answer-status { color: var(--muted); align-self: center; }
+
 .last-turn {
   color: var(--muted);
   font-size: 0.8em;
diff --git a/hive-ag3nt/assets/app.js b/hive-ag3nt/assets/app.js
index 6b8f5d2..781c373 100644
--- a/hive-ag3nt/assets/app.js
+++ b/hive-ag3nt/assets/app.js
@@ -22,6 +22,12 @@
     return e;
   };
 
+  // Base URL of the host dashboard (core backend). Set once the first
+  // /api/state lands. Operator-authority actions (answering a question
+  // as the operator) POST here rather than to this agent's own socket —
+  // see TODO-ops.md for why the boundary lives on the core side.
+  let dashboardBase = '';
+
   // ─── async-form submit (shared with dashboard) ──────────────────────────
   document.addEventListener('submit', async (e) => {
     const f = e.target;
@@ -68,6 +74,7 @@
     // ↑ DASHB04RD — back-link to the host dashboard. Opens in a new
     // tab to keep the agent page anchored where the operator is.
     const dashUrl = `${location.protocol}//${location.hostname}:${dashboardPort}/`;
+    dashboardBase = dashUrl;
     title.append(
       el('a', {
         href: dashUrl, target: '_blank', rel: 'noopener',
@@ -454,6 +461,7 @@
           el('span', { class: 'inbox-sep' }, t.asker + ' → ' + target), ' ',
           el('span', { class: 'inbox-ts' }, fmtAge(t.age_seconds || 0) + ' ago'),
           el('div', { class: 'inbox-body' }, t.question || ''),
+          buildAnswerForm(t.id),
         );
       } else if (t.kind === 'reminder') {
         // due_at is an absolute unix-seconds value; show time-until-fire
@@ -474,6 +482,42 @@
     }
   }
 
+  // Inline "answer as operator" form for a question loose-end. POSTs to
+  // the host dashboard (core backend), never this agent's socket — the
+  // core is the only place that can stamp `operator` as the answerer.
+  function buildAnswerForm(id) {
+    const wrap = el('div', { class: 'answer-form' });
+    const ta = el('textarea', { rows: '2', placeholder: 'answer as operator…' });
+    const btn = el('button', { type: 'button' }, 'send answer');
+    const status = el('span', { class: 'answer-status' });
+    btn.addEventListener('click', async () => {
+      const answer = ta.value.trim();
+      if (!answer) { status.textContent = 'answer required'; return; }
+      if (!dashboardBase) { status.textContent = 'dashboard url unknown'; return; }
+      btn.disabled = true;
+      status.textContent = 'sending…';
+      try {
+        const resp = await fetch(dashboardBase + 'answer-question/' + id, {
+          method: 'POST',
+          headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
+          body: 'answer=' + encodeURIComponent(answer),
+        });
+        if (resp.ok) {
+          status.textContent = 'answered ✓';
+          refreshLooseEnds();
+        } else {
+          status.textContent = 'failed: ' + (await resp.text());
+          btn.disabled = false;
+        }
+      } catch (err) {
+        status.textContent = 'failed: ' + err;
+        btn.disabled = false;
+      }
+    });
+    wrap.append(ta, btn, status);
+    return wrap;
+  }
+
   function renderInbox(rows) {
     const root = $('inbox-section');
     const list = $('inbox-list');
diff --git a/hive-c0re/src/dashboard.rs b/hive-c0re/src/dashboard.rs
index c5efe5b..cee5bd9 100644
--- a/hive-c0re/src/dashboard.rs
+++ b/hive-c0re/src/dashboard.rs
@@ -759,6 +759,20 @@ struct AnswerForm {
     answer: String,
 }
 
+/// Attach a permissive CORS header so the per-agent web UI — served on
+/// a different port — can POST an operator answer here and read the
+/// result. The dashboard has no auth, so `*` exposes nothing a plain
+/// cross-origin form-POST couldn't already reach. This shim disappears
+/// once the unifying gateway makes the agent page same-origin; see
+/// `TODO-ops.md`.
+fn with_cors(mut resp: Response) -> Response {
+    resp.headers_mut().insert(
+        axum::http::header::ACCESS_CONTROL_ALLOW_ORIGIN,
+        axum::http::HeaderValue::from_static("*"),
+    );
+    resp
+}
+
 async fn post_answer_question(
     State(state): State,
     AxumPath(id): AxumPath,
@@ -766,9 +780,9 @@ async fn post_answer_question(
 ) -> Response {
     let answer = form.answer.trim();
     if answer.is_empty() {
-        return error_response("answer: required");
+        return with_cors(error_response("answer: required"));
     }
-    match state
+    let resp = match state
         .coord
         .questions
         .answer(id, answer, hive_sh4re::OPERATOR_RECIPIENT)
@@ -794,7 +808,8 @@ async fn post_answer_question(
             (StatusCode::OK, "ok").into_response()
         }
         Err(e) => error_response(&format!("answer {id} failed: {e:#}")),
-    }
+    };
+    with_cors(resp)
 }
 
 /// Resolve a pending operator question with a sentinel answer when