loose-ends question rows get a textarea + send button; the operator answers as operator by POSTing to the core dashboard's /answer-question route, not the per-agent socket — keeps the operator-authority path off the agent's own socket. cross-origin POST needs a CORS shim on that route for now; drops out once the gateway makes the page same-origin. also splits deployment/ops/boundaries/gateway work into TODO-ops.md.
Hyperhive — deployment, ops & boundaries
Tracking the deployment-shape + operational-hardening work: container network isolation, the unifying gateway, the operator-vs-agent trust boundary, and process privilege separation.
These items interlock. Today "the operator surface" and "the
agent surface" are a convention, not a boundary — nothing
stops a container from curling the core daemon on
localhost:<port>, or another agent's web UI. The gateway,
network isolation, and privsep together turn that convention
into an enforced boundary. Sequencing matters; see the order at
the bottom.
The boundary we're building toward
Two principals, two paths:
- Operator — reaches every UI (the dashboard + every per-agent page) through the gateway, on one origin. Operator-authority actions (approve / deny, answer-as-operator, lifecycle POSTs) are served by the core daemon and only reachable via the gateway.
- Agent — speaks only for itself, only over its per-agent
unix socket. The socket's identity is the agent (see
docs/conventions.md, "identity = socket"). An agent must not be able to reach the core daemon's HTTP surface, another agent's socket, or another agent's web UI.
Design rule that falls out of this: operator-authority
actions never get a per-agent-socket entry point. They live on
the core backend. Worked example — answering an
operator-targeted question is a POST /answer-question/{id} on
the core dashboard, never an AgentRequest variant. If it
were a per-agent-socket request, an agent could curl its own
socket and spoof an operator answer. The per-agent web UI POSTs
cross-origin to the core for these (see the inline-answer
feature — the loose-ends section on each agent page).
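One way to picture the design rule is as two disjoint request types. The sketch below is illustrative only: the variants of `AgentRequest`, the `OperatorAction` type, the `parse_operator_post` helper, and the assumption that question ids are integers are all hypothetical and are not taken from the hive-c0re source.

```rust
// Hypothetical sketch: names and variants are illustrative, not the real
// hive-c0re definitions.

/// Requests an agent may send over its own per-agent unix socket.
/// Deliberately contains no operator-authority variant.
#[derive(Debug, PartialEq)]
pub enum AgentRequest {
    /// Hypothetical: the agent raises a question for the operator.
    AskOperator { text: String },
    /// Hypothetical: the agent reports on itself.
    ReportStatus { summary: String },
}

/// Actions accepted only on the core dashboard's HTTP surface.
#[derive(Debug, PartialEq)]
pub enum OperatorAction {
    /// Assumes question ids are unsigned integers.
    AnswerQuestion { id: u64, answer: String },
}

/// Map a core-dashboard POST onto an operator action.
/// Only `/answer-question/{id}` is recognised; everything else is rejected.
pub fn parse_operator_post(path: &str, body: &str) -> Option<OperatorAction> {
    let id = path.strip_prefix("/answer-question/")?.parse::<u64>().ok()?;
    Some(OperatorAction::AnswerQuestion {
        id,
        answer: body.to_string(),
    })
}
```

Because `AnswerQuestion` exists only on `OperatorAction`, no message an agent can write to its own socket decodes into an operator answer; the only way to produce one is through the core route.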
Workstreams
1. Container network isolation
Today containers share the host network namespace, so a
container can reach localhost:<core-port>, the dashboard, and
every other agent's web port. Until this changes, nothing
below is actually enforced — the operator/agent split is on
the honour system.
- Give each container a private veth / bridge with no route to the host's loopback-bound services.
- The per-agent unix socket stays the only host-bound channel (it already is the intended one).
- Open question: the operator's browser still needs to reach the per-agent web UI. That is what the gateway is for (below), so the gateway has to be able to reach each container's web port, while the container itself should not be able to reach the gateway or the core daemon.
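A cheap way to check that isolation actually holds is a probe run from inside a container that tries the addresses it should no longer be able to reach. The following is a minimal sketch using only the standard library; `is_reachable` and `leaks` are hypothetical helper names, and the target list (core port, other agents' web ports) would have to be supplied by whoever runs it.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// `timeout` must be non-zero; a zero timeout is reported as unreachable.
pub fn is_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Returns the subset of `targets` that can still be connected to.
/// Run from inside a container, an empty result means none of the
/// listed host services leaked through.
pub fn leaks(targets: &[SocketAddr], timeout: Duration) -> Vec<SocketAddr> {
    targets
        .iter()
        .copied()
        .filter(|addr| is_reachable(*addr, timeout))
        .collect()
}
```

This only covers TCP reachability; it says nothing about other agents' unix sockets, which need a separate filesystem-level check.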
2. Unifying gateway / reverse proxy
(Moved here from TODO.md "Dashboard".)
Today every agent's web UI is reached at
<host>:<per-agent-port>/, so operators juggle a port list.
Stand up nginx (or similar) terminating one domain that fans
requests to /agent/<name>/... out to each container's web
port, and / to the main dashboard. Touches: a NixOS module on
the host, the dashboard's per-agent link rendering, and the
per-agent web server's base-path handling (currently assumes
root). Lets bookmarks survive port reshuffles and unblocks
per-agent stats links being relative URLs instead of hard-coded
ports.
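The path fan-out described above can be sketched as a pure function, which is also roughly what the per-agent web server's base-path handling has to undo. This is an illustration of the routing rule only, not gateway configuration; the `Upstream` type and `route` function are hypothetical names.

```rust
/// Where the gateway would forward a request. Hypothetical type.
#[derive(Debug, PartialEq)]
pub enum Upstream<'a> {
    /// `/` and anything not under `/agent/<name>` goes to the main dashboard.
    Dashboard { path: &'a str },
    /// `/agent/<name>/...` goes to that agent's web server with the
    /// `/agent/<name>` prefix stripped.
    Agent { name: &'a str, path: &'a str },
}

/// Split a request path into its upstream and the path that upstream sees.
pub fn route(path: &str) -> Upstream<'_> {
    if let Some(rest) = path.strip_prefix("/agent/") {
        // `rest` is "<name>" or "<name>/<tail>".
        let (name, tail) = match rest.find('/') {
            Some(i) => (&rest[..i], &rest[i..]),
            None => (rest, "/"),
        };
        if !name.is_empty() {
            return Upstream::Agent { name, path: tail };
        }
    }
    // Fall through: not an agent path (or an empty agent name).
    Upstream::Dashboard { path }
}
```

With a rule like this, a per-agent stats link can be written relative to `/agent/<name>/` instead of naming a port.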
Boundary payoff: once the dashboard and the per-agent pages are
same-origin behind the gateway, the cross-origin CORS shim on
POST /answer-question/{id} (added with the inline-answer
feature) can be deleted — the per-agent page's POST becomes a
plain same-origin request. Grep for with_cors /
Access-Control-Allow-Origin in hive-c0re/src/dashboard.rs
and remove it when this lands.
The gateway is also the natural home for auth, if/when the operator surface ever needs it.
3. Privsep the core daemon from the web UI
(Moved here from TODO.md "Security".)
hive-c0re runs as root (it has to — nixos-container create /
start / destroy, the meta git repo, every per-agent bind
mount). The HTTP server lives in the same process, so every
read-endpoint (/api/state-file, /api/journal/{name},
/api/agent-config/{name}) is one allow-list bug away from
serving arbitrary host files. Split it: keep the privileged
daemon doing lifecycle + git + ipc, run the web UI as an
unprivileged user that talks to the daemon over a unix socket
with a narrow request surface (ReadAgentStateFile { agent, rel_path } etc.). The unprivileged process can't read
/etc/shadow even if every check in get_state_file is
bypassed — it doesn't have the bits. Container-lifecycle POSTs
(/restart, /destroy, etc.) become forwarded RPCs the
privileged side authorises on its terms.
Cheaper once the harness/state split lands (see TODO.md "Split
harness-internal state from agent-visible state") — the
unprivileged web server then only needs read access to
/agents/<n>/state/, not /agents/<n>/harness/.
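The narrow request surface might look something like the sketch below. Only the `ReadAgentStateFile { agent, rel_path }` shape comes from the text above; the `WebRequest` name, the `RestartAgent` variant, and the `resolve_state_path` helper are hypothetical, and the directory layout assumes the post-split `/agents/<n>/state/` tree.

```rust
use std::path::{Component, Path, PathBuf};

/// Requests the unprivileged web process may send to the privileged daemon.
/// Hypothetical type; only `ReadAgentStateFile` is named in the TODO.
#[derive(Debug, PartialEq)]
pub enum WebRequest {
    ReadAgentStateFile { agent: String, rel_path: String },
    /// Hypothetical forwarded lifecycle RPC.
    RestartAgent { agent: String },
}

/// Resolve `rel_path` under `<root>/<agent>/state/`, refusing anything that
/// could lexically escape that directory. Symlinks are NOT handled here;
/// a real implementation would also need to deal with them.
pub fn resolve_state_path(root: &Path, agent: &str, rel_path: &str) -> Option<PathBuf> {
    // The agent name must be a single plain path segment.
    if agent.is_empty() || agent.contains('/') || agent == "." || agent == ".." {
        return None;
    }
    let rel = Path::new(rel_path);
    if rel.as_os_str().is_empty() {
        return None;
    }
    // Accept only normal components: no `..`, no leading `/`, no leading `.`.
    if rel.components().any(|c| !matches!(c, Component::Normal(_))) {
        return None;
    }
    Some(root.join(agent).join("state").join(rel))
}
```

The check is defence in depth on the privileged side: even if it were wrong, the unprivileged web process still could not read files its user has no permission for.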
Suggested sequencing
- Gateway first — pure ergonomics win, unblocks same-origin, no behavioural risk.
- Network isolation next — the step that makes the operator/agent boundary real. Everything before it is honour-system.
- Privsep last — defence in depth on the core process itself; valuable independent of the other two, but the biggest refactor.