The retry was capped at 12 attempts (~20s of exponential backoff
capped at 2s). Two back-to-back nspawn restarts in #324 left the
previous socket holding the port longer than that budget; once the
cap fired, the web-UI task returned an error and silently died for
the rest of the process lifetime — the agent kept running fine
otherwise (MCP, turn loop), but the operator's dashboard click
hit nothing.
Genuine port collisions are preflighted host-side
(lifecycle::{spawn,rebuild}) and surfaced as a port-conflict banner,
so at this layer a persistent AddrInUse always reflects a
recoverable stale socket. Drop the cap, keep retrying forever with
the same 2s-capped backoff. WARN for the first dozen attempts so a
normal restart-race is visible; INFO after that to avoid spamming
the journal during a long stale-socket hold. Logs a one-line INFO
on success when we did have to retry, so post-mortems can find the
attempt count.
Closes#324.