reminders: persist + surface delivery failures

Broker schema gains attempt_count INTEGER + last_error TEXT
columns via idempotent ALTER TABLE migration (pragma-probed so
fresh + existing dbs converge). reminder_scheduler::tick calls
record_reminder_failure on every deliver_reminder error,
bumping the counter + stashing the message. get_due_reminders
filters out rows where attempt_count >= MAX_REMINDER_ATTEMPTS
(5) so the scheduler stops retrying a stuck row until the
operator intervenes.

new POST /retry-reminder/{id} → reset_reminder_failure clears
the counters; next 5s tick re-attempts. cancel-reminder
unchanged (hard-delete).

dashboard renders failed rows with a red left rule, the error
text inline, and a ⚠ N failed badge. ↻ R3TRY button appears
when attempt_count > 0 — sits next to ✗ C4NC3L in a small
actions row below the body.
This commit is contained in:
müde 2026-05-18 00:08:09 +02:00
parent d395bdc945
commit 978a3cf391
5 changed files with 173 additions and 8 deletions

View file

@ -57,6 +57,7 @@ pub async fn serve(port: u16, coord: Arc<Coordinator>) -> Result<()> {
.route("/api/state-file", get(get_state_file))
.route("/api/reminders", get(api_reminders))
.route("/cancel-reminder/{id}", post(post_cancel_reminder))
.route("/retry-reminder/{id}", post(post_retry_reminder))
.route("/api/agent-config/{name}", get(get_agent_config))
.route("/request-spawn", post(post_request_spawn))
.route("/op-send", post(post_op_send))
@ -1126,6 +1127,25 @@ async fn post_cancel_reminder(
}
}
/// Reset a pending reminder's failure state so the scheduler
/// retries it on the next tick. Useful when the failure was
/// transient (sqlite lock contention, disk full → freed up) and
/// the operator wants delivery to resume immediately instead of
/// the row sitting in attempt-count-capped purgatory.
async fn post_retry_reminder(
State(state): State<AppState>,
AxumPath(id): AxumPath<i64>,
) -> Response {
match state.coord.broker.reset_reminder_failure(id) {
Ok(0) => error_response(&format!("reminder {id} not pending (already delivered?)")),
Ok(_) => {
tracing::info!(%id, "operator reset reminder failure for retry");
(StatusCode::OK, "ok").into_response()
}
Err(e) => error_response(&format!("retry reminder {id} failed: {e:#}")),
}
}
async fn post_purge_tombstone(
State(state): State<AppState>,
AxumPath(name): AxumPath<String>,