Recovery And Retries
Recovery keeps an operation from getting stuck when a worker, provider, or container path does not finish cleanly.
How To Use This Page
- Check whether an operation is still waiting before retrying manually.
- Use terminal status and recent events to decide whether a retry is safe.
- Expect recovery to ignore stale results once newer durable state exists.
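The guidance above can be sketched as two small checks. This is an illustrative sketch only, not the product's implementation: the `OperationState` type, the `version` counter, and both function names are hypothetical, and treating only failed or canceled work as a manual retry candidate is an assumption.

```python
from dataclasses import dataclass

# Statuses this page treats as terminal outcomes.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


@dataclass(frozen=True)
class OperationState:
    status: str
    version: int  # hypothetical counter bumped on every durable write


def manual_retry_candidate(state: OperationState) -> bool:
    """Assumption: only terminal, unsuccessful work is worth retrying by hand."""
    return state.status in TERMINAL_STATUSES and state.status != "completed"


def is_stale(result_version: int, durable: OperationState) -> bool:
    """A result computed against an older version than the durable state is stale."""
    return result_version < durable.version


waiting = OperationState(status="waiting", version=3)
failed = OperationState(status="failed", version=5)

print(manual_retry_candidate(waiting))  # False: still waiting, do not retry yet
print(manual_retry_candidate(failed))   # True: terminal and unsuccessful
print(is_stale(4, failed))              # True: durable state has moved past version 4
```

Even when the first check passes, recent events still need a look, since a recovery wake may already have retried the work.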
Status And Event Map
Operation statuses: queued, running, waiting, blocked, completed, failed, canceled.
| Status | What It Means | User Review |
|---|---|---|
| queued | Work has been accepted but has not started. | Wait for the timeline to move or check queue health. |
| running | A bot, tool, provider, or container is actively working. | Review progress events before retrying. |
| waiting | The operation is waiting on a callback, schedule, user action, or follow-up turn. | Check what dependency is named in the timeline. |
| blocked | Work intentionally stopped until a person or external condition changes. | Read the blocking reason before taking action. |
| completed | Work reached a successful terminal state. | Inspect produced artifacts, blobs, links, or messages. |
| failed | Work reached an error terminal state. | Capture the failing step, visible error, and affected refs. |
| canceled | Work was stopped before completion. | Confirm whether a replacement operation exists. |
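For readers who script against exported status values, the table above could be modeled roughly as follows. The `Status` enum, `REVIEW_HINT` mapping, and `review_step` function are hypothetical names introduced for illustration; only the status strings and review text come from the table.

```python
from enum import Enum


class Status(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING = "waiting"
    BLOCKED = "blocked"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Terminal statuses: the operation will not progress further on its own.
TERMINAL = frozenset({Status.COMPLETED, Status.FAILED, Status.CANCELED})

# Review guidance copied from the "User Review" column.
REVIEW_HINT = {
    Status.QUEUED: "Wait for the timeline to move or check queue health.",
    Status.RUNNING: "Review progress events before retrying.",
    Status.WAITING: "Check what dependency is named in the timeline.",
    Status.BLOCKED: "Read the blocking reason before taking action.",
    Status.COMPLETED: "Inspect produced artifacts, blobs, links, or messages.",
    Status.FAILED: "Capture the failing step, visible error, and affected refs.",
    Status.CANCELED: "Confirm whether a replacement operation exists.",
}


def review_step(raw_status: str) -> tuple[bool, str]:
    """Return (is_terminal, review hint); raises ValueError for unknown statuses."""
    status = Status(raw_status)
    return status in TERMINAL, REVIEW_HINT[status]


print(review_step("blocked"))
```

Rejecting unknown strings with `ValueError` is a deliberate choice in this sketch, so that a status missing from the table is noticed rather than silently treated as non-terminal.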
Review Checklist
- Start from the operation or artifact visible in the app.
- Follow events in timestamp order.
- Open produced artifacts or blobs before sharing conclusions.
- Capture status, route, artifact, operation, and visible error details when escalating.
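A review like the checklist above could be scripted along these lines. This is a sketch under assumptions: the event dictionaries, their `timestamp` and `kind` keys, and the `error` field name are hypothetical and do not describe a real export format.

```python
from datetime import datetime

# Details the checklist asks for when escalating ("error" = visible error details).
ESCALATION_FIELDS = ("status", "route", "artifact", "operation", "error")


def order_events(events: list[dict]) -> list[dict]:
    """Sort events by their ISO 8601 timestamp, oldest first."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))


def escalation_summary(details: dict) -> dict:
    """Collect the escalation fields and list any that are still missing."""
    collected = {field: details.get(field) for field in ESCALATION_FIELDS}
    missing = [field for field, value in collected.items() if not value]
    return {"details": collected, "missing": missing}


events = [
    {"timestamp": "2024-01-01T10:05:00+00:00", "kind": "recovery wake"},
    {"timestamp": "2024-01-01T10:00:00+00:00", "kind": "failed callback"},
    {"timestamp": "2024-01-01T10:06:30+00:00", "kind": "retried work"},
]

for event in order_events(events):
    print(event["timestamp"], event["kind"])

summary = escalation_summary({"status": "failed", "operation": "op-123"})
print(summary["missing"])  # ['route', 'artifact', 'error']
```

Listing the missing fields explicitly makes it harder to escalate with an incomplete report.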
Media To Add
- Timeline: failed callback, recovery wake, retried work, and final status. It helps support teams distinguish a safe retry from duplicate work. Source: test operation recovery case.