# Background Jobs & Queues
Nexa runs every long-running step of a case (allocation, booking, notifications, payments, agent runs) on background workers. Customers do not operate queues or workers — Nexa runs them. This page describes how the platform behaves so integrators can reason about timing, retries, and failure modes from the API.
## Pipeline at a glance
A single disruption case flows through five queues:
| Stage | Typical duration | Max retries | Backoff |
|---|---|---|---|
| Allocation | 1–10 s | 3 | Exponential, 2 s base |
| Booking | 1–30 s per group | 5 | Exponential, 5 s base |
| Payment / token capture | < 1 s | 5 | Exponential, 3 s base |
| Notification (email / SMS) | < 1 s | 5 | Exponential, 3 s base |
| Exception agent | 5–30 s | 1 | None |
When you `POST /cases`, the API enqueues an allocation job and responds immediately with `status: OPEN`. Subsequent stages run asynchronously.
Subscribe to `GET /events/stream` for real-time progress, or poll the case for state transitions (`OPEN → ALLOCATING → BOOKING → CONFIRMED`).
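A minimal polling sketch, assuming a `GET /cases/{caseUrn}` endpoint that returns the case status as JSON (the doc above implies such an endpoint, but its exact path and response shape are assumptions here):

```bash
# Poll the case until it reaches a settled state. Assumes
# GET /cases/{caseUrn} returns {"status": "..."} and jq is installed.
while true; do
  status=$(curl -s "$NEXA_BASE_URL/cases/$CASE_URN" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')
  echo "case is $status"
  case "$status" in
    CONFIRMED|MANUAL_REVIEW) break ;;
  esac
  sleep 2  # allocation typically finishes within seconds
done
```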
## Idempotency keys
Every background job carries an idempotency key so re-enqueues, retries, and replays are safe:
| Stage | Key |
|---|---|
| Allocation | `(caseUrn, demandRequestUrn, waveSeq)` |
| Booking | `(caseUrn, waveUrn, groupId)` |
| Payment | `(caseUrn, reservationUrn)` |
| Notification | `(groupId, type, channel, reservationUrn)` |
| Agent | `(itemUrn, runSeq)` |
If your DCS replays the same disruption event during a network blip, Nexa returns the existing case rather than creating a duplicate. See Idempotency for details.
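A hedged sketch of what such a replay looks like from the caller's side; the request body below is purely illustrative, not the documented `/cases` schema:

```bash
# Illustrative payload only; see the /cases reference for real fields.
payload='{"eventId": "evt-123", "flightNumber": "XY123"}'

curl -s -X POST "$NEXA_BASE_URL/cases" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$payload"

# Replaying the identical event (e.g. after a network blip) returns
# the existing case rather than creating a duplicate:
curl -s -X POST "$NEXA_BASE_URL/cases" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$payload"
```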
## Retries and dead letters
Retries use exponential backoff with the budgets in the table above. When retries are exhausted, Nexa never silently drops the work. Instead, it:
- Writes an audit entry with the last error.
- Opens a manual-review item with the appropriate category.
- Transitions the case to `MANUAL_REVIEW` (where applicable).
Operators see the failure in the console and resolve it via Manual Review.
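If you want to catch dead-letters programmatically rather than in the console, a minimal sketch, again assuming the hypothetical `GET /cases/{caseUrn}` status endpoint used above:

```bash
# Alert when a case has transitioned to MANUAL_REVIEW.
status=$(curl -s "$NEXA_BASE_URL/cases/$CASE_URN" \
  -H "Authorization: Bearer $TOKEN" | jq -r '.status')
if [ "$status" = "MANUAL_REVIEW" ]; then
  echo "case $CASE_URN needs manual review"  # hand off to your alerting
fi
```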
## Concurrency limits you'll observe
The limits you'll observe as an API consumer:
- Allocation: bounded by upstream provider rate limits (Amadeus + Hotelbeds). Effective ceiling ≈ 10 concurrent searches per tenant.
- Booking: bounded by provider rate limits and PSP cap. Effective ceiling ≈ 20 concurrent bookings per tenant.
- Notifications: bounded by your SendGrid plan. The dashboard shows burn rate vs. quota; upgrade the plan if you saturate it.
- Agent: bounded by LLM provider concurrency. Default 3 concurrent triages per tenant.
Beyond these ceilings, jobs queue rather than fail.
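As a back-of-envelope example using the numbers above: at the ≈ 20-booking ceiling and a mid-range ~15 s per booking, a case with 100 passenger groups would clear the booking stage in roughly (100 / 20) × 15 s ≈ 75 s, with the excess groups waiting in the queue rather than erroring.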
## Real-time observability
Every stage emits events to `/events/stream`:
- `prediction.created`: the predictor finished a forecast.
- `signal.ingested`: `{ source, count }`, one per provider per cycle.
- `job.activity`: `{ queue, state, detail? }` at the start, finish, and error of each job.
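A minimal way to watch the stream from a shell, assuming the endpoint speaks standard server-sent events (the `Accept` header below is a conventional assumption, not confirmed on this page):

```bash
# -N disables curl's output buffering so each event prints as it arrives.
curl -N "$NEXA_BASE_URL/events/stream" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: text/event-stream"
```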
Per-tenant Grafana dashboards (queue depth, job duration, failure rate by reason) are available on request.
## Replay and reprocess
If your team needs to replay a failed booking after fixing an upstream issue (for example, a contract that was uploaded late), the operations console exposes a one-click "Retry" on the manual-review item. Server-to-server replay is also available:
```bash
curl -X POST "$NEXA_BASE_URL/manual-review/$ITEM_URN/retry" \
  -H "Authorization: Bearer $TOKEN"
```
Idempotency keys make the replay safe — if a booking already succeeded, the replay returns the existing reservation.
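That same safety makes bulk replay reasonable to script. A hypothetical sketch, assuming a `GET /manual-review?caseUrn=...` listing endpoint returning `{ "items": [{ "urn": ... }] }`; only the `/retry` action above is documented on this page:

```bash
# Retry every manual-review item on a case. The listing endpoint and
# response shape are assumptions for illustration.
for item in $(curl -s "$NEXA_BASE_URL/manual-review?caseUrn=$CASE_URN" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.items[].urn'); do
  curl -s -X POST "$NEXA_BASE_URL/manual-review/$item/retry" \
    -H "Authorization: Bearer $TOKEN"
done
```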