Build an AI fallback and retry system
Handle errors and switch models gracefully — Northwind's AI calls never hard-fail.
Published Dec 21, 2025
Any automation that calls a remote API will eventually meet a 429 rate-limit or a 503 from the other end. When that happens inside a Northwind script, the whole run dies — and a job that batched a hundred rows might fail on row 41 because of a blip that cleared two seconds later. The fix is not to hope; it is to retry.
This is not a standalone automation — it is the call layer the other AI scripts
should sit on. resilientClaude wraps the raw API call with two safety nets.
First, it tells transient failures (server errors, rate limits) apart from
permanent ones (a bad request, a revoked key) and only retries the transient
ones. Second, it backs off exponentially between attempts and rotates through a
list of models, so if one model is overloaded the next attempt tries another.
What you’ll need
- An Anthropic API key saved as
ANTHROPIC_API_KEYin Script Properties — see Store API keys and secrets securely. - Nothing else — this is a drop-in replacement for a plain
callClaudecall. Anywhere another script callscallClaude(prompt), callresilientClaude(prompt)instead.
The script
// Models tried in order, cheapest first. Each retry rotates to the next.
const MODELS = ['claude-haiku-4-5-20251001', 'claude-sonnet-4-6'];
// How many full passes through MODELS before giving up.
const MAX_PASSES = 3;
// Cap on the backoff wait so a long retry chain never stalls past a minute.
const MAX_BACKOFF_MS = 60_000;
/**
* Calls Claude with exponential backoff and model rotation. Retries only
* transient failures; permanent ones (4xx that is not 429) throw at once.
*
* @param {string} prompt The prompt to send.
* @param {number} attempt Internal retry counter — leave unset when calling.
* @return {string} Claude's reply text.
*/
function resilientClaude(prompt, attempt = 0) {
// 1. Give up once every model has had MAX_PASSES tries.
if (attempt >= MODELS.length * MAX_PASSES) {
throw new Error('Exhausted retries after ' + attempt + ' attempts');
}
// 2. Rotate through the model list so a busy model is not retried alone.
const model = MODELS[attempt % MODELS.length];
try {
return callClaude(prompt, model);
} catch (e) {
// 3. A "Permanent:" error is not worth retrying — surface it now.
if (String(e.message).startsWith('Permanent:')) throw e;
// 4. Otherwise back off exponentially (capped) and try again.
const wait = Math.min(MAX_BACKOFF_MS, 500 * Math.pow(2, attempt));
Logger.log('Attempt ' + attempt + ' failed (' + e.message +
') — retrying in ' + wait + 'ms.');
Utilities.sleep(wait);
return resilientClaude(prompt, attempt + 1);
}
}
/**
* Single Anthropic API call. Classifies the HTTP status so the caller
* can tell a transient failure from a permanent one.
*
* @param {string} prompt The prompt to send.
* @param {string} model The model id to use.
* @return {string} Claude's reply text.
*/
function callClaude(prompt, model) {
const key = PropertiesService.getScriptProperties()
.getProperty('ANTHROPIC_API_KEY');
const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
method: 'post',
contentType: 'application/json',
headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
payload: JSON.stringify({
model,
max_tokens: 400,
messages: [{ role: 'user', content: prompt }],
}),
// muteHttpExceptions lets us read the status code instead of crashing.
muteHttpExceptions: true,
});
const code = res.getResponseCode();
// 5xx and 429 are transient — throw a plain error so the retry kicks in.
if (code >= 500 || code === 429) throw new Error('HTTP ' + code);
// Any other 4xx is the caller's fault — mark it Permanent so it never retries.
if (code >= 400) throw new Error('Permanent: ' + res.getContentText());
return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
resilientClaudetracks anattemptcounter. Once it has madeMODELS.length * MAX_PASSESattempts — every model tried three times — it gives up and throws.- Each attempt picks a model with
attempt % MODELS.length, so consecutive retries rotate through the list rather than hammering one overloaded model. callClauderuns withmuteHttpExceptions: true, which lets the script read the HTTP status code instead of crashing on a non-200.- The status code decides everything. A 5xx or a 429 is transient, so
callClaudethrows a plainHTTP nnnerror. Any other 4xx — a malformed request, a bad key — throws an error prefixedPermanent:. - Back in
resilientClaude, aPermanent:error is re-thrown immediately: retrying a bad request just wastes time and money. Anything else triggers an exponential backoff (500ms,1s,2s, …, capped at 60s) before the next recursive attempt.
Example run
A run where the rate limit clears on the third try:
| Attempt | Model | Result | Action |
|---|---|---|---|
| 0 | haiku | HTTP 429 | wait 500ms, retry |
| 1 | sonnet | HTTP 429 | wait 1,000ms, retry |
| 2 | haiku | HTTP 200 | return the reply |
And a run that fails fast on a bad key — no pointless retries:
| Attempt | Model | Result | Action |
|---|---|---|---|
| 0 | haiku | HTTP 401 → Permanent: | throw immediately |
The log makes the retry path visible: Attempt 0 failed (HTTP 429) — retrying in 500ms.
Use it
This is a library function, not a job you run on its own. Swap it in wherever another Northwind script calls the API directly:
// Before — one blip and the whole run dies:
const reply = callClaude(prompt, 'claude-haiku-4-5-20251001');
// After — transient failures retry, permanent ones fail fast:
const reply = resilientClaude(prompt);
There is nothing to schedule. The behaviour only matters when the caller hits trouble, and then it is automatic.
Watch out for
- Backoff burns execution time. Apps Script caps a single run at six minutes. A full retry chain can sleep for well over a minute — fine for one call, risky inside a loop over hundreds of rows. For big batches, catch the final throw per row and move on rather than letting one row stall the job.
- Only retry idempotent calls. Asking Claude the same prompt twice is safe. If your wrapper also writes to a Sheet or sends an email, make sure a retry does not duplicate that side effect.
- A 400 is your bug, not a server’s. The
Permanent:branch deliberately fails fast on malformed requests. If you see it, fix the prompt or payload — do not widen the retry to cover it. - Model rotation assumes interchangeable models. Haiku and Sonnet return the same response shape, so rotation is transparent. If you add a model with a different API contract, the rotation will break the caller’s parsing.
- Recursion depth is bounded but real. With the defaults it is at most six
nested calls, which is safe — but raise
MAX_PASSEScarefully if you ever do.
Related
Generate and test email subject lines
A/B test AI-written Northwind subject lines for open rate — outputs ranked by past performance.
Updated Mar 3, 2026
Build retrieval-augmented Q&A over your data
Answer Northwind questions grounded in your own Sheet data — pass relevant rows as context.
Updated Feb 27, 2026
Build an AI weekly-report narrator
Turn Northwind metrics into a written executive summary — numbers in, prose out.
Updated Feb 23, 2026
Build a multi-step AI agent workflow
Chain Claude prompts to complete a Northwind task end to end — research → draft → critique → finalise.
Updated Feb 11, 2026
Adapt marketing copy per region
Localise Northwind tone and references by market with AI — same message, regional flavour.
Updated Jan 30, 2026