Build an AI fallback and retry system

Any automation that calls a remote API will eventually meet a 429 rate-limit or a 503 from the other end. When that happens inside a Northwind script, the whole run dies — and a job that batched a hundred rows might fail on row 41 because of a blip that cleared two seconds later. The fix is not to hope; it is to retry.

This is not a standalone automation — it is the call layer the other AI scripts should sit on. resilientClaude wraps the raw API call with two safety nets. First, it tells transient failures (server errors, rate limits) apart from permanent ones (a bad request, a revoked key) and only retries the transient ones. Second, it backs off exponentially between attempts and rotates through a list of models, so if one model is overloaded the next attempt tries another.

What you’ll need

An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
Nothing else — this is a drop-in replacement for a plain callClaude call. Anywhere another script calls callClaude(prompt), call resilientClaude(prompt) instead.

The script

// Models tried in order, cheapest first. Each retry rotates to the next.
const MODELS = ['claude-haiku-4-5-20251001', 'claude-sonnet-4-6'];

// How many full passes through MODELS before giving up.
const MAX_PASSES = 3;

// Cap on the backoff wait so a long retry chain never stalls past a minute.
const MAX_BACKOFF_MS = 60_000;

/**
 * Calls Claude with exponential backoff and model rotation. Retries only
 * transient failures; permanent ones (4xx that is not 429) throw at once.
 *
 * @param {string} prompt The prompt to send.
 * @param {number} attempt Internal retry counter — leave unset when calling.
 * @return {string} Claude's reply text.
 */
function resilientClaude(prompt, attempt = 0) {
  // 1. Give up once every model has had MAX_PASSES tries.
  if (attempt >= MODELS.length * MAX_PASSES) {
    throw new Error('Exhausted retries after ' + attempt + ' attempts');
  }

  // 2. Rotate through the model list so a busy model is not retried alone.
  const model = MODELS[attempt % MODELS.length];

  try {
    return callClaude(prompt, model);
  } catch (e) {
    // 3. A "Permanent:" error is not worth retrying — surface it now.
    if (String(e.message).startsWith('Permanent:')) throw e;

    // 4. Otherwise back off exponentially (capped) and try again.
    const wait = Math.min(MAX_BACKOFF_MS, 500 * Math.pow(2, attempt));
    Logger.log('Attempt ' + attempt + ' failed (' + e.message +
      ') — retrying in ' + wait + 'ms.');
    Utilities.sleep(wait);
    return resilientClaude(prompt, attempt + 1);
  }
}

/**
 * Single Anthropic API call. Classifies the HTTP status so the caller
 * can tell a transient failure from a permanent one.
 *
 * @param {string} prompt The prompt to send.
 * @param {string} model The model id to use.
 * @return {string} Claude's reply text.
 */
function callClaude(prompt, model) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model,
      max_tokens: 400,
      messages: [{ role: 'user', content: prompt }],
    }),
    // muteHttpExceptions lets us read the status code instead of crashing.
    muteHttpExceptions: true,
  });

  const code = res.getResponseCode();

  // 5xx and 429 are transient — throw a plain error so the retry kicks in.
  if (code >= 500 || code === 429) throw new Error('HTTP ' + code);

  // Any other 4xx is the caller's fault — mark it Permanent so it never retries.
  if (code >= 400) throw new Error('Permanent: ' + res.getContentText());

  return JSON.parse(res.getContentText()).content[0].text.trim();
}

How it works

resilientClaude tracks an attempt counter. Once it has made MODELS.length * MAX_PASSES attempts — every model tried three times — it gives up and throws.
Each attempt picks a model with attempt % MODELS.length, so consecutive retries rotate through the list rather than hammering one overloaded model.
callClaude runs with muteHttpExceptions: true, which lets the script read the HTTP status code instead of crashing on a non-200.
The status code decides everything. A 5xx or a 429 is transient, so callClaude throws a plain HTTP nnn error. Any other 4xx — a malformed request, a bad key — throws an error prefixed Permanent:.
Back in resilientClaude, a Permanent: error is re-thrown immediately: retrying a bad request just wastes time and money. Anything else triggers an exponential backoff (500ms, 1s, 2s, …, capped at 60s) before the next recursive attempt.

Example run

A run where the rate limit clears on the third try:

Attempt	Model	Result	Action
0	haiku	HTTP 429	wait 500ms, retry
1	sonnet	HTTP 429	wait 1,000ms, retry
2	haiku	HTTP 200	return the reply

And a run that fails fast on a bad key — no pointless retries:

Attempt	Model	Result	Action
0	haiku	HTTP 401 → `Permanent:`	throw immediately

The log makes the retry path visible: Attempt 0 failed (HTTP 429) — retrying in 500ms.

Use it

This is a library function, not a job you run on its own. Swap it in wherever another Northwind script calls the API directly:

// Before — one blip and the whole run dies:
const reply = callClaude(prompt, 'claude-haiku-4-5-20251001');

// After — transient failures retry, permanent ones fail fast:
const reply = resilientClaude(prompt);

There is nothing to schedule. The behaviour only matters when the caller hits trouble, and then it is automatic.

Watch out for

Backoff burns execution time. Apps Script caps a single run at six minutes. A full retry chain can sleep for well over a minute — fine for one call, risky inside a loop over hundreds of rows. For big batches, catch the final throw per row and move on rather than letting one row stall the job.
Only retry idempotent calls. Asking Claude the same prompt twice is safe. If your wrapper also writes to a Sheet or sends an email, make sure a retry does not duplicate that side effect.
A 400 is your bug, not a server’s. The Permanent: branch deliberately fails fast on malformed requests. If you see it, fix the prompt or payload — do not widen the retry to cover it.
Model rotation assumes interchangeable models. Haiku and Sonnet return the same response shape, so rotation is transparent. If you add a model with a different API contract, the rotation will break the caller’s parsing.
Recursion depth is bounded but real. With the defaults it is at most six nested calls, which is safe — but raise MAX_PASSES carefully if you ever do.

Build an AI fallback and retry system

What you’ll need

The script

How it works

Example run

Use it

Watch out for

Related

Generate and test email subject lines

Build retrieval-augmented Q&A over your data

Build an AI weekly-report narrator

Build a multi-step AI agent workflow

Adapt marketing copy per region