appscript.dev

Batch and rate-limit AI calls safely

Stay within quotas on large Northwind jobs: pace calls with a fixed-interval throttle, or fan out small batches in parallel.

Published Dec 17, 2025

A one-off AI call is easy. The trouble starts when Northwind needs to run the same prompt over a few hundred rows — summarise every support ticket, draft a reply for every lead. Fire those calls in a tight loop and the API starts returning 429 Too Many Requests, the script half-finishes, and you are left guessing which rows ran.

This pattern gives you two ways to run a batch of prompts. The first paces the calls one at a time so they never exceed a set rate. The second uses Apps Script’s fetchAll to send them in parallel, which is only safe when the whole batch fits inside your rate limit as a single burst. Pick whichever suits the size of the job: the paced runner for large jobs, the parallel runner for small ones.

What you’ll need

  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • An array of prompt strings to run — built however your job produces them (one per spreadsheet row, one per file, and so on).
  • Knowledge of your account’s rate limit so you can set RATE_PER_SECOND sensibly. Check it in the Anthropic console; the limits there are stated per minute, so divide the request limit by 60.

The script

// Maximum requests per second for the throttled runner. Set this at or
// below your Anthropic account's rate limit.
const RATE_PER_SECOND = 10;

// Model and token budget shared by every call in a batch.
const MODEL = 'claude-haiku-4-5-20251001';
const MAX_TOKENS = 400;

/**
 * Runs a list of prompts one at a time, pausing between calls so the
 * request rate never exceeds RATE_PER_SECOND. Use this for large jobs.
 *
 * @param {string[]} prompts  The prompts to run.
 * @return {string[]} One reply per prompt, in order.
 */
function runJobs(prompts) {
  if (!prompts || !prompts.length) {
    Logger.log('No prompts to run — nothing to do.');
    return [];
  }

  // Spacing between calls, derived from the target rate.
  const intervalMs = Math.ceil(1000 / RATE_PER_SECOND);
  const out = [];

  for (let i = 0; i < prompts.length; i++) {
    out.push(callClaude(prompts[i]));
    // Sleep between calls — but not after the last one.
    if (i < prompts.length - 1) Utilities.sleep(intervalMs);
  }
  return out;
}

/**
 * Runs all prompts in parallel with UrlFetchApp.fetchAll. Faster, but
 * every call leaves at once — only use it when the batch is small enough
 * to stay under your rate limit in a single burst.
 *
 * @param {string[]} prompts  The prompts to run.
 * @return {string[]} One reply per prompt, in order.
 */
function runJobsBatch(prompts) {
  if (!prompts || !prompts.length) {
    Logger.log('No prompts to run — nothing to do.');
    return [];
  }

  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');

  // Build one request object per prompt.
  const requests = prompts.map((p) => ({
    url: 'https://api.anthropic.com/v1/messages',
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: MODEL,
      max_tokens: MAX_TOKENS,
      messages: [{ role: 'user', content: p }],
    }),
    muteHttpExceptions: true,
  }));

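  // fetchAll sends them together. A non-200 response or an unparseable
  // body becomes '' so one failed call cannot break the whole batch.
  return UrlFetchApp.fetchAll(requests).map((r) => {
    if (r.getResponseCode() !== 200) return '';
    try {
      return JSON.parse(r.getContentText()).content?.[0]?.text?.trim() || '';
    } catch (e) {
      return '';
    }
  });
}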

/**
 * Single Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 *
 * @param {string} prompt  The prompt text.
 * @return {string} The model's reply text.
 */
function callClaude(prompt) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: MODEL,
      max_tokens: MAX_TOKENS,
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  return JSON.parse(res.getContentText()).content[0].text.trim();
}

How it works

  1. runJobs is the throttled runner. It works out intervalMs — the gap between calls needed to hit RATE_PER_SECOND — then loops through the prompts, calling Claude once per iteration.
  2. After each call it sleeps for intervalMs, except after the last one, so the request rate stays flat instead of bursting.
  3. runJobsBatch is the parallel runner. It builds one request object per prompt and hands the whole array to UrlFetchApp.fetchAll, which Google sends concurrently.
  4. The batch runner sets muteHttpExceptions: true and uses optional chaining (content?.[0]?.text) so one failed call returns an empty string instead of throwing and losing the rest of the results.
  5. callClaude is the single-call helper the throttled runner uses. callClaude and runJobsBatch both read MODEL and MAX_TOKENS, so every call in a batch uses the same settings.
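Because the batch runner reports a failed call as an empty string (step 4), the caller can work out which prompts need another attempt. A minimal sketch, assuming an empty reply always means a failure; findFailedIndexes is a hypothetical helper, not part of the script above:

```javascript
/**
 * Returns the positions of prompts whose reply came back empty.
 * Hypothetical helper: assumes '' always signals a failed call,
 * which is how runJobsBatch reports one.
 *
 * @param {string[]} replies  Output of runJobsBatch, in prompt order.
 * @return {number[]} Indexes of the prompts that need another attempt.
 */
function findFailedIndexes(replies) {
  const failed = [];
  replies.forEach((reply, i) => {
    if (!reply) failed.push(i);
  });
  return failed;
}

// Example: the second and fourth calls failed.
const failedIndexes = findFailedIndexes(['Summary A', '', 'Summary C', '']);
console.log(failedIndexes); // [1, 3]
```

The returned indexes line up with the original prompt array, so the failed prompts can be picked out and sent again without re-running the ones that succeeded.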

Example run

Say you have 50 ticket summaries to generate. The throttled runner spaces them out:

function summariseTickets() {
  const prompts = getTicketTexts().map((t) => 'Summarise this ticket: ' + t);
  const summaries = runJobs(prompts);   // ~5 s of pauses at 10/sec, plus each call's latency
  Logger.log(summaries.length + ' summaries done.');
}

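With RATE_PER_SECOND at 10, the 49 pauses between the 50 calls add up to about five seconds. Each call’s own response time comes on top of that, because the calls run one after another, so the whole job takes noticeably longer than five seconds. The pacing keeps the request rate at or below 10 per second. The same 50 prompts through runJobsBatch finish in roughly the time of the slowest single call, which is fine here, but a 500-prompt batch sent that way would burst well past the limit.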
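One way to keep the speed of fetchAll on a larger job is to send the prompts in smaller bursts with a pause between them. The sketch below is one possible approach, not part of the script above: chunk and runJobsInChunks are hypothetical helpers, and the runBatch and sleep parameters exist only so the runner can be swapped out (they default to runJobsBatch and Utilities.sleep).

```javascript
/**
 * Splits an array into consecutive slices of at most `size` items.
 */
function chunk(items, size) {
  if (size < 1) throw new Error('Chunk size must be at least 1.');
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/**
 * Sends prompts through a batch runner one chunk at a time, pausing
 * between chunks. Hypothetical wrapper: runBatch defaults to the
 * article's runJobsBatch and sleep to Utilities.sleep.
 *
 * @param {string[]} prompts  The prompts to run.
 * @param {number} chunkSize  Maximum prompts per parallel burst.
 * @param {number} pauseMs  Pause between bursts, in milliseconds.
 * @return {string[]} One reply per prompt, in order.
 */
function runJobsInChunks(prompts, chunkSize, pauseMs,
    runBatch = runJobsBatch, sleep = (ms) => Utilities.sleep(ms)) {
  const out = [];
  const chunks = chunk(prompts, chunkSize);
  chunks.forEach((c, i) => {
    out.push(...runBatch(c));
    // Pause between chunks, but not after the last one.
    if (i < chunks.length - 1) sleep(pauseMs);
  });
  return out;
}
```

Calling runJobsInChunks(prompts, RATE_PER_SECOND, 1000) sends at most RATE_PER_SECOND requests per burst. The pause starts only after a burst has returned, so the effective rate ends up somewhat below the target.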

Run it

This is a helper pattern, not a standalone job — you call runJobs or runJobsBatch from whatever function builds your prompts:

  1. Write a wrapper (like summariseTickets above) that assembles the prompt array and calls one of the runners.
  2. Use runJobs for large jobs where staying under the rate limit matters more than speed; use runJobsBatch for small jobs where speed wins.
  3. For a recurring job, add a time-driven trigger on the wrapper.
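The trigger in step 3 can be added from the editor’s Triggers panel or created in code. A minimal sketch of the code route, assuming an hourly cadence; ensureHourlyTrigger is a hypothetical helper, and its scriptApp parameter defaults to the built-in ScriptApp service and exists only so it can be stubbed:

```javascript
/**
 * Creates an hourly time-driven trigger for a wrapper function, unless
 * one already exists. Hypothetical helper: the hourly cadence and the
 * duplicate check are assumptions, not requirements of the pattern.
 *
 * @param {string} functionName  Name of the wrapper to run.
 * @param {Object} scriptApp  Defaults to the built-in ScriptApp service.
 * @return {boolean} True if a trigger was created, false if one existed.
 */
function ensureHourlyTrigger(functionName, scriptApp = ScriptApp) {
  const exists = scriptApp.getProjectTriggers()
    .some((t) => t.getHandlerFunction() === functionName);
  if (exists) return false;

  scriptApp.newTrigger(functionName)
    .timeBased()
    .everyHours(1)
    .create();
  return true;
}
```

Running ensureHourlyTrigger('summariseTickets') once from the editor is enough. The duplicate check means a second run does not stack another trigger on the same wrapper.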

Watch out for

  • Apps Script caps a single execution at 6 minutes. A large runJobs batch can hit that ceiling — for hundreds of prompts, process them in chunks across several trigger runs and store progress between runs.
  • fetchAll ignores RATE_PER_SECOND entirely. It sends every request at once, so a big batch can still trip a 429. Keep batches small or split them.
  • Neither runner retries. For production, wrap calls in a retry with exponential backoff so a transient 429 or 503 does not lose a row.
  • UrlFetchApp itself has a daily quota (around 20,000 calls on consumer accounts). A very large recurring job can exhaust it.
  • You still pay per token. Batching changes the timing, not the cost — a thousand prompts cost the same whether paced or parallel.
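The retry suggested above could look something like the following. This is a sketch under stated assumptions, not the script’s own behaviour: withRetry is a hypothetical helper, it retries on any thrown error (a stricter version would retry only on 429 and 5xx responses), and the sleep parameter defaults to Utilities.sleep so it can be stubbed.

```javascript
/**
 * Calls fn, retrying with exponential backoff if it throws.
 * Hypothetical helper: retrying on every error is a simplifying
 * assumption.
 *
 * @param {function(): *} fn  The call to attempt.
 * @param {number} maxAttempts  Total attempts before giving up.
 * @param {number} baseDelayMs  Delay before the first retry; doubles each time.
 * @param {function(number)} sleep  Defaults to Utilities.sleep.
 * @return {*} Whatever fn returns.
 */
function withRetry(fn, maxAttempts = 4, baseDelayMs = 1000,
    sleep = (ms) => Utilities.sleep(ms)) {
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (e) {
      // Out of attempts: surface the last error to the caller.
      if (attempt >= maxAttempts) throw e;
      // Wait 1x, 2x, 4x ... the base delay before trying again.
      sleep(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

In runJobs the call would become out.push(withRetry(() => callClaude(prompts[i]))). This works because callClaude does not set muteHttpExceptions, so UrlFetchApp.fetch throws on a 429 or 503 and the wrapper catches it. It does not help runJobsBatch as written, since that runner mutes HTTP exceptions.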
