Batch and rate-limit AI calls safely
Stay within quotas on large Northwind jobs: fan out with a fixed-interval throttle.
Published Dec 17, 2025
A one-off AI call is easy. The trouble starts when Northwind needs to run the
same prompt over a few hundred rows — summarise every support ticket, draft a
reply for every lead. Fire those calls in a tight loop and the API starts
returning 429 Too Many Requests, the script half-finishes, and you are left
guessing which rows ran.
This pattern gives you two safe ways to run a batch of prompts. The first paces
the calls one at a time so they never exceed a set rate. The second uses Apps
Script’s fetchAll to send them in parallel when the job is small enough to
fit in a single execution. Pick whichever suits the size of the job; used that
way, both keep you on the right side of the quota.
What you’ll need
- An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties (see Store API keys and secrets securely).
- An array of prompt strings to run, built however your job produces them (one per spreadsheet row, one per file, and so on).
- Knowledge of your account’s rate limit so you can set RATE_PER_SECOND sensibly. Check it in the Anthropic console; limits there are stated per minute, so divide the requests-per-minute figure by 60.
The script
// Maximum requests per second for the throttled runner. Set this at or
// below your Anthropic account's rate limit.
const RATE_PER_SECOND = 10;
// Model and token budget shared by every call in a batch.
const MODEL = 'claude-haiku-4-5-20251001';
const MAX_TOKENS = 400;

/**
 * Runs a list of prompts one at a time, pausing between calls so the
 * request rate never exceeds RATE_PER_SECOND. Use this for large jobs.
 *
 * @param {string[]} prompts The prompts to run.
 * @return {string[]} One reply per prompt, in order.
 */
function runJobs(prompts) {
  if (!prompts || !prompts.length) {
    Logger.log('No prompts to run; nothing to do.');
    return [];
  }
  // Spacing between calls, derived from the target rate.
  const intervalMs = Math.ceil(1000 / RATE_PER_SECOND);
  const out = [];
  for (let i = 0; i < prompts.length; i++) {
    out.push(callClaude(prompts[i]));
    // Sleep between calls, but not after the last one.
    if (i < prompts.length - 1) Utilities.sleep(intervalMs);
  }
  return out;
}

/**
 * Runs all prompts in parallel with UrlFetchApp.fetchAll. Faster, but
 * every call leaves at once, so only use it when the batch is small enough
 * to stay under your rate limit in a single burst.
 *
 * @param {string[]} prompts The prompts to run.
 * @return {string[]} One reply per prompt, in order.
 */
function runJobsBatch(prompts) {
  if (!prompts || !prompts.length) {
    Logger.log('No prompts to run; nothing to do.');
    return [];
  }
  const key = PropertiesService.getScriptProperties()
      .getProperty('ANTHROPIC_API_KEY');
  // Build one request object per prompt.
  const requests = prompts.map((p) => ({
    url: 'https://api.anthropic.com/v1/messages',
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: MODEL,
      max_tokens: MAX_TOKENS,
      messages: [{ role: 'user', content: p }],
    }),
    muteHttpExceptions: true,
  }));
  // fetchAll sends them together; optional chaining keeps a failed
  // call from breaking the whole batch.
  return UrlFetchApp.fetchAll(requests).map((r) =>
    JSON.parse(r.getContentText()).content?.[0]?.text?.trim() || '');
}

/**
 * Single Anthropic API call. The key lives in Script Properties, so it
 * is never pasted into the code.
 *
 * @param {string} prompt The prompt text.
 * @return {string} The model's reply text.
 */
function callClaude(prompt) {
  const key = PropertiesService.getScriptProperties()
      .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: MODEL,
      max_tokens: MAX_TOKENS,
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
- runJobs is the throttled runner. It works out intervalMs (the gap between calls needed to hit RATE_PER_SECOND), then loops through the prompts, calling Claude once per iteration.
- After each call it sleeps for intervalMs, except after the last one, so the request rate stays flat instead of bursting.
- runJobsBatch is the parallel runner. It builds one request object per prompt and hands the whole array to UrlFetchApp.fetchAll, which Google sends concurrently.
- The batch runner sets muteHttpExceptions: true and uses optional chaining (content?.[0]?.text) so one failed call returns an empty string instead of throwing and losing the rest of the results.
- callClaude is the single-call helper the throttled runner uses. All three functions share the MODEL and MAX_TOKENS config so a batch is consistent.
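Because runJobsBatch returns an empty string for a failed call, a wrapper can work out which prompts need another attempt. The sketch below is one minimal way to do that; findFailedIndexes is a hypothetical helper name, not part of the script above.

```javascript
/**
 * Hypothetical helper: returns the positions of empty replies so the
 * matching prompts can be run again.
 *
 * @param {string[]} replies Replies as returned by runJobsBatch.
 * @return {number[]} Indexes of the replies that came back empty.
 */
function findFailedIndexes(replies) {
  const failed = [];
  replies.forEach((reply, i) => {
    // runJobsBatch maps any failed or malformed response to ''.
    if (!reply) failed.push(i);
  });
  return failed;
}
```

A wrapper could then rebuild a smaller prompt array from those indexes and pass it to runJobs. Note that an empty string does not prove an HTTP failure: a reply whose text was genuinely empty looks the same.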
Example run
Say you have 50 ticket summaries to generate. The throttled runner spaces them out:
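function summariseTickets() {
  const prompts = getTicketTexts().map((t) => 'Summarise this ticket: ' + t);
  const summaries = runJobs(prompts); // ~5 s of pauses at 10/sec, plus each call's response time
  Logger.log(summaries.length + ' summaries done.');
}
With RATE_PER_SECOND at 10, the pauses between the 50 calls add up to roughly
five seconds (49 gaps of 100 ms). The total run time is that plus the response
time of every call, because runJobs waits for each reply before sending the
next prompt. Since the calls start at least 100 ms apart, they stay within a
10-per-second request limit. The same 50 prompts through runJobsBatch finish in
about the time of the slowest single call, but all 50 requests leave in one
burst, so that is only safe if your limit tolerates it. A 500-prompt batch sent
that way would burst well past the limit.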
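One way to keep some of the speed of fetchAll on a larger job is to send it in small bursts with a pause between them. The sketch below assumes the limit can be treated as requests per second; chunk and runJobsChunked are hypothetical helpers, and runJobsBatch is the function from the script above.

```javascript
/**
 * Hypothetical helper: splits an array into slices of at most `size` items.
 */
function chunk(items, size) {
  if (size < 1) throw new Error('size must be at least 1');
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/**
 * Hypothetical runner: sends the prompts through runJobsBatch one chunk
 * at a time, pausing a full second between chunks.
 */
function runJobsChunked(prompts, chunkSize) {
  const out = [];
  const chunks = chunk(prompts, chunkSize);
  chunks.forEach((group, i) => {
    runJobsBatch(group).forEach((reply) => out.push(reply));
    // Wait before the next burst, but not after the last one.
    if (i < chunks.length - 1) Utilities.sleep(1000);
  });
  return out;
}
```

Calling runJobsChunked(prompts, RATE_PER_SECOND) sends at most RATE_PER_SECOND requests in any one burst. Each burst still leaves all at once, so this suits limits that tolerate short bursts.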
Run it
This is a helper pattern, not a standalone job — you call runJobs or
runJobsBatch from whatever function builds your prompts:
- Write a wrapper (like summariseTickets above) that assembles the prompt array and calls one of the runners.
- Use runJobs for large jobs where staying under the rate limit matters more than speed; use runJobsBatch for small jobs where speed wins.
- For a recurring job, add a time-driven trigger on the wrapper.
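The trigger can be added from the Apps Script editor's Triggers panel, or in code. As a rough sketch of the code route, the function below installs an hourly trigger on the summariseTickets wrapper unless one already exists. installHourlyTrigger is a hypothetical name and the hourly schedule is an assumption; pick whatever interval suits the job.

```javascript
/**
 * Hypothetical installer: adds an hourly time-driven trigger for the
 * summariseTickets wrapper, skipping the step if one is already set up.
 *
 * @return {boolean} True if a trigger was created, false if one existed.
 */
function installHourlyTrigger() {
  const handler = 'summariseTickets';
  // Avoid stacking duplicate triggers when this is run more than once.
  const exists = ScriptApp.getProjectTriggers()
      .some((t) => t.getHandlerFunction() === handler);
  if (exists) return false;
  ScriptApp.newTrigger(handler).timeBased().everyHours(1).create();
  return true;
}
```

Run installHourlyTrigger once by hand from the editor; Apps Script asks for authorisation the first time.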
Watch out for
- Apps Script caps a single execution at 6 minutes. A large runJobs batch can hit that ceiling. For hundreds of prompts, process them in chunks across several trigger runs and store progress between runs.
- fetchAll ignores RATE_PER_SECOND entirely. It sends every request at once, so a big batch can still trip a 429. Keep batches small or split them.
- Neither runner retries. For production, wrap calls in a retry with exponential backoff so a transient 429 or 503 does not lose a row.
- UrlFetchApp itself has a daily quota (around 20,000 calls on consumer accounts). A very large recurring job can exhaust it.
- You still pay per token. Batching changes the timing, not the cost: a thousand prompts cost the same whether paced or parallel.
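The retry advice above could look something like the sketch below. withRetry is a hypothetical helper, and the base delay, jitter and attempt count are illustrative assumptions rather than recommended values. It retries on any thrown error, which covers callClaude because UrlFetchApp.fetch throws on a non-2xx response unless muteHttpExceptions is set. A production version would inspect the status and retry only transient failures such as 429 or 503.

```javascript
/**
 * Hypothetical helper: calls fn and retries with exponential backoff
 * whenever it throws.
 *
 * @param {function(): *} fn The call to attempt.
 * @param {number} maxAttempts Total attempts before giving up.
 * @param {function(number): void} sleepFn Pause function; defaults to
 *     Utilities.sleep so it runs unchanged in Apps Script.
 * @return {*} Whatever fn returns on its first successful attempt.
 */
function withRetry(fn, maxAttempts = 4,
    sleepFn = (ms) => Utilities.sleep(ms)) {
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt >= maxAttempts) throw err;
      // Wait 1 s, 2 s, 4 s, ... plus up to 250 ms of random jitter.
      const delayMs = 1000 * Math.pow(2, attempt - 1) +
          Math.floor(Math.random() * 250);
      sleepFn(delayMs);
    }
  }
}
```

Inside runJobs, the call would become out.push(withRetry(() => callClaude(prompts[i])));. The waits count towards the 6-minute execution cap, so keep maxAttempts small.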