appscript.dev
Automation Advanced

Handle streaming responses from an LLM API

Manage long Northwind AI outputs reliably, given that Apps Script’s UrlFetchApp is synchronous.

Published Jan 3, 2026

Northwind runs a few longer AI jobs — summarising a quarter of meeting notes, rewriting a batch of product copy. In a normal app you would stream the model’s reply token by token so the user sees progress. Apps Script cannot do that: UrlFetchApp.fetch is synchronous and only returns once the whole response has arrived, so there is no partial output to show and no way to read a stream as it flows.

That constraint is not the end of the road; it just changes the shape of the fix. Instead of streaming, you keep each request small. This script splits a long input into chunks, sends each as its own complete request, and stitches the replies back together. Each call returns relatively quickly and logs its progress, and because the work is divided into separate units you can checkpoint between chunks when a job is too big to finish inside Apps Script’s six-minute execution limit. The job still finishes; it just runs as a series of short calls rather than one long stream.

What you’ll need

  • An Apps Script project that calls an LLM API — the example uses Anthropic’s Messages API.
  • An API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • A rough sense of how long your inputs run, so the chunk size can be tuned to keep each call comfortably short.

The script

// LLM endpoint and the model used for each chunk.
const LLM_API_URL = 'https://api.anthropic.com/v1/messages';
const LLM_MODEL = 'claude-sonnet-4-6';

// How many characters of input go into a single request. Smaller chunks
// mean shorter, more reliable calls — see "Watch out for".
const CHUNK_SIZE = 4000;

// Token ceiling for each individual reply.
const MAX_TOKENS_PER_CHUNK = 1000;

/**
 * Processes a long piece of text by splitting it into chunks, sending
 * each as its own complete request, and joining the replies.
 *
 * @param {string} text       The full input to process.
 * @param {string} instruction  What to do with each chunk.
 * @param {number} chunkSize  Characters per chunk.
 * @returns {string}          The combined output.
 */
function processLongText(text, instruction, chunkSize = CHUNK_SIZE) {
  // Nothing to do — bail out before making any API calls.
  if (!text) {
    Logger.log('No input text — nothing to process.');
    return '';
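  }

  // A zero or negative chunk size would never advance the loop below.
  if (!(chunkSize > 0)) {
    throw new Error('chunkSize must be a positive number.');
  }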

  const results = [];

  // 1. Walk the text in fixed-size slices. Each slice is one request.
  for (let start = 0; start < text.length; start += chunkSize) {
    const chunk = text.slice(start, start + chunkSize);
    const chunkNumber = Math.floor(start / chunkSize) + 1;
    Logger.log('Processing chunk ' + chunkNumber);

    // 2. Each call is a complete, synchronous request — it returns
    //    only when that chunk's reply is fully written.
    const reply = callLlm(instruction + '\n\n' + chunk);
    results.push(reply);
  }

  // 3. Stitch the chunk replies back into one result.
  Logger.log('Combined ' + results.length + ' chunk(s).');
  return results.join('\n\n');
}

/**
 * Makes one complete request to the LLM API. UrlFetchApp is synchronous,
 * so this blocks until the whole reply has arrived — there is no stream
 * to read, and that is the point: each call is short and self-contained.
 *
 * @param {string} prompt  The full prompt for this request.
 * @returns {string}       The model's reply text.
 */
function callLlm(prompt) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  if (!key) throw new Error('ANTHROPIC_API_KEY is not set in Script Properties.');

  const response = UrlFetchApp.fetch(LLM_API_URL, {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: LLM_MODEL,
      max_tokens: MAX_TOKENS_PER_CHUNK,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });

  // Surface API errors instead of letting a bad chunk fail silently.
  const code = response.getResponseCode();
  if (code < 200 || code >= 300) {
    throw new Error('LLM API returned ' + code + ': ' + response.getContentText());
  }

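  const data = JSON.parse(response.getContentText());

  // A reply cut off at max_tokens still arrives as HTTP 200, so flag it.
  if (data.stop_reason === 'max_tokens') {
    Logger.log('Warning: reply hit MAX_TOKENS_PER_CHUNK and may be truncated.');
  }

  // Join the text blocks rather than assuming content[0] is text.
  return data.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('')
    .trim();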
}

How it works

  1. UrlFetchApp.fetch is synchronous — it sends a request and blocks until the complete response has downloaded. There is no way to read tokens as they stream, so the strategy is to make every request small instead.
  2. processLongText bails out immediately on empty input, then walks the text in fixed CHUNK_SIZE slices. Each slice becomes one self-contained request.
  3. For every chunk it builds a prompt (the instruction plus that slice) and calls callLlm. Because each call covers only a few thousand characters, it returns relatively quickly. The calls run one after another in the same execution, though, so their combined time still counts against the six-minute limit.
  4. callLlm makes one complete request. It reads the API key from Script Properties, posts the prompt, and checks the response code so a failed chunk throws a clear error rather than corrupting the output.
  5. Each chunk’s reply is collected, and processLongText joins them into a single result once every chunk is done.
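The slicing in steps 2 and 3 is plain string arithmetic, so it can be checked without spending any API calls. The sketch below pulls the loop out into a standalone function; splitIntoChunks and previewChunks are hypothetical names, not part of the script above, and the exclusive end index mirrors how slice() treats its second argument.

```javascript
/**
 * Hypothetical helper: splits text into fixed-size slices the same way the
 * loop in processLongText does, but returns them instead of calling the API.
 *
 * @param {string} text       The input to split.
 * @param {number} chunkSize  Characters per slice; must be positive.
 * @returns {{number: number, start: number, end: number, chunk: string}[]}
 */
function splitIntoChunks(text, chunkSize) {
  // A zero or negative size would never advance the loop.
  if (!(chunkSize > 0)) {
    throw new Error('chunkSize must be a positive number.');
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    // The last slice is shorter when the length is not a multiple of chunkSize.
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      number: chunks.length + 1,
      start: start,
      end: end, // exclusive, as with slice()
      chunk: text.slice(start, end),
    });
  }
  return chunks;
}

/** Hypothetical preview: logs the chunk plan without making any API calls. */
function previewChunks(text, chunkSize = 4000) {
  splitIntoChunks(text, chunkSize).forEach((c) => {
    console.log('Chunk ' + c.number + ': chars ' + c.start + '-' + (c.end - 1));
  });
}
```

Calling previewChunks('x'.repeat(12000)) logs three lines covering chars 0-3999, 4000-7999 and 8000-11999, the same split as the example run that follows. The default of 4000 simply matches CHUNK_SIZE so the snippet stands on its own.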

Example run

Summarise a long block of meeting notes:

function summariseNotes() {
  const notes = SpreadsheetApp.getActiveSheet()
    .getRange('A1').getValue();        // ~12,000 characters
  const summary = processLongText(notes, 'Summarise these notes in two sentences.');
  Logger.log(summary);
}

With a 4,000-character chunk size, a 12,000-character input runs as three calls:

Chunk   Input range           Result
1       chars 0–3,999         Two-sentence summary of part 1
2       chars 4,000–7,999     Two-sentence summary of part 2
3       chars 8,000–11,999    Two-sentence summary of part 3

The three summaries are joined into one block: a complete result built from three short calls, with a clear log line per chunk.
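Those replies are still three separate summaries. If you want one two-sentence summary, a single extra request can merge them, the second pass suggested under Watch out for. In this sketch, mergeChunkReplies and summariseNotesAsOne are hypothetical names; the merge function takes the sending function as a parameter (callLlm in practice) so it can be tried with a stub. It assumes the per-chunk replies are short, as summaries are; long replies such as rewritten copy would need chunking again.

```javascript
/**
 * Hypothetical second pass: merges the joined per-chunk replies from
 * processLongText into one result with a single extra request.
 *
 * @param {string} combined          Output of processLongText.
 * @param {string} mergeInstruction  How to merge, including the target length.
 * @param {function(string): string} callFn  Sends one prompt and returns the
 *     reply; callLlm in the script above.
 * @returns {string} The merged result, or '' for empty input.
 */
function mergeChunkReplies(combined, mergeInstruction, callFn) {
  // No replies to merge: skip the extra request entirely.
  if (!combined) return '';
  // For summaries the per-chunk replies are short, so this prompt is far
  // smaller than the original input.
  return callFn(mergeInstruction + '\n\n' + combined);
}

// Apps Script wiring for the example above (uses SpreadsheetApp and callLlm).
function summariseNotesAsOne() {
  const notes = SpreadsheetApp.getActiveSheet().getRange('A1').getValue();
  const parts = processLongText(notes, 'Summarise these notes in two sentences.');
  const summary = mergeChunkReplies(
    parts,
    'These are summaries of consecutive parts of one set of notes. ' +
      'Merge them into a single two-sentence summary.',
    callLlm
  );
  Logger.log(summary);
}
```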

Run it

This is an on-demand helper, called from whatever job needs it:

  1. Save the API key in Project Settings → Script Properties as ANTHROPIC_API_KEY.
  2. Call processLongText(text, instruction) from your own function, as in the example above.
  3. Run that function and approve the authorisation prompt the first time.

For very large jobs that risk the six-minute limit even when chunked, save progress after each chunk and resume on a follow-up trigger rather than trying to finish in one execution.
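One way to do that is sketched below, assuming the input can be re-read on every run (here from the same cell as the example) and that progress fits in Script Properties. processWithCheckpoint and summariseNotesResumable are hypothetical names. The function stops starting new chunks once a time budget is spent, saves each reply and the next offset, and clears its keys when the job completes; the wiring function reschedules itself with a one-off time-based trigger. Script Properties cap each value at 9 KB and the whole store at 500 KB, so each reply gets its own key; larger outputs would be better kept in a sheet or Drive file.

```javascript
/**
 * Hypothetical checkpointing variant of processLongText. Processes chunks
 * until budgetMs has passed, saving progress in a Properties-like store
 * (getProperty / setProperty / deleteProperty) so a later run can resume.
 *
 * @returns {{done: boolean, result: string}} result is '' until done.
 */
function processWithCheckpoint(text, instruction, chunkSize, store, callFn, budgetMs) {
  if (!(chunkSize > 0)) throw new Error('chunkSize must be a positive number.');
  const startedAt = Date.now();
  // Resume from the saved offset, or begin at 0 on the first run.
  let start = Number(store.getProperty('chunk_next_start') || 0);

  while (start < text.length) {
    // Stop before starting a chunk we may not have time to finish.
    if (Date.now() - startedAt > budgetMs) {
      return { done: false, result: '' };
    }
    const n = Math.floor(start / chunkSize) + 1;
    const reply = callFn(instruction + '\n\n' + text.slice(start, start + chunkSize));
    // Save the reply before advancing the offset: a failure in between
    // repeats this chunk on the next run instead of skipping it.
    store.setProperty('chunk_result_' + n, reply);
    start += chunkSize;
    store.setProperty('chunk_next_start', String(start));
  }

  // Every chunk is done: collect replies in order and clear the checkpoint.
  const results = [];
  const total = Math.ceil(text.length / chunkSize);
  for (let n = 1; n <= total; n++) {
    results.push(store.getProperty('chunk_result_' + n));
    store.deleteProperty('chunk_result_' + n);
  }
  store.deleteProperty('chunk_next_start');
  return { done: true, result: results.join('\n\n') };
}

// Apps Script wiring (uses SpreadsheetApp, PropertiesService, ScriptApp, callLlm).
function summariseNotesResumable() {
  // Remove the one-off trigger that started this run, if any.
  ScriptApp.getProjectTriggers()
    .filter((t) => t.getHandlerFunction() === 'summariseNotesResumable')
    .forEach((t) => ScriptApp.deleteTrigger(t));

  const notes = SpreadsheetApp.getActiveSheet().getRange('A1').getValue();
  const outcome = processWithCheckpoint(
    notes,
    'Summarise these notes in two sentences.',
    CHUNK_SIZE,
    PropertiesService.getScriptProperties(),
    callLlm,
    4 * 60 * 1000 // assumed budget: leaves headroom for one last call
  );

  if (outcome.done) {
    Logger.log(outcome.result);
  } else {
    // Continue in a fresh execution about a minute from now.
    ScriptApp.newTrigger('summariseNotesResumable').timeBased().after(60 * 1000).create();
  }
}
```

Because the store is passed in, the same function runs unchanged against PropertiesService.getScriptProperties() in Apps Script or a plain object in a test. Changing CHUNK_SIZE while a job is mid-way would misalign the saved offsets, so finish or clear a job before retuning it.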

Watch out for

  • True streaming is not possible in Apps Script. UrlFetchApp is synchronous and returns the whole body at once; there is no callback or readable stream. Chunking works within that constraint rather than around it.
  • Each script execution is capped at six minutes. Many chunked calls in series can still hit that ceiling — keep CHUNK_SIZE modest, and for long jobs checkpoint progress and continue on a fresh run.
  • Chunking splits on character count, not meaning. A slice can cut a sentence in half, which weakens per-chunk results. For summarising, split on paragraph breaks instead so each chunk is self-contained.
  • Joining chunk replies does not give you a single coherent document. For a unified result, run a second pass that summarises or merges the combined chunk outputs.
  • If you genuinely need streaming — a live UI showing tokens as they arrive — the only real option is a small external worker that streams the API and posts the assembled result back to an Apps Script web app.
  • Watch the API bill. Chunking turns one request into several, the instruction is resent with every chunk, and any context you overlap between chunks is billed again. Tune the chunk size against both reliability and cost.
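For the paragraph-break tip above, a minimal sketch that packs whole paragraphs into chunks of at most a given size might look like this. It assumes paragraphs are separated by blank lines, and splitOnParagraphs and processByParagraph are hypothetical names. A single paragraph longer than the limit still falls back to fixed-size slices, so no chunk exceeds maxChars.

```javascript
/**
 * Hypothetical paragraph-aware splitter: packs whole paragraphs (separated
 * by blank lines) into chunks of at most maxChars characters. A paragraph
 * longer than maxChars on its own is cut into fixed-size slices instead.
 *
 * @param {string} text
 * @param {number} maxChars  Upper bound on chunk length; must be positive.
 * @returns {string[]}
 */
function splitOnParagraphs(text, maxChars) {
  if (!(maxChars > 0)) throw new Error('maxChars must be a positive number.');
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter((p) => p);
  const chunks = [];
  let current = '';

  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Oversized paragraph: flush what we have, then hard-split it.
      if (current) chunks.push(current);
      current = '';
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? current + '\n\n' + para : para;
    if (candidate.length > maxChars) {
      // Adding this paragraph would overflow: close the current chunk.
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

/** Hypothetical variant of processLongText that sends whole paragraphs. */
function processByParagraph(text, instruction, maxChars, callFn) {
  return splitOnParagraphs(text, maxChars)
    .map((chunk) => callFn(instruction + '\n\n' + chunk))
    .join('\n\n');
}
```

In Apps Script, processByParagraph(notes, 'Summarise these notes in two sentences.', CHUNK_SIZE, callLlm) could stand in for the processLongText call in summariseNotes. Chunks come out uneven in size, which is the trade for never cutting a paragraph in half.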

Related