appscript.dev
Automation Intermediate Sheets

Parse semi-structured listings into tables

Extract recipes, specs, or ads from Northwind content into clean spreadsheet rows.

Published Mar 7, 2026

Northwind’s catalogue starts life as messy paragraphs — supplier emails, copy-pasted ad text, ingredient lists, gear specs. The shape is roughly the same every time (a name, a price, a few attributes) but it never lines up into rows. So someone reads each blob and types fields into a spreadsheet, and that someone gives up around row sixty.

This script puts a tiny extractor in front of the column. You drop the raw text into column A, define the schema you want, and Claude returns clean JSON matching that shape — name, price, specs — which the script writes into the columns next to each row. The schema is a normal JavaScript object, so you can swap it from “product spec” to “recipe” to “classified ad” without rewriting the pipeline.

What you’ll need

  • A Google Sheet with the raw text in column A. The script writes the parsed fields into columns B, C and D, with a header row in row 1.
  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • A schema that matches the columns you’ve laid out. The default — name, price, specs — works for product listings; change it to fit your data.

The script

// The spreadsheet that holds the raw listings.
const LISTINGS_SHEET_ID = '1abcListingsId';

// Columns the script reads from and writes to (1-indexed for getRange).
const TEXT_COL = 1;
const NAME_COL = 2;
const PRICE_COL = 3;
const SPECS_COL = 4;

// The shape we want Claude to extract. Change the keys to repurpose the
// pipeline — recipes ({title, time, ingredients}), ads ({title, price, area}),
// anything semi-structured.
const DEFAULT_SCHEMA = {
  name: 'string',
  price: 'number',
  specs: ['string'],
};

/**
 * Ask Claude to extract structured fields from a single blob of text,
 * matching the given schema.
 *
 * @param {string} text - Raw listing text.
 * @param {object} schema - The JSON shape to extract.
 * @returns {object} The parsed object, matching the schema's keys.
 */
function parseListing(text, schema) {
  if (!text) throw new Error('parseListing needs some text.');

  // The schema goes into the prompt as JSON so Claude knows the keys and types.
  const prompt =
    'Extract structured data from this Northwind listing. ' +
    'Return ONLY a JSON object — no prose, no markdown — matching this ' +
    'exact schema (keys and types): ' + JSON.stringify(schema) + '. ' +
    'If a field is missing, use null (or [] for arrays). Numbers must be ' +
    'numbers, not strings.\n\n' + text;

  const reply = callClaude(prompt, 'claude-sonnet-4-6', 800);
  return JSON.parse(stripFences(reply));
}

/**
 * Walk the listings sheet, parse every row whose name cell is still empty,
 * and write the extracted fields back into columns B/C/D.
 */
function parseListingsColumn() {
  const sheet = SpreadsheetApp.openById(LISTINGS_SHEET_ID).getSheets()[0];
  const values = sheet.getDataRange().getValues();
  if (values.length < 2) return;
  const rows = values.slice(1);

  let parsed = 0;
  rows.forEach((r, i) => {
    const rowNum = i + 2; // +1 for header, +1 because getRange is 1-indexed.

    // Skip rows with no text, and rows already parsed (name cell populated).
    if (!r[TEXT_COL - 1] || r[NAME_COL - 1]) return;

    try {
      const out = parseListing(r[TEXT_COL - 1], DEFAULT_SCHEMA);
      sheet.getRange(rowNum, NAME_COL).setValue(out.name || '');
      sheet.getRange(rowNum, PRICE_COL).setValue(out.price ?? '');
      sheet.getRange(rowNum, SPECS_COL).setValue((out.specs || []).join(', '));
      parsed++;
    } catch (e) {
      // Don't let one bad row stop the whole sheet.
      sheet.getRange(rowNum, NAME_COL).setValue('ERROR: ' + e.message);
    }
  });

  Logger.log('Parsed ' + parsed + ' new listing(s).');
}

/**
 * Claude occasionally wraps JSON in a ```json code fence. Strip it so
 * JSON.parse never chokes on the markdown.
 */
function stripFences(text) {
  return text.replace(/```(?:json)?/g, '').trim();
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 */
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 400) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model,
      max_tokens: maxTokens,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });
  return JSON.parse(res.getContentText()).content[0].text.trim();
}

How it works

  1. parseListing builds a prompt that embeds the schema literally so Claude knows both the keys and the types you expect. Asking for null (or []) when a field is missing prevents the parser from inventing data.
  2. stripFences removes any markdown wrapper, then JSON.parse turns the reply into a real object the caller can use.
  3. parseListingsColumn reads the whole sheet into memory and skips two classes of row: empty source cells and rows already parsed (where the name column is populated). That makes the function idempotent — running it again only touches new rows.
  4. For each remaining row, it calls parseListing, then writes the three fields back into columns B, C and D. Arrays are joined into a single comma-separated cell so the sheet stays human-readable.
  5. If one row blows up — bad JSON, missing fields — the error is written into the name cell and the loop continues with the next row, so one bad listing never stops the run.

Example run

Column A contains free-form blobs like this:

Northwind Trail Pack 32L — £85. Ripstop nylon shell, hip belt, internal hydration sleeve, six external pockets. Weighs 940g. Ships May.

After parseListingsColumn runs, the row looks like:

Raw textNamePriceSpecs
Northwind Trail Pack 32L — £85. …Northwind Trail Pack 32L85Ripstop nylon shell, Hip belt, Internal hydration sleeve, Six external pockets, 940g

Add a new row of raw text and run the function again — only the new row is parsed; the existing rows are skipped because their name cell is already filled in.

Run it

This is on-demand. Drop new text into column A, then:

  1. In the Apps Script editor, select parseListingsColumn and click Run.
  2. Approve the authorisation prompt the first time.
  3. Watch columns B, C and D populate. Re-running is safe — only empty rows are touched.

If you’d rather run it from the sheet, add an onOpen menu:

function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu('Listings')
    .addItem('Parse new rows', 'parseListingsColumn')
    .addToUi();
}

Watch out for

  • Schema mismatch is the most common failure. If your sheet has four output columns but DEFAULT_SCHEMA defines three keys, you’ll get empty cells. Update both together.
  • The model can hallucinate plausible-looking values if the raw text is thin. The “use null for missing fields” instruction reduces this but doesn’t eliminate it — spot-check the first dozen rows when you change schemas.
  • Each row is one API call. For a sheet of thousands, batch ten or twenty blobs into one prompt with an array schema, or run the script in chunks to stay under the six-minute execution limit.
  • Strict JSON keeps this reliable. stripFences handles code fences; errors that still bubble up are written back into the sheet so you can see exactly which row failed.

Related