Parse semi-structured listings into tables
Extract recipes, specs, or ads from Northwind content into clean spreadsheet rows.
Published Mar 7, 2026
Northwind’s catalogue starts life as messy paragraphs — supplier emails, copy-pasted ad text, ingredient lists, gear specs. The shape is roughly the same every time (a name, a price, a few attributes) but it never lines up into rows. So someone reads each blob and types fields into a spreadsheet, and that someone gives up around row sixty.
This script puts a tiny extractor in front of the column. You drop the raw text into column A, define the schema you want, and Claude returns clean JSON matching that shape — name, price, specs — which the script writes into the columns next to each row. The schema is a normal JavaScript object, so you can swap it from “product spec” to “recipe” to “classified ad” without rewriting the pipeline.
What you’ll need
- A Google Sheet with the raw text in column A. The script writes the parsed fields into columns B, C and D, with a header row in row 1.
- An Anthropic API key saved as
ANTHROPIC_API_KEYin Script Properties — see Store API keys and secrets securely. - A schema that matches the columns you’ve laid out. The default — name, price, specs — works for product listings; change it to fit your data.
The script
// The spreadsheet that holds the raw listings.
const LISTINGS_SHEET_ID = '1abcListingsId';
// Columns the script reads from and writes to (1-indexed for getRange).
const TEXT_COL = 1;
const NAME_COL = 2;
const PRICE_COL = 3;
const SPECS_COL = 4;
// The shape we want Claude to extract. Change the keys to repurpose the
// pipeline — recipes ({title, time, ingredients}), ads ({title, price, area}),
// anything semi-structured.
const DEFAULT_SCHEMA = {
name: 'string',
price: 'number',
specs: ['string'],
};
/**
* Ask Claude to extract structured fields from a single blob of text,
* matching the given schema.
*
* @param {string} text - Raw listing text.
* @param {object} schema - The JSON shape to extract.
* @returns {object} The parsed object, matching the schema's keys.
*/
function parseListing(text, schema) {
if (!text) throw new Error('parseListing needs some text.');
// The schema goes into the prompt as JSON so Claude knows the keys and types.
const prompt =
'Extract structured data from this Northwind listing. ' +
'Return ONLY a JSON object — no prose, no markdown — matching this ' +
'exact schema (keys and types): ' + JSON.stringify(schema) + '. ' +
'If a field is missing, use null (or [] for arrays). Numbers must be ' +
'numbers, not strings.\n\n' + text;
const reply = callClaude(prompt, 'claude-sonnet-4-6', 800);
return JSON.parse(stripFences(reply));
}
/**
* Walk the listings sheet, parse every row whose name cell is still empty,
* and write the extracted fields back into columns B/C/D.
*/
function parseListingsColumn() {
const sheet = SpreadsheetApp.openById(LISTINGS_SHEET_ID).getSheets()[0];
const values = sheet.getDataRange().getValues();
if (values.length < 2) return;
const rows = values.slice(1);
let parsed = 0;
rows.forEach((r, i) => {
const rowNum = i + 2; // +1 for header, +1 because getRange is 1-indexed.
// Skip rows with no text, and rows already parsed (name cell populated).
if (!r[TEXT_COL - 1] || r[NAME_COL - 1]) return;
try {
const out = parseListing(r[TEXT_COL - 1], DEFAULT_SCHEMA);
sheet.getRange(rowNum, NAME_COL).setValue(out.name || '');
sheet.getRange(rowNum, PRICE_COL).setValue(out.price ?? '');
sheet.getRange(rowNum, SPECS_COL).setValue((out.specs || []).join(', '));
parsed++;
} catch (e) {
// Don't let one bad row stop the whole sheet.
sheet.getRange(rowNum, NAME_COL).setValue('ERROR: ' + e.message);
}
});
Logger.log('Parsed ' + parsed + ' new listing(s).');
}
/**
* Claude occasionally wraps JSON in a ```json code fence. Strip it so
* JSON.parse never chokes on the markdown.
*/
function stripFences(text) {
return text.replace(/```(?:json)?/g, '').trim();
}
/**
* Minimal Anthropic API call. The key lives in Script Properties — it
* is never pasted into the code.
*/
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 400) {
const key = PropertiesService.getScriptProperties()
.getProperty('ANTHROPIC_API_KEY');
const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
method: 'post',
contentType: 'application/json',
headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
payload: JSON.stringify({
model,
max_tokens: maxTokens,
messages: [{ role: 'user', content: prompt }],
}),
muteHttpExceptions: true,
});
return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
parseListingbuilds a prompt that embeds the schema literally so Claude knows both the keys and the types you expect. Asking fornull(or[]) when a field is missing prevents the parser from inventing data.stripFencesremoves any markdown wrapper, thenJSON.parseturns the reply into a real object the caller can use.parseListingsColumnreads the whole sheet into memory and skips two classes of row: empty source cells and rows already parsed (where the name column is populated). That makes the function idempotent — running it again only touches new rows.- For each remaining row, it calls
parseListing, then writes the three fields back into columns B, C and D. Arrays are joined into a single comma-separated cell so the sheet stays human-readable. - If one row blows up — bad JSON, missing fields — the error is written into the name cell and the loop continues with the next row, so one bad listing never stops the run.
Example run
Column A contains free-form blobs like this:
Northwind Trail Pack 32L — £85. Ripstop nylon shell, hip belt, internal hydration sleeve, six external pockets. Weighs 940g. Ships May.
After parseListingsColumn runs, the row looks like:
| Raw text | Name | Price | Specs |
|---|---|---|---|
| Northwind Trail Pack 32L — £85. … | Northwind Trail Pack 32L | 85 | Ripstop nylon shell, Hip belt, Internal hydration sleeve, Six external pockets, 940g |
Add a new row of raw text and run the function again — only the new row is parsed; the existing rows are skipped because their name cell is already filled in.
Run it
This is on-demand. Drop new text into column A, then:
- In the Apps Script editor, select
parseListingsColumnand click Run. - Approve the authorisation prompt the first time.
- Watch columns B, C and D populate. Re-running is safe — only empty rows are touched.
If you’d rather run it from the sheet, add an onOpen menu:
function onOpen() {
SpreadsheetApp.getUi()
.createMenu('Listings')
.addItem('Parse new rows', 'parseListingsColumn')
.addToUi();
}
Watch out for
- Schema mismatch is the most common failure. If your sheet has four output
columns but
DEFAULT_SCHEMAdefines three keys, you’ll get empty cells. Update both together. - The model can hallucinate plausible-looking values if the raw text is thin. The “use null for missing fields” instruction reduces this but doesn’t eliminate it — spot-check the first dozen rows when you change schemas.
- Each row is one API call. For a sheet of thousands, batch ten or twenty blobs into one prompt with an array schema, or run the script in chunks to stay under the six-minute execution limit.
- Strict JSON keeps this reliable.
stripFenceshandles code fences; errors that still bubble up are written back into the sheet so you can see exactly which row failed.
Related
Extract follow-ups from call transcripts
Turn Northwind sales calls into actionable tasks — owner + task + due date per extracted item.
Updated Jan 10, 2026
Build an AI data-enrichment pipeline
Fill missing company and contact fields on Northwind's prospect list — Claude infers from the row.
Updated Nov 23, 2025
Build an AI invoice and receipt parser
Read Northwind documents into structured ledger rows — vendor, amount, line items.
Updated Oct 2, 2025
Extract entities and relationships from text
Build a structured graph from Northwind prose — people, companies, and how they connect.
Updated Sep 24, 2025
Build an AI data-cleaning assistant
Standardise messy Northwind names, addresses, and categories into clean canonical values.
Updated Sep 4, 2025