Extract structured data from messy text
Pull names, dates, amounts into clean Northwind columns from free-text rows.
Published Jul 14, 2025
Northwind’s free-text column is where the useful data goes to hide. A pasted email, a copied invoice line, a note someone typed in a hurry — each one names a person, a date and an amount, but never in the same place or the same format. Regex can catch one phrasing and miss the next, and copying the values across by hand does not scale.
This script reads each row’s raw text, asks Claude to pull out the fields as
strict JSON, and writes them into clean name, date and amount columns. It
only processes rows where the amount cell is still empty, so a run picks up
exactly the new rows and leaves everything it has already done alone.
What you’ll need
- A Google Sheet with a header row and these columns:
raw(the messy free-text input), plus emptyname,dateandamountcolumns for the extracted fields. - An Anthropic API key saved as
ANTHROPIC_API_KEYin Script Properties — see Store API keys and secrets securely.
The script
// The sheet holding the messy text and the clean output columns.
const EXTRACT_SHEET_ID = '1abcExtractId';
/**
* Reads each row's free-text "raw" column, extracts name, date and
* amount with Claude, and writes them into clean columns. Skips any
* row that already has an amount.
*/
function extractRows() {
const sheet = SpreadsheetApp.openById(EXTRACT_SHEET_ID).getSheets()[0];
// 1. Read every row and map header names to column indexes.
const [h, ...rows] = sheet.getDataRange().getValues();
if (!rows.length) {
Logger.log('No rows to extract — nothing to do.');
return;
}
const col = Object.fromEntries(h.map((k, i) => [k, i]));
let processed = 0;
rows.forEach((r, i) => {
// 2. Skip rows already done (amount filled) or with no text to read.
if (r[col.amount] || !r[col.raw]) return;
// 3. Ask Claude to pull the fields out of the raw text as JSON.
const out = extractFields(r[col.raw]);
// 4. Write each field into its clean column (row i + 2 for the header).
sheet.getRange(i + 2, col.name + 1).setValue(out.name);
sheet.getRange(i + 2, col.date + 1).setValue(out.date);
sheet.getRange(i + 2, col.amount + 1).setValue(out.amount);
processed++;
});
Logger.log('Extracted fields for ' + processed + ' rows.');
}
/**
* Sends one block of free text to Claude and returns the parsed
* {name, date, amount} object. The strict schema keeps the reply parseable.
*/
function extractFields(text) {
const prompt =
'Extract these fields as strict JSON: ' +
'{"name": string, "date": "YYYY-MM-DD", "amount": number}\n\n' + text;
return JSON.parse(callClaude(prompt));
}
/**
* Minimal Anthropic API call. The key lives in Script Properties — it
* is never pasted into the code.
*/
function callClaude(prompt) {
const key = PropertiesService.getScriptProperties()
.getProperty('ANTHROPIC_API_KEY');
const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
method: 'post',
contentType: 'application/json',
headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
payload: JSON.stringify({
model: 'claude-haiku-4-5-20251001',
max_tokens: 300,
messages: [{ role: 'user', content: prompt }],
}),
});
return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
extractRowsreads the whole sheet and builds acollookup that maps each header name to its column index, so the rest of the code refers to fields by name rather than a brittle number. It stops early if the sheet has no rows.- For each row it skips two cases: a row whose
amountis already filled (done on an earlier run) and a row with norawtext (nothing to read). extractFieldsbuilds a prompt that pins the output to a strict JSON schema — anamestring, adateinYYYY-MM-DD, and a numericamount— and parses Claude’s reply into a real object.- It writes the three fields into their columns. The
col.name + 1converts a zero-based index to the 1-based columngetRangeexpects, andi + 2accounts for the header row plus the zero-based loop index. - A counter logs how many rows were processed, so a run that finds nothing new is easy to spot.
Example run
Say the raw column holds messy lines like these, and the other columns are
empty:
| raw |
|---|
| Invoice from Acme Corp, paid 3rd Feb 2025, total £1,240.00 |
| got payment — Bridge & Co, 240 quid, jan 15 |
After a run, the clean columns are filled:
| raw | name | date | amount |
|---|---|---|---|
| Invoice from Acme Corp… | Acme Corp | 2025-02-03 | 1240 |
| got payment — Bridge & Co… | Bridge & Co | 2025-01-15 | 240 |
Both rows are normalised to the same shape even though the input phrasing, date format and currency style were all different.
Trigger it
If the messy rows arrive steadily — pasted in through the day, or appended by a form — run the extraction on a schedule so the clean columns stay up to date:
- In the Apps Script editor, open Triggers (the clock icon).
- Click Add Trigger.
- Choose
extractRows, a Time-driven source, and an Hour timer.
Because the script skips rows that already have an amount, each run only spends
API calls on genuinely new text.
Watch out for
- The skip rule keys off
amount. If a real extraction legitimately produces an empty or zero amount, that row will be retried on every run. Use a dedicated status column if your data can have blank amounts. - Ambiguous dates are guesses. “01/02/2025” could be January or February — Claude will pick one. If your source uses a known format, say so in the prompt to remove the ambiguity.
JSON.parsethrows on a malformed reply. The strict schema keeps Haiku reliable, but if Claude wraps the JSON in a code fence, strip the fence before parsing rather than reaching for regex.- One row per call means one API call each. For a large backlog this is slow and can hit the script runtime limit — batch several rows into one prompt and ask for a JSON array of results.
- Amounts come back as numbers, not currency. The script writes
1240, not£1,240.00— apply a currency format to theamountcolumn in the sheet if you need it displayed that way.
Related
Parse semi-structured listings into tables
Extract recipes, specs, or ads from Northwind content into clean spreadsheet rows.
Updated Mar 7, 2026
Extract follow-ups from call transcripts
Turn Northwind sales calls into actionable tasks — owner + task + due date per extracted item.
Updated Jan 10, 2026
Build an AI data-enrichment pipeline
Fill missing company and contact fields on Northwind's prospect list — Claude infers from the row.
Updated Nov 23, 2025
Build an AI invoice and receipt parser
Read Northwind documents into structured ledger rows — vendor, amount, line items.
Updated Oct 2, 2025
Extract entities and relationships from text
Build a structured graph from Northwind prose — people, companies, and how they connect.
Updated Sep 24, 2025