Build an AI data-enrichment pipeline
Fill missing company and contact fields on Northwind's prospect list — Claude infers from the row.
Published Nov 23, 2025
Northwind’s prospect list grows faster than anyone can clean it. A name and a website come in from a form or an import, and the columns that actually drive segmentation — industry, headcount band, head-office location — sit empty. Filling them by hand means a dozen browser tabs per row, so the list stays half-blank and the sales team can’t slice it.
This script enriches the list automatically. It walks every row, skips the ones already filled in, and asks Claude to infer the missing fields from the company name and URL. The answers come back as structured JSON and write straight into the right columns. Run it once on a backlog, then on a schedule to keep up with new rows.
What you’ll need
- A Google Sheet of prospects with a header row, including columns named
company,url,industry,headcountBand, andhq. The script reads columns by header name, so order does not matter. - An Anthropic API key saved as
ANTHROPIC_API_KEYin Script Properties — see Store API keys and secrets securely. - Nothing else — the script writes back into the same sheet.
The script
// The spreadsheet that holds your prospect list.
const PROSPECTS_SHEET_ID = '1abcProspectsId';
// Allowed values for the headcount band — pinned so the column stays clean.
const HEADCOUNT_BANDS = '1-10|11-50|51-200|200+';
/**
* Walks the prospect list and fills missing industry, headcount band,
* and HQ fields by asking Claude to infer them from each row.
*/
function enrichRows() {
const sheet = SpreadsheetApp.openById(PROSPECTS_SHEET_ID).getSheets()[0];
// 1. Read the whole sheet and split the header from the data rows.
const [header, ...rows] = sheet.getDataRange().getValues();
if (!rows.length) {
Logger.log('No prospect rows to enrich — nothing to do.');
return;
}
// 2. Map header names to column indexes so lookups read clearly.
const col = Object.fromEntries(header.map((name, i) => [name, i]));
let enriched = 0;
rows.forEach((r, i) => {
// 3. Skip rows that already have industry and headcount filled in.
if (r[col.industry] && r[col.headcountBand]) return;
// 4. Ask Claude to infer the missing fields, pinned to strict JSON.
const prompt =
'Infer fields for this company. ' +
'Return ONLY a JSON object — no prose, no markdown — in this shape: ' +
`{"industry": string, "headcountBand": "${HEADCOUNT_BANDS}", "hq": string}.` +
'\n\nCompany: ' + r[col.company] +
'\nURL: ' + (r[col.url] || '(none)');
const out = JSON.parse(stripFences(callClaude(prompt)));
// 5. Write each inferred value back into its column on this row.
sheet.getRange(i + 2, col.industry + 1).setValue(out.industry);
sheet.getRange(i + 2, col.headcountBand + 1).setValue(out.headcountBand);
sheet.getRange(i + 2, col.hq + 1).setValue(out.hq);
enriched++;
});
Logger.log('Enriched ' + enriched + ' row(s).');
}
/**
* Claude occasionally wraps JSON in a ```json code fence. Strip it so
* JSON.parse never chokes on the markdown.
*/
function stripFences(text) {
return text.replace(/```(?:json)?/g, '').trim();
}
/**
* Minimal Anthropic API call. The key lives in Script Properties — it
* is never pasted into the code.
*/
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 200) {
const key = PropertiesService.getScriptProperties()
.getProperty('ANTHROPIC_API_KEY');
const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
method: 'post',
contentType: 'application/json',
headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
payload: JSON.stringify({
model,
max_tokens: maxTokens,
messages: [{ role: 'user', content: prompt }],
}),
muteHttpExceptions: true,
});
return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
enrichRowsopens the prospect sheet, reads the whole data range, and splits the header row from the data rows.- If there are no data rows, it logs a message and stops — no wasted API calls.
- It builds a
collookup from header name to column index, so the rest of the code readscol.industryinstead of a brittle hard-coded number. - For each row it checks whether
industryandheadcountBandare already filled. If they are, it skips the row — the script is safe to re-run and only pays for rows that still need work. - It builds a prompt that pins the output to a strict JSON object, with
headcountBandconstrained to a fixed set of bands so the column stays clean. - It calls Claude Haiku — inference from a name and URL is light work, so the
cheap model is the right call.
stripFencesremoves any code fence, thenJSON.parseturns the reply into a real object. - It writes the three inferred values back into their columns on that row.
Example run
Say the prospect sheet has a row that came in with only a name and a URL:
| company | url | industry | headcountBand | hq |
|---|---|---|---|---|
| Riverbend Logistics | riverbendlogistics.com |
After a run, Claude has filled the gaps:
| company | url | industry | headcountBand | hq |
|---|---|---|---|---|
| Riverbend Logistics | riverbendlogistics.com | Freight & logistics | 51-200 | Manchester, UK |
Rows that already had industry and headcountBand are left untouched, so a
second run only spends API calls on the genuinely new prospects.
Trigger it
Run it once by hand to clear the backlog, then on a schedule to keep up:
- In the Apps Script editor, select
enrichRowsand click Run to enrich the existing rows. Approve the authorisation prompt the first time. - Open Triggers (the clock icon) and add a time-driven trigger for
enrichRows, daily or hourly depending on how fast new rows arrive.
Because filled rows are skipped, each scheduled run only touches prospects added since the last one.
Watch out for
- Inferred fields are best guesses, not verified facts. For a small company with a thin web presence, Claude may infer the wrong industry or headcount — treat enriched data as a starting point, not a source of truth.
- The headcount band is constrained to
HEADCOUNT_BANDS, but the prompt cannot force it. Validate the returned value against the allowed set before writing if the column feeds a strict downstream report. - Each row is one API call inside the loop, plus a
setValueper field. A large backlog can hit Apps Script’s six-minute limit — process in batches by filtering to a slice of rows, and let the skip logic resume where it left off. JSON.parsewill throw on a malformed reply.stripFenceshandles the common code-fence case; if it still throws, log the raw reply and tighten the prompt rather than wrapping the call in a bare try/catch that hides bad data.- Writing cell by cell is slow. For large lists, collect the enriched values and
write them with a single
setValuescall per column block instead.
Related
Parse semi-structured listings into tables
Extract recipes, specs, or ads from Northwind content into clean spreadsheet rows.
Updated Mar 7, 2026
Extract follow-ups from call transcripts
Turn Northwind sales calls into actionable tasks — owner + task + due date per extracted item.
Updated Jan 10, 2026
Build an AI invoice and receipt parser
Read Northwind documents into structured ledger rows — vendor, amount, line items.
Updated Oct 2, 2025
Extract entities and relationships from text
Build a structured graph from Northwind prose — people, companies, and how they connect.
Updated Sep 24, 2025
Build an AI data-cleaning assistant
Standardise messy Northwind names, addresses, and categories into clean canonical values.
Updated Sep 4, 2025