appscript.dev
Automation Advanced Sheets

Build an AI data-enrichment pipeline

Fill missing company and contact fields on Northwind's prospect list — Claude infers from the row.

Published Nov 23, 2025

Northwind’s prospect list grows faster than anyone can clean it. A name and a website come in from a form or an import, and the columns that actually drive segmentation — industry, headcount band, head-office location — sit empty. Filling them by hand means a dozen browser tabs per row, so the list stays half-blank and the sales team can’t slice it.

This script enriches the list automatically. It walks every row, skips the ones already filled in, and asks Claude to infer the missing fields from the company name and URL. The answers come back as structured JSON and write straight into the right columns. Run it once on a backlog, then on a schedule to keep up with new rows.

What you’ll need

  • A Google Sheet of prospects with a header row, including columns named company, url, industry, headcountBand, and hq. The script reads columns by header name, so order does not matter.
  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • Nothing else — the script writes back into the same sheet.

The script

// The spreadsheet that holds your prospect list.
const PROSPECTS_SHEET_ID = '1abcProspectsId';

// Allowed values for the headcount band — pinned so the column stays clean.
const HEADCOUNT_BANDS = '1-10|11-50|51-200|200+';

/**
 * Walks the prospect list and fills missing industry, headcount band,
 * and HQ fields by asking Claude to infer them from each row.
 */
function enrichRows() {
  const sheet = SpreadsheetApp.openById(PROSPECTS_SHEET_ID).getSheets()[0];

  // 1. Read the whole sheet and split the header from the data rows.
  const [header, ...rows] = sheet.getDataRange().getValues();

  if (!rows.length) {
    Logger.log('No prospect rows to enrich — nothing to do.');
    return;
  }

  // 2. Map header names to column indexes so lookups read clearly.
  const col = Object.fromEntries(header.map((name, i) => [name, i]));

  let enriched = 0;
  rows.forEach((r, i) => {
    // 3. Skip rows that already have industry and headcount filled in.
    if (r[col.industry] && r[col.headcountBand]) return;

    // 4. Ask Claude to infer the missing fields, pinned to strict JSON.
    const prompt =
      'Infer fields for this company. ' +
      'Return ONLY a JSON object — no prose, no markdown — in this shape: ' +
      `{"industry": string, "headcountBand": "${HEADCOUNT_BANDS}", "hq": string}.` +
      '\n\nCompany: ' + r[col.company] +
      '\nURL: ' + (r[col.url] || '(none)');
    const out = JSON.parse(stripFences(callClaude(prompt)));

    // 5. Write each inferred value back into its column on this row.
    sheet.getRange(i + 2, col.industry + 1).setValue(out.industry);
    sheet.getRange(i + 2, col.headcountBand + 1).setValue(out.headcountBand);
    sheet.getRange(i + 2, col.hq + 1).setValue(out.hq);
    enriched++;
  });

  Logger.log('Enriched ' + enriched + ' row(s).');
}

/**
 * Claude occasionally wraps JSON in a ```json code fence. Strip it so
 * JSON.parse never chokes on the markdown.
 */
function stripFences(text) {
  return text.replace(/```(?:json)?/g, '').trim();
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 */
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 200) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model,
      max_tokens: maxTokens,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });
  return JSON.parse(res.getContentText()).content[0].text.trim();
}

How it works

  1. enrichRows opens the prospect sheet, reads the whole data range, and splits the header row from the data rows.
  2. If there are no data rows, it logs a message and stops — no wasted API calls.
  3. It builds a col lookup from header name to column index, so the rest of the code reads col.industry instead of a brittle hard-coded number.
  4. For each row it checks whether industry and headcountBand are already filled. If they are, it skips the row — the script is safe to re-run and only pays for rows that still need work.
  5. It builds a prompt that pins the output to a strict JSON object, with headcountBand constrained to a fixed set of bands so the column stays clean.
  6. It calls Claude Haiku — inference from a name and URL is light work, so the cheap model is the right call. stripFences removes any code fence, then JSON.parse turns the reply into a real object.
  7. It writes the three inferred values back into their columns on that row.

Example run

Say the prospect sheet has a row that came in with only a name and a URL:

companyurlindustryheadcountBandhq
Riverbend Logisticsriverbendlogistics.com

After a run, Claude has filled the gaps:

companyurlindustryheadcountBandhq
Riverbend Logisticsriverbendlogistics.comFreight & logistics51-200Manchester, UK

Rows that already had industry and headcountBand are left untouched, so a second run only spends API calls on the genuinely new prospects.

Trigger it

Run it once by hand to clear the backlog, then on a schedule to keep up:

  1. In the Apps Script editor, select enrichRows and click Run to enrich the existing rows. Approve the authorisation prompt the first time.
  2. Open Triggers (the clock icon) and add a time-driven trigger for enrichRows, daily or hourly depending on how fast new rows arrive.

Because filled rows are skipped, each scheduled run only touches prospects added since the last one.

Watch out for

  • Inferred fields are best guesses, not verified facts. For a small company with a thin web presence, Claude may infer the wrong industry or headcount — treat enriched data as a starting point, not a source of truth.
  • The headcount band is constrained to HEADCOUNT_BANDS, but the prompt cannot force it. Validate the returned value against the allowed set before writing if the column feeds a strict downstream report.
  • Each row is one API call inside the loop, plus a setValue per field. A large backlog can hit Apps Script’s six-minute limit — process in batches by filtering to a slice of rows, and let the skip logic resume where it left off.
  • JSON.parse will throw on a malformed reply. stripFences handles the common code-fence case; if it still throws, log the raw reply and tighten the prompt rather than wrapping the call in a bare try/catch that hides bad data.
  • Writing cell by cell is slow. For large lists, collect the enriched values and write them with a single setValues call per column block instead.

Related