Build retrieval-augmented Q&A over your data

Northwind keeps a lot of useful facts in spreadsheets — pricing tiers, project histories, supplier contacts, policy notes. The trouble is that nobody can remember which tab holds what, and a general-purpose chatbot will happily invent an answer rather than admit it does not know. What you want is an assistant that answers only from your own data and says so plainly when the data falls short.

That pattern is called retrieval-augmented generation: instead of asking the model to recall facts, you retrieve the rows that look relevant and hand them over as context. This script does the simplest honest version of it — a keyword match to pull candidate rows, then Claude to read those rows and answer. No vector database, no embeddings; just a Sheet and a prompt that forbids guessing.

What you’ll need

A Google Sheet acting as your knowledge base, with a header row and one fact per row. Column names can be anything; the script reads every column, but only from the first tab.
An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
Nothing else. Retrieval happens in plain JavaScript.

The script

// The Sheet that holds your knowledge base — one fact per row.
const KNOWLEDGE_SHEET_ID = '1abcKnowledgeId';

// How many matching rows to pass to Claude as context. Enough to be
// useful, capped so the prompt stays within a sensible token budget.
const MAX_CONTEXT_ROWS = 20;

/**
 * Answers a question using only the Northwind knowledge Sheet.
 * Retrieves candidate rows by keyword, then asks Claude to answer
 * from that context alone.
 */
function answer(question) {
  // 1. Bail out early on an empty question.
  if (!question || !question.trim()) {
    return 'Ask a question and I will look it up.';
  }

  // 2. Load every row of the knowledge base as keyed objects.
  const data = readSheet(KNOWLEDGE_SHEET_ID);
  if (!data.length) {
    return 'The knowledge base is empty — nothing to search.';
  }

  // 3. Naive retrieval: split the question into words, keep any row
  //    whose serialised text contains at least one of them. This is a
  //    substring match, so short words ("i", "to") hit almost any row.
  const words = question.toLowerCase().split(/\W+/).filter(Boolean);
  const relevant = data
    .filter((r) => {
      const text = JSON.stringify(r).toLowerCase(); // serialise once per row
      return words.some((w) => text.includes(w));
    })
    .slice(0, MAX_CONTEXT_ROWS);

  // 4. If nothing matched, do not waste an API call.
  if (!relevant.length) {
    return 'No rows in the knowledge base mention that — needs a human.';
  }

  // 5. Build a grounded prompt: the data is the only allowed source.
  const prompt =
    'Answer this question for Northwind using ONLY the data below. ' +
    'If the data does not cover it, say so.\n\nData:\n' +
    JSON.stringify(relevant) + '\n\nQuestion: ' + question;

  // 6. Sonnet reads the rows and writes the answer.
  return callClaude(prompt, 'claude-sonnet-4-6', 600);
}

/**
 * Reads the first tab of a Sheet and returns each row as an object
 * keyed by the header cells.
 */
function readSheet(id) {
  const [h, ...rows] = SpreadsheetApp.openById(id)
    .getSheets()[0]
    .getDataRange()
    .getValues();
  return rows.map((r) => Object.fromEntries(h.map((k, i) => [k, r[i]])));
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 */
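function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 400) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  if (!key) {
    throw new Error('ANTHROPIC_API_KEY is not set in Script Properties.');
  }
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model,
      max_tokens: maxTokens,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });
  // muteHttpExceptions stops fetch from throwing on 4xx/5xx, so check the
  // status here; an error body has no content array to read.
  const code = res.getResponseCode();
  if (code !== 200) {
    throw new Error('Anthropic API returned HTTP ' + code + ': ' + res.getContentText());
  }
  return JSON.parse(res.getContentText()).content[0].text.trim();
}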

How it works

answer first checks the question is not blank — an empty string gets a friendly nudge instead of a pointless API call.
readSheet loads the whole knowledge base and turns each row into an object keyed by the header, so a row reads like { topic: 'Refunds', detail: '...' }.
Retrieval is deliberately simple: the question is split into words, and any row whose serialised JSON contains one of those words as a substring is kept. It is crude, but for a few hundred rows it is fast and often good enough.
The matches are capped at the first MAX_CONTEXT_ROWS in Sheet order so the prompt never balloons, and if nothing matched at all the script returns early.
The prompt hands Claude the matched rows as the only permitted source and instructs it to admit when the data does not cover the question.
Claude Sonnet reads the context and returns a grounded answer, one drawn from your data rather than from its training.

Example run

Say the knowledge Sheet holds rows like these:

topic	detail
Refund window	Northwind refunds within 30 days of delivery, no questions asked.
Rush delivery	Rush orders ship next business day for a 15% surcharge.
Warranty	Hardware carries a 2-year warranty; software is sold as-is.

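Calling answer('how long do I have to return something?') retrieves the “Refund window” row, though not because of “return”, which appears nowhere in the Sheet. Short words such as “I” and “to” are substrings of every serialised row, so all three rows reach Claude, which picks out the relevant one and returns: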

Northwind accepts returns within 30 days of delivery, with no questions asked.

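Calling answer('do you ship internationally?') does not come back empty: “ship” appears in the “Rush delivery” row and “do” is a substring of “window”, so two rows are retrieved and sent to Claude. Neither covers international shipping, so the prompt's instruction applies and Claude should reply that the data does not cover the question. A question whose words appear nowhere in the Sheet, such as answer('bulk discounts?'), never reaches the API and returns “No rows in the knowledge base mention that — needs a human.” Either way the assistant declines instead of guessing, which is exactly the behaviour you want from a grounded assistant.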
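Because retrieval is plain JavaScript, it can be dry-run without an API call to see which rows a question would pull in. The sketch below is a minimal illustration, assuming the three sample rows above; matchedTopics is a hypothetical helper that repeats the substring match from step 3 of answer and is not part of the script.

```javascript
// Sample rows from the example above, already keyed by header.
const sampleRows = [
  { topic: 'Refund window', detail: 'Northwind refunds within 30 days of delivery, no questions asked.' },
  { topic: 'Rush delivery', detail: 'Rush orders ship next business day for a 15% surcharge.' },
  { topic: 'Warranty', detail: 'Hardware carries a 2-year warranty; software is sold as-is.' },
];

// Hypothetical helper: the same substring match as step 3 of answer,
// returning only the topics so the result is easy to scan.
function matchedTopics(rows, question) {
  const words = question.toLowerCase().split(/\W+/).filter(Boolean);
  return rows
    .filter((r) => {
      const text = JSON.stringify(r).toLowerCase();
      return words.some((w) => text.includes(w));
    })
    .map((r) => r.topic);
}

console.log(matchedTopics(sampleRows, 'refund window'));
// → [ 'Refund window' ]
console.log(matchedTopics(sampleRows, 'how long do I have to return something?'));
// → all three topics, because "i" is a substring of every row
console.log(matchedTopics(sampleRows, 'bulk discounts?'));
// → []
```

A dry run like this is a cheap way to check a new question against the Sheet before spending tokens on it.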

Run it

This is an on-demand function — call it whenever you have a question:

In the Apps Script editor, add a small wrapper function with no parameters that calls answer('your question here') and logs the result. The editor's Run button cannot pass arguments, so answer cannot be run directly.
Select the wrapper, click Run, and approve the authorisation prompt the first time.
Read the logged string in the execution log.
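A wrapper for the steps above might look like the following. It is a minimal sketch: testAnswer is a hypothetical name, the question is hard-coded for illustration, and it assumes answer from the script is defined in the same project.

```javascript
// Hypothetical wrapper: the editor's Run button cannot pass arguments,
// so the question is hard-coded here. Assumes answer() from the script
// above lives in the same Apps Script project.
function testAnswer() {
  const question = 'how long do I have to return something?';
  const reply = answer(question);
  console.log(reply); // shows up in the execution log
  return reply;
}
```

Edit the question string and run testAnswer again each time you want to try a different query.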

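To make it usable from a Sheet, wrap it as a custom function so colleagues can type =ASK("when do refunds expire?") in a cell. Keep in mind that custom functions run without an authorisation prompt and are restricted accordingly: they cannot open another spreadsheet with SpreadsheetApp.openById, so readSheet as written will fail there, and each call must finish within 30 seconds. The knowledge base would need to live in the same spreadsheet as the formula, or be passed to the function as a range. Test the plain answer first.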
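One way to stay within the custom-function limits is to pass the knowledge base in as a range argument, which arrives as a 2D array of values. The sketch below is an assumption-laden illustration, not the script's own method: rowsFromValues and answerFromRows are hypothetical helpers, the tab name Knowledge and the range in the comment are made up, and ASK relies on callClaude from the script above. It also assumes the range spans more than one cell, since a single-cell argument arrives as a plain value.

```javascript
// Turns the 2D array a range argument arrives as into header-keyed rows.
function rowsFromValues(values) {
  const [header, ...rows] = values;
  return rows.map((r) => Object.fromEntries(header.map((k, i) => [k, r[i]])));
}

// Same retrieve-then-ask flow as answer, but the rows and the model call
// are passed in, so nothing here opens another spreadsheet.
function answerFromRows(rows, question, ask, maxRows = 20) {
  const words = String(question).toLowerCase().split(/\W+/).filter(Boolean);
  const relevant = rows
    .filter((r) => {
      const text = JSON.stringify(r).toLowerCase();
      return words.some((w) => text.includes(w));
    })
    .slice(0, maxRows);
  if (!relevant.length) {
    return 'No rows in the knowledge base mention that — needs a human.';
  }
  const prompt =
    'Answer this question for Northwind using ONLY the data below. ' +
    'If the data does not cover it, say so.\n\nData:\n' +
    JSON.stringify(relevant) + '\n\nQuestion: ' + question;
  return ask(prompt);
}

/**
 * Hypothetical usage: =ASK("when do refunds expire?", Knowledge!A1:B50)
 * @customfunction
 */
function ASK(question, kbRange) {
  return answerFromRows(rowsFromValues(kbRange), question,
    (prompt) => callClaude(prompt, 'claude-sonnet-4-6', 600));
}
```

Passing the model call in as a function keeps the retrieval logic testable on its own, and a blank question simply falls through to the no-match message.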

Watch out for

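Keyword retrieval is literal. “Return” will not match a row that only says “refund”, so a question and its facts can miss each other. It is also a substring match over the whole serialised row, header names included, so short common words such as “I” or “to” match almost every row and the cap then keeps whichever rows come first. If recall matters, add synonyms to your rows, filter out common words, or move to embeddings-based retrieval.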
The whole knowledge base is read on every call. That is fine for hundreds of rows; for thousands, cache the parsed data or narrow the search first.
Grounding depends on the prompt holding. Claude is told to use only the supplied data — keep that instruction, and never loosen it to “be helpful”, or it will start filling gaps from memory.
Stale rows produce confident wrong answers. The script trusts the Sheet completely, so a knowledge base is only as good as its last edit.
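The first caveat can be softened without leaving plain JavaScript. The sketch below shows one possible way to prepare the search terms: drop common words and add hand-picked synonyms before matching. Everything in it is an assumption for illustration: queryTerms is a hypothetical replacement for the word-splitting line in answer, and both word lists are placeholders to tune against your own Sheet.

```javascript
// Assumed lists for illustration; tune both to your own Sheet.
const STOP_WORDS = new Set([
  'a', 'an', 'and', 'the', 'i', 'to', 'do', 'have', 'how', 'is', 'of', 'you',
]);
const SYNONYMS = new Map([
  ['return', ['refund']],
  ['returns', ['refund']],
  ['shipping', ['ship', 'delivery']],
]);

// Hypothetical replacement for the word-splitting line in answer:
// drops stop words and one-letter tokens, then adds listed synonyms.
function queryTerms(question) {
  const words = question.toLowerCase().split(/\W+/)
    .filter((w) => w.length > 1 && !STOP_WORDS.has(w));
  const expanded = words.flatMap((w) => [w, ...(SYNONYMS.get(w) || [])]);
  return [...new Set(expanded)]; // de-duplicate, keeping first-seen order
}

console.log(queryTerms('how long do I have to return something?'));
// → [ 'long', 'return', 'refund', 'something' ]
```

Against the three sample rows from the example run, only “refund” appears in the Sheet, so these terms would retrieve the “Refund window” row alone. Matching is still by substring, so this narrows the problem without removing it.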

Related

Generate and test email subject lines

Build an AI weekly-report narrator

Build a multi-step AI agent workflow

Adapt marketing copy per region

Auto-write CRM notes from call summaries