appscript.dev
Automation Advanced Docs

Extract entities and relationships from text

Build a structured graph from Northwind prose — people, companies, and how they connect.

Published Sep 24, 2025

A Northwind research memo, a meeting write-up, or a due-diligence note is full of relationships — who works for whom, which company acquired which, who introduced two parties. That structure is obvious to a human reading the prose and invisible to anything else. To plot it, search it, or feed it to another tool, the relationships have to come out of the sentences and into a table.

This script reads a Doc, asks Claude to extract the entities (people and companies) and the edges between them as strict JSON, and writes both to a spreadsheet — an Entities tab and a Relationships tab. That is the raw material for a graph diagram, a lookup, or a simple knowledge base.

What you’ll need

  • A Google Doc containing the prose you want to map — a memo, report, or set of notes.
  • A Google Sheet the script writes to — it creates the Entities and Relationships tabs itself.
  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.

The script

// The Doc to read and the Sheet to write the graph into.
const SOURCE_DOC_ID = '1abcSourceDocId';
const GRAPH_SHEET_ID = '1abcGraphSheetId';

// How much of the Doc to send in one pass — keeps the prompt within a
// sensible token budget. See "Watch out for".
const CONTEXT_CHARS = 6000;

/**
 * Reads the source Doc, asks Claude to extract a graph of entities and
 * relationships, and writes it to two tabs in the graph spreadsheet.
 */
function buildGraph() {
  // 1. Extract the graph as structured objects.
  const graph = extractGraph(SOURCE_DOC_ID);
  if (!graph || !graph.entities || !graph.entities.length) {
    Logger.log('No entities found — nothing to write.');
    return;
  }

  const ss = SpreadsheetApp.openById(GRAPH_SHEET_ID);

  // 2. Rebuild the Entities tab: one row per person or company.
  const entitiesTab = ss.getSheetByName('Entities')
    || ss.insertSheet('Entities');
  entitiesTab.clear();
  entitiesTab.getRange(1, 1, 1, 2).setValues([['Name', 'Type']]);
  const entityRows = graph.entities.map((e) => [e.name, e.type]);
  entitiesTab.getRange(2, 1, entityRows.length, 2).setValues(entityRows);

  // 3. Rebuild the Relationships tab: one row per edge.
  const edgesTab = ss.getSheetByName('Relationships')
    || ss.insertSheet('Relationships');
  edgesTab.clear();
  edgesTab.getRange(1, 1, 1, 3).setValues([['From', 'Relation', 'To']]);
  const edges = graph.edges || [];
  if (edges.length) {
    const edgeRows = edges.map((x) => [x.from, x.relation, x.to]);
    edgesTab.getRange(2, 1, edgeRows.length, 3).setValues(edgeRows);
  }
  Logger.log('Wrote ' + entityRows.length + ' entities and '
    + edges.length + ' relationships.');
}

/**
 * Asks Claude to extract entities and relationships from a Doc, pinned
 * to a strict JSON schema so the result is always parseable.
 */
function extractGraph(docId) {
  const text = DocumentApp.openById(docId).getBody()
    .getText().slice(0, CONTEXT_CHARS);
  if (!text.trim()) {
    Logger.log('Source Doc is empty — nothing to extract.');
    return null;
  }

  // A fixed schema is the difference between a parseable result and a
  // parsing nightmare.
  const prompt =
    'Extract entities and relationships from this Northwind text as JSON. ' +
    'Return ONLY JSON — no prose, no markdown — in this shape: ' +
    '{"entities":[{"name":string,"type":"person"|"company"}],' +
    '"edges":[{"from":string,"to":string,"relation":string}]}. ' +
    'Use exact names as written in the text.\n\n' + text;
  return JSON.parse(stripFences(callClaude(prompt, 'claude-sonnet-4-6', 1500)));
}

/**
 * Claude occasionally wraps JSON in a ```json code fence. Strip it so
 * JSON.parse never chokes on the markdown.
 */
function stripFences(textValue) {
  return textValue.replace(/```(?:json)?/g, '').trim();
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 */
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 400) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model,
      max_tokens: maxTokens,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });
  return JSON.parse(res.getContentText()).content[0].text.trim();
}

How it works

  1. buildGraph calls extractGraph, which reads the first CONTEXT_CHARS characters of the source Doc.
  2. If the Doc is empty, extractGraph logs a message and returns null, and buildGraph stops — no wasted API call.
  3. extractGraph builds a prompt that pins the output to a strict JSON schema: an entities array and an edges array, with type constrained to person or company.
  4. It calls Claude Sonnet — graph extraction needs reasoning over the whole passage, not just pattern matching. stripFences removes any code fence, then JSON.parse turns the reply into real objects.
  5. buildGraph rebuilds the Entities tab (name and type) and the Relationships tab (from, relation, to) from scratch, so each run reflects the latest Doc.

Example run

Say the source Doc contains this paragraph:

Maria Reyes joined Northwind in 2021 from Acme Corp, where she had worked alongside Tom Hale. Tom now runs partnerships at Globex, which acquired Initech last spring.

After a run, the Entities tab holds:

NameType
Maria Reyesperson
Tom Haleperson
Northwindcompany
Acme Corpcompany
Globexcompany
Initechcompany

And the Relationships tab holds:

FromRelationTo
Maria Reyesworks atNorthwind
Maria Reyespreviously worked atAcme Corp
Tom Halepreviously worked atAcme Corp
Tom Haleruns partnerships atGlobex
GlobexacquiredInitech

That is a structured graph you can chart, search, or hand to another tool.

Run it

This is an on-demand job — run it when you have a Doc to map:

  1. Set SOURCE_DOC_ID to the Doc and GRAPH_SHEET_ID to the output Sheet.
  2. In the Apps Script editor, select buildGraph and click Run.
  3. Approve the authorisation prompt the first time.
  4. Open the Entities and Relationships tabs to see the graph.

To run it against many Docs without editing the code each time, add a custom menu that prompts for a Doc ID.

Watch out for

  • The same entity can appear under different names — “Globex” and “Globex Corp”, “Maria” and “Maria Reyes”. Claude is told to use exact names; for a clean graph you may still need a de-duplication pass afterwards.
  • Only the first CONTEXT_CHARS characters are read. A long report is truncated — raise the cap and max_tokens together, or run the script over the Doc in sections and merge the tabs.
  • Strict JSON keeps this reliable. stripFences handles the common code-fence case; if JSON.parse still throws, log the raw reply and tighten the prompt rather than reaching for regex.
  • Extraction is not the same as truth. Claude reports the relationships the text states — if the prose is wrong or speculative, the graph inherits that.

Related