Extract entities and relationships from text
Build a structured graph from Northwind prose — people, companies, and how they connect.
Published Sep 24, 2025
A Northwind research memo, a meeting write-up, or a due-diligence note is full of relationships — who works for whom, which company acquired which, who introduced two parties. That structure is obvious to a human reading the prose and invisible to anything else. To plot it, search it, or feed it to another tool, the relationships have to come out of the sentences and into a table.
This script reads a Doc, asks Claude to extract the entities (people and
companies) and the edges between them as strict JSON, and writes both to a
spreadsheet — an Entities tab and a Relationships tab. That is the raw
material for a graph diagram, a lookup, or a simple knowledge base.
What you’ll need
- A Google Doc containing the prose you want to map — a memo, report, or set of notes.
- A Google Sheet the script writes to — it creates the
EntitiesandRelationshipstabs itself. - An Anthropic API key saved as
ANTHROPIC_API_KEYin Script Properties — see Store API keys and secrets securely.
The script
// The Doc to read and the Sheet to write the graph into.
const SOURCE_DOC_ID = '1abcSourceDocId';
const GRAPH_SHEET_ID = '1abcGraphSheetId';
// How much of the Doc to send in one pass — keeps the prompt within a
// sensible token budget. See "Watch out for".
const CONTEXT_CHARS = 6000;
/**
* Reads the source Doc, asks Claude to extract a graph of entities and
* relationships, and writes it to two tabs in the graph spreadsheet.
*/
function buildGraph() {
// 1. Extract the graph as structured objects.
const graph = extractGraph(SOURCE_DOC_ID);
if (!graph || !graph.entities || !graph.entities.length) {
Logger.log('No entities found — nothing to write.');
return;
}
const ss = SpreadsheetApp.openById(GRAPH_SHEET_ID);
// 2. Rebuild the Entities tab: one row per person or company.
const entitiesTab = ss.getSheetByName('Entities')
|| ss.insertSheet('Entities');
entitiesTab.clear();
entitiesTab.getRange(1, 1, 1, 2).setValues([['Name', 'Type']]);
const entityRows = graph.entities.map((e) => [e.name, e.type]);
entitiesTab.getRange(2, 1, entityRows.length, 2).setValues(entityRows);
// 3. Rebuild the Relationships tab: one row per edge.
const edgesTab = ss.getSheetByName('Relationships')
|| ss.insertSheet('Relationships');
edgesTab.clear();
edgesTab.getRange(1, 1, 1, 3).setValues([['From', 'Relation', 'To']]);
const edges = graph.edges || [];
if (edges.length) {
const edgeRows = edges.map((x) => [x.from, x.relation, x.to]);
edgesTab.getRange(2, 1, edgeRows.length, 3).setValues(edgeRows);
}
Logger.log('Wrote ' + entityRows.length + ' entities and '
+ edges.length + ' relationships.');
}
/**
* Asks Claude to extract entities and relationships from a Doc, pinned
* to a strict JSON schema so the result is always parseable.
*/
function extractGraph(docId) {
const text = DocumentApp.openById(docId).getBody()
.getText().slice(0, CONTEXT_CHARS);
if (!text.trim()) {
Logger.log('Source Doc is empty — nothing to extract.');
return null;
}
// A fixed schema is the difference between a parseable result and a
// parsing nightmare.
const prompt =
'Extract entities and relationships from this Northwind text as JSON. ' +
'Return ONLY JSON — no prose, no markdown — in this shape: ' +
'{"entities":[{"name":string,"type":"person"|"company"}],' +
'"edges":[{"from":string,"to":string,"relation":string}]}. ' +
'Use exact names as written in the text.\n\n' + text;
return JSON.parse(stripFences(callClaude(prompt, 'claude-sonnet-4-6', 1500)));
}
/**
* Claude occasionally wraps JSON in a ```json code fence. Strip it so
* JSON.parse never chokes on the markdown.
*/
function stripFences(textValue) {
return textValue.replace(/```(?:json)?/g, '').trim();
}
/**
* Minimal Anthropic API call. The key lives in Script Properties — it
* is never pasted into the code.
*/
function callClaude(prompt, model = 'claude-haiku-4-5-20251001', maxTokens = 400) {
const key = PropertiesService.getScriptProperties()
.getProperty('ANTHROPIC_API_KEY');
const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
method: 'post',
contentType: 'application/json',
headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
payload: JSON.stringify({
model,
max_tokens: maxTokens,
messages: [{ role: 'user', content: prompt }],
}),
muteHttpExceptions: true,
});
return JSON.parse(res.getContentText()).content[0].text.trim();
}
How it works
buildGraphcallsextractGraph, which reads the firstCONTEXT_CHARScharacters of the source Doc.- If the Doc is empty,
extractGraphlogs a message and returnsnull, andbuildGraphstops — no wasted API call. extractGraphbuilds a prompt that pins the output to a strict JSON schema: anentitiesarray and anedgesarray, withtypeconstrained topersonorcompany.- It calls Claude Sonnet — graph extraction needs reasoning over the whole
passage, not just pattern matching.
stripFencesremoves any code fence, thenJSON.parseturns the reply into real objects. buildGraphrebuilds theEntitiestab (name and type) and theRelationshipstab (from, relation, to) from scratch, so each run reflects the latest Doc.
Example run
Say the source Doc contains this paragraph:
Maria Reyes joined Northwind in 2021 from Acme Corp, where she had worked alongside Tom Hale. Tom now runs partnerships at Globex, which acquired Initech last spring.
After a run, the Entities tab holds:
| Name | Type |
|---|---|
| Maria Reyes | person |
| Tom Hale | person |
| Northwind | company |
| Acme Corp | company |
| Globex | company |
| Initech | company |
And the Relationships tab holds:
| From | Relation | To |
|---|---|---|
| Maria Reyes | works at | Northwind |
| Maria Reyes | previously worked at | Acme Corp |
| Tom Hale | previously worked at | Acme Corp |
| Tom Hale | runs partnerships at | Globex |
| Globex | acquired | Initech |
That is a structured graph you can chart, search, or hand to another tool.
Run it
This is an on-demand job — run it when you have a Doc to map:
- Set
SOURCE_DOC_IDto the Doc andGRAPH_SHEET_IDto the output Sheet. - In the Apps Script editor, select
buildGraphand click Run. - Approve the authorisation prompt the first time.
- Open the
EntitiesandRelationshipstabs to see the graph.
To run it against many Docs without editing the code each time, add a custom menu that prompts for a Doc ID.
Watch out for
- The same entity can appear under different names — “Globex” and “Globex Corp”, “Maria” and “Maria Reyes”. Claude is told to use exact names; for a clean graph you may still need a de-duplication pass afterwards.
- Only the first
CONTEXT_CHARScharacters are read. A long report is truncated — raise the cap andmax_tokenstogether, or run the script over the Doc in sections and merge the tabs. - Strict JSON keeps this reliable.
stripFenceshandles the common code-fence case; ifJSON.parsestill throws, log the raw reply and tighten the prompt rather than reaching for regex. - Extraction is not the same as truth. Claude reports the relationships the text states — if the prose is wrong or speculative, the graph inherits that.
Related
Parse semi-structured listings into tables
Extract recipes, specs, or ads from Northwind content into clean spreadsheet rows.
Updated Mar 7, 2026
Extract follow-ups from call transcripts
Turn Northwind sales calls into actionable tasks — owner + task + due date per extracted item.
Updated Jan 10, 2026
Build an AI data-enrichment pipeline
Fill missing company and contact fields on Northwind's prospect list — Claude infers from the row.
Updated Nov 23, 2025
Build an AI invoice and receipt parser
Read Northwind documents into structured ledger rows — vendor, amount, line items.
Updated Oct 2, 2025
Build an AI data-cleaning assistant
Standardise messy Northwind names, addresses, and categories into clean canonical values.
Updated Sep 4, 2025