Build a document-classification system
Sort Northwind Drive files into types by their content — contracts, briefs, invoices.
Published Jan 6, 2026
Northwind’s shared Drive is a mix of contracts, creative briefs, invoices, and meeting notes, all sitting in the same folder because nobody renames files at the moment they save them. Finding “that invoice from March” means opening documents one by one. The information to sort them is right there in the text — it just needs reading.
This script reads the first few thousand characters of every Doc in a folder and asks Claude to label it as one of a fixed set of types. The label is written to the file’s Drive description, so it shows up in search and in the file details pane without changing the document itself. Run it once over a backlog, or on a schedule to keep new files tidy.
What you’ll need
- A Drive folder of Google Docs to classify, and its folder ID (the part of the folder’s URL after `/folders/`).
- An Anthropic API key saved as `ANTHROPIC_API_KEY` in Script Properties; see Store API keys and secrets securely.
- Edit access to the files, so the script can set each one’s description.
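If you have the folder’s URL rather than its ID, the ID can be pulled out with a regular expression. This is a minimal sketch, not part of the classification script. It assumes URLs of the usual `https://drive.google.com/drive/folders/<ID>` form, possibly with a `/u/0/` segment or a `?usp=` query, and `extractFolderId` is a hypothetical helper name.

```javascript
/**
 * Hypothetical helper: pulls the folder ID out of a Drive folder URL.
 * Assumes the ID is the segment after "/folders/" and consists only of
 * letters, digits, "-" and "_" (so a trailing "?usp=..." is not captured).
 * @param {string} url A Drive folder URL.
 * @return {string|null} The folder ID, or null if none is found.
 */
function extractFolderId(url) {
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}
```

For example, `extractFolderId('https://drive.google.com/drive/u/0/folders/1AbC_d-9xYz?usp=sharing')` returns `'1AbC_d-9xYz'` (the ID here is made up).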
The script
```javascript
// The labels a document may be classified as. "other" is the catch-all.
const TYPES = ['contract', 'brief', 'invoice', 'meeting-notes', 'other'];

// How many characters of each document to send to Claude. Enough to
// classify reliably without spending tokens on the whole file.
const SAMPLE_CHARS = 3000;

/**
 * Classifies every Google Doc in a folder and records the label in
 * each file's Drive description.
 * @param {string} folderId The folder of Docs to classify.
 */
function classifyFolderDocs(folderId) {
  const files = DriveApp.getFolderById(folderId).getFiles();
  let classified = 0;

  while (files.hasNext()) {
    const f = files.next();

    // Skip anything that is not a Google Doc: Sheets, PDFs, images.
    if (f.getMimeType() !== MimeType.GOOGLE_DOCS) continue;

    // Read the opening of the document; usually enough to tell the type.
    const text = DocumentApp.openById(f.getId())
      .getBody()
      .getText()
      .slice(0, SAMPLE_CHARS);

    // Ask Claude for a single label from the fixed list.
    const label = callClaude(
      'Classify this Northwind document as one of: ' + TYPES.join(', ') +
      '. Return only the label.\n\n' + text
    );

    // Record the label in the file's description so search can find it.
    f.setDescription('type: ' + label);
    classified++;
  }

  Logger.log(`Classified ${classified} document(s).`);
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties; it
 * is never pasted into the code.
 * @param {string} prompt The prompt to send.
 * @return {string} Claude's reply, trimmed.
 */
function callClaude(prompt) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  if (!key) throw new Error('ANTHROPIC_API_KEY is not set in Script Properties.');

  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 20,
      messages: [{ role: 'user', content: prompt }],
    }),
    // Return error responses instead of throwing, so the status can be checked.
    muteHttpExceptions: true,
  });

  // An API error body has no `content` array; fail with a readable message
  // rather than a TypeError from content[0].
  if (res.getResponseCode() !== 200) {
    throw new Error(`Anthropic API error ${res.getResponseCode()}: ${res.getContentText()}`);
  }
  return JSON.parse(res.getContentText()).content[0].text.trim();
}
```
How it works
- `classifyFolderDocs` opens the target folder and iterates over every file directly inside it (`getFiles()` does not descend into subfolders).
- It skips anything that is not a Google Doc by checking the MIME type, so spreadsheets, PDFs, and images in the same folder are left alone.
- For each Doc it reads the body text and keeps the first `SAMPLE_CHARS` characters. The opening of a document almost always reveals its type, and capping the sample keeps the token cost low.
- It calls `callClaude` with a prompt that lists the allowed `TYPES` and asks for the label only. A tight prompt keeps the reply to a single word.
- It writes `type: <label>` into the file’s Drive description. The document itself is untouched; the label lives in metadata. Any description the file already had is replaced.
- `callClaude` posts the prompt to the Anthropic Messages API using the key from Script Properties. `max_tokens` is just 20 because the reply is a single label.
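The prompt assembly and the reply handling are plain string and JSON work, so they can be pulled out of the Apps Script services and checked on their own. This is a sketch under that assumption: `buildPrompt` and `extractReplyText` are hypothetical names, and the response shape they rely on (a `content` array of blocks, with text blocks marked `type: 'text'`) is the Anthropic Messages API’s.

```javascript
/**
 * Hypothetical helper: builds the same classification prompt that
 * classifyFolderDocs sends, truncating the document to sampleChars.
 * @param {string} text The document body.
 * @param {string[]} types The allowed labels.
 * @param {number} sampleChars How many characters of the body to keep.
 * @return {string} The prompt.
 */
function buildPrompt(text, types, sampleChars) {
  return 'Classify this Northwind document as one of: ' + types.join(', ') +
    '. Return only the label.\n\n' + text.slice(0, sampleChars);
}

/**
 * Hypothetical helper: returns the trimmed text of the first text block
 * in a Messages API response body, or '' if there is none (for example,
 * an error body, which has no `content` array).
 * @param {string} body The raw JSON response body.
 * @return {string} The reply text.
 */
function extractReplyText(body) {
  const parsed = JSON.parse(body);
  const blocks = Array.isArray(parsed.content) ? parsed.content : [];
  const textBlock = blocks.find((b) => b.type === 'text');
  return textBlock ? textBlock.text.trim() : '';
}
```

Searching for the first text block, rather than reading `content[0].text` directly, means an unexpected body yields an empty string instead of an exception; a caller can then treat that as `other`.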
Example run
A folder holds four Docs with these opening lines:
| File | First line |
|---|---|
| Acme MSA | “This Master Services Agreement is entered into…” |
| Q1 campaign | “Creative brief: spring product launch, target audience…” |
| INV-2031 | “Invoice 2031 — Northwind Studio — amount due…” |
| Standup 12 Mar | “Attendees: Sam, Dana, Lee. Action items…” |
After a run, each file’s Drive description is set:
| File | Description |
|---|---|
| Acme MSA | type: contract |
| Q1 campaign | type: brief |
| INV-2031 | type: invoice |
| Standup 12 Mar | type: meeting-notes |
Searching Drive for `type: invoice` now turns up INV-2031.
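Because each label is stored as plain text in the description, it can also be read back, for example to count how many files of each type a folder holds. A minimal sketch that works on description strings; in Apps Script these would come from each file’s `getDescription()`, which may be empty or null for an unlabelled file. `parseTypeLabel` and `countByType` are hypothetical names.

```javascript
/**
 * Hypothetical helper: reads the label back out of a description written
 * as "type: <label>". Returns null when there is no such prefix.
 * @param {?string} description A Drive file description (may be null).
 * @return {?string} The label, or null.
 */
function parseTypeLabel(description) {
  const match = /^type:\s*(\S+)/.exec(description || '');
  return match ? match[1] : null;
}

/**
 * Hypothetical helper: counts files per label. Files without a
 * "type:" description are counted under "(none)".
 * @param {Array<?string>} descriptions One description per file.
 * @return {Object<string, number>} Label to count.
 */
function countByType(descriptions) {
  const counts = {};
  for (const d of descriptions) {
    const label = parseTypeLabel(d) || '(none)';
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}
```

Fed the four descriptions from the example run, `countByType` returns one of each: `{ contract: 1, brief: 1, invoice: 1, 'meeting-notes': 1 }`.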
Run it
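Run it by hand to classify a backlog:
- In the Apps Script editor, run `classifyFolderDocs` with your folder ID. The Run button can’t pass arguments, so add a small wrapper such as `function classifyMyFolder() { classifyFolderDocs('YOUR_FOLDER_ID'); }` (the name and placeholder are examples) and run that instead.
- Approve the authorisation prompt the first time.
- Check a few files’ descriptions to confirm the labels look right.

To keep new files tidy, add a time-driven trigger that runs daily. Point it at a no-argument wrapper rather than at `classifyFolderDocs` itself: a trigger passes an event object as the first argument, which would be used as the folder ID. Re-running reclassifies every file, which simply refreshes the description. That is harmless, but see the quota note below.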
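A sketch of the scheduled setup in Apps Script, assuming the folder ID is stored in Script Properties under a key named `CLASSIFY_FOLDER_ID`. That key and the function names `runDailyClassification` and `installDailyTrigger` are assumptions, not part of the original script. The wrapper takes no arguments, so it works both from the Run button and as a trigger handler; the installer checks `ScriptApp.getProjectTriggers()` first so that running it twice does not create a duplicate trigger.

```javascript
// Assumed Script Properties key holding the folder to classify.
const FOLDER_ID_KEY = 'CLASSIFY_FOLDER_ID';

/**
 * Hypothetical wrapper: reads the folder ID from Script Properties and
 * runs classifyFolderDocs. Any trigger event object passed in is ignored.
 */
function runDailyClassification() {
  const folderId = PropertiesService.getScriptProperties()
    .getProperty(FOLDER_ID_KEY);
  if (!folderId) throw new Error(FOLDER_ID_KEY + ' is not set in Script Properties.');
  classifyFolderDocs(folderId);
}

/**
 * Hypothetical one-off setup: installs a daily time-driven trigger for
 * runDailyClassification unless one already exists. Run it once by hand.
 */
function installDailyTrigger() {
  const exists = ScriptApp.getProjectTriggers()
    .some((t) => t.getHandlerFunction() === 'runDailyClassification');
  if (exists) return;
  ScriptApp.newTrigger('runDailyClassification')
    .timeBased()
    .everyDays(1)
    .create();
}
```

The same trigger can be created by hand on the editor’s Triggers page; the code route just makes the setup repeatable.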
Watch out for
- Re-running classifies every file again, including ones already labelled. For a large folder, skip files whose description already starts with `type:` to save API calls.
- Each file is one API call. A few hundred Docs is fine; thousands will be slow and may hit the 6-minute Apps Script execution limit. Process them in batches with a continuation trigger.
- Claude can return a label outside `TYPES` if the prompt drifts. Validate the reply against the `TYPES` array and fall back to `other` if it does not match.
- Only Google Docs are read. To classify PDFs you would need to extract their text first, which is a different job.
- `setDescription` needs edit access. Files you can only view will throw, so wrap the call in a `try`/`catch` if the folder is mixed-permission.
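Three of these fixes can be sketched as small functions. This is a sketch, not part of the script: `normalizeLabel`, `needsClassification` and `classifyWithinBudget` are hypothetical names. The budget loop assumes an iterator with `hasNext()`, `next()` and `getContinuationToken()`, which is the shape of Apps Script’s `FileIterator`; a follow-up trigger would resume from the returned token with `DriveApp.continueFileIterator(token)`.

```javascript
/**
 * Hypothetical helper: maps Claude's reply onto the allowed labels.
 * Trims, lowercases and strips periods and quotes, then falls back to
 * "other" for anything not in the list.
 * @param {?string} reply Claude's raw reply.
 * @param {string[]} types The allowed labels.
 * @return {string} A label from types, or "other".
 */
function normalizeLabel(reply, types) {
  const cleaned = String(reply || '').trim().toLowerCase().replace(/[."']/g, '');
  return types.includes(cleaned) ? cleaned : 'other';
}

/**
 * Hypothetical helper: true when a file still needs a label, i.e. its
 * description (which may be null) does not already start with "type:".
 * @param {?string} description The file's current description.
 * @return {boolean}
 */
function needsClassification(description) {
  return !String(description || '').startsWith('type:');
}

/**
 * Hypothetical batching loop: handles files until the time budget runs
 * out, then returns a continuation token for the next unprocessed file,
 * or null once every file has been handled.
 * @param {{hasNext: function(): boolean, next: function(): *,
 *          getContinuationToken: function(): string}} iterator
 * @param {function(*): void} handleFile Called once per file.
 * @param {number} budgetMs Milliseconds to spend before stopping.
 * @param {function(): number} now Clock returning milliseconds, e.g. Date.now.
 * @return {?string} Continuation token, or null when finished.
 */
function classifyWithinBudget(iterator, handleFile, budgetMs, now) {
  const start = now();
  while (iterator.hasNext()) {
    // Check the budget before taking the next file, so the token
    // points at a file that has not been processed yet.
    if (now() - start > budgetMs) return iterator.getContinuationToken();
    handleFile(iterator.next());
  }
  return null;
}
```

One way to wire these in: `handleFile` holds the existing per-file work, calls `needsClassification(f.getDescription())` first, passes the reply through `normalizeLabel(label, TYPES)`, and wraps `setDescription` in `try`/`catch`. A budget comfortably under six minutes leaves room to save the token to Script Properties before the run ends.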