appscript.dev

Build a document-classification system

Sort Northwind Drive files into types by their content — contracts, briefs, invoices.

Published Jan 6, 2026

Northwind’s shared Drive is a mix of contracts, creative briefs, invoices, and meeting notes, all sitting in the same folder because nobody renames files at the moment they save them. Finding “that invoice from March” means opening documents one by one. The information to sort them is right there in the text — it just needs reading.

This script reads the first few thousand characters of every Doc in a folder and asks Claude to label it as one of a fixed set of types. The label is written to the file’s Drive description, so it shows up in search and in the file details pane without changing the document itself. Run it once over a backlog, or on a schedule to keep new files tidy.

What you’ll need

  • A Drive folder of Google Docs to classify, and its folder ID from the URL.
  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • Edit access to the files, so the script can set each one’s description.
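The folder ID is the path segment after /folders/ in the folder's URL. If you would rather paste the whole URL, here is a minimal sketch of a helper that pulls the ID out. folderIdFromUrl is a hypothetical name. It assumes the usual https://drive.google.com/drive/folders/<ID> shape, with or without a /u/0/ segment or a ?usp=sharing suffix.

```javascript
/**
 * Hypothetical helper: extracts a Drive folder ID from a folder URL.
 * Assumes the ID is the path segment that follows "/folders/".
 * @param {string} url A Drive folder URL.
 * @return {string} The folder ID.
 */
function folderIdFromUrl(url) {
  // IDs use letters, digits, "-" and "_"; the match stops at "/" or "?".
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(url);
  if (!match) {
    throw new Error('No /folders/<ID> segment found in: ' + url);
  }
  return match[1];
}
```

For example, classifyFolderDocs(folderIdFromUrl(url)) accepts the URL straight from the browser's address bar.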

The script

// The labels a document may be classified as. "other" is the catch-all.
const TYPES = ['contract', 'brief', 'invoice', 'meeting-notes', 'other'];

// How many characters of each document to send to Claude. Enough to
// classify reliably without spending tokens on the whole file.
const SAMPLE_CHARS = 3000;

/**
 * Classifies every Google Doc in a folder and records the label in
 * each file's Drive description.
 * @param {string} folderId The folder of Docs to classify.
 */
function classifyFolderDocs(folderId) {
  const files = DriveApp.getFolderById(folderId).getFiles();
  let classified = 0;

  while (files.hasNext()) {
    const f = files.next();

    // Skip anything that is not a Google Doc — Sheets, PDFs, images.
    if (f.getMimeType() !== MimeType.GOOGLE_DOCS) continue;

    // Read the opening of the document — usually enough to tell the type.
    const text = DocumentApp.openById(f.getId())
      .getBody()
      .getText()
      .slice(0, SAMPLE_CHARS);

    // Ask Claude for a single label from the fixed list.
    const label = callClaude(
      'Classify this Northwind document as one of: ' + TYPES.join(', ') +
      '. Return only the label.\n\n' + text
    );

    // Record the label in the file's description so search can find it.
    f.setDescription('type: ' + label);
    classified++;
  }

  Logger.log(`Classified ${classified} document(s).`);
}

/**
 * Minimal Anthropic API call. The key lives in Script Properties — it
 * is never pasted into the code.
 * @param {string} prompt The prompt to send.
 * @return {string} Claude's reply, trimmed.
 */
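function callClaude(prompt) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');
  if (!key) throw new Error('ANTHROPIC_API_KEY is not set in Script Properties.');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 20,
      messages: [{ role: 'user', content: prompt }],
    }),
    // Return error responses instead of throwing, so they can be reported below.
    muteHttpExceptions: true,
  });
  // A failed request still returns here; check the status before reading
  // content[0], which only exists on a successful reply.
  if (res.getResponseCode() !== 200) {
    throw new Error('Anthropic API error ' + res.getResponseCode() + ': ' +
      res.getContentText());
  }
  return JSON.parse(res.getContentText()).content[0].text.trim();
}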

How it works

  1. classifyFolderDocs opens the target folder and iterates over every file in it.
  2. It skips anything that is not a Google Doc, checking the MIME type — so spreadsheets, PDFs, and images in the same folder are left alone.
  3. For each Doc it reads the body text and takes the first SAMPLE_CHARS characters. The opening of a document almost always reveals its type, and capping the sample keeps the token cost low.
  4. It calls callClaude with a prompt that lists the allowed TYPES and asks for the label only. A tight prompt usually keeps the reply to the bare label, though that is not guaranteed.
  5. It writes type: <label> into the file’s Drive description. The document itself is untouched; the label lives in metadata.
  6. callClaude posts the prompt to the Anthropic API using the key from Script Properties. max_tokens is just 20 because the reply is a single label.
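Steps 3, 4 and 6 can be inspected in isolation by building the request body without sending it. Here is a minimal sketch. buildClassifyRequest is a hypothetical helper that mirrors the prompt and payload in the script above. It takes the label list and sample size as arguments, so it does not redeclare TYPES and SAMPLE_CHARS if pasted into the same project.

```javascript
/**
 * Hypothetical helper: builds the JSON body that callClaude posts to
 * https://api.anthropic.com/v1/messages for one document, without sending it.
 * @param {string} bodyText The document's full body text.
 * @param {string[]} types The allowed labels (TYPES in the script).
 * @param {number} sampleChars How much text to keep (SAMPLE_CHARS).
 * @return {Object} The request body.
 */
function buildClassifyRequest(bodyText, types, sampleChars) {
  // Step 3: keep only the opening of the document.
  const sample = bodyText.slice(0, sampleChars);
  // Step 4: list the allowed labels and ask for the label alone.
  const prompt = 'Classify this Northwind document as one of: ' +
    types.join(', ') + '. Return only the label.\n\n' + sample;
  // Step 6: one user message; a single label needs only a few output tokens.
  return {
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 20,
    messages: [{ role: 'user', content: prompt }],
  };
}
```

Logging JSON.stringify(buildClassifyRequest(text, TYPES, SAMPLE_CHARS), null, 2) shows exactly what would be sent for a given Doc.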

Example run

A folder holds four Docs with these opening lines:

File             First line
Acme MSA         “This Master Services Agreement is entered into…”
Q1 campaign      “Creative brief: spring product launch, target audience…”
INV-2031         “Invoice 2031 — Northwind Studio — amount due…”
Standup 12 Mar   “Attendees: Sam, Dana, Lee. Action items…”

After a run, each file’s Drive description is set:

File             Description
Acme MSA         type: contract
Q1 campaign      type: brief
INV-2031         type: invoice
Standup 12 Mar   type: meeting-notes

Searching Drive for type: invoice now turns up INV-2031 instantly.
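The same lookup can be scripted. Here is a minimal sketch that makes two assumptions. First, Drive's fullText search term matches file descriptions, since the Drive search-query documentation lists description among the fields it covers. Second, double quotes inside the value request an exact phrase. buildTypeQuery and logFilesOfType are hypothetical names. DriveApp.searchFiles is the Apps Script call that accepts a Drive query string.

```javascript
/**
 * Hypothetical helper: builds a Drive query for files labelled with a type.
 * Assumes fullText covers the description field.
 * @param {string} label One of the TYPES labels.
 * @return {string} A Drive search query.
 */
function buildTypeQuery(label) {
  // Inside a single-quoted query value, backslashes and single quotes
  // must be escaped with a backslash.
  const escaped = ('type: ' + label)
    .replace(/\\/g, '\\\\')
    .replace(/'/g, "\\'");
  // The inner double quotes ask for an exact-phrase match.
  return "fullText contains '\"" + escaped + "\"'";
}

/**
 * Apps Script only: logs the name of every file matching the label.
 * @param {string} label One of the TYPES labels.
 */
function logFilesOfType(label) {
  const files = DriveApp.searchFiles(buildTypeQuery(label));
  while (files.hasNext()) {
    Logger.log(files.next().getName());
  }
}
```

Because fullText also searches document content, a Doc whose body happens to contain the phrase "type: invoice" could match as well.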

Run it

Run it by hand to classify a backlog:

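  1. In the Apps Script editor, add a small function that calls classifyFolderDocs with your folder ID, then select that function and click Run. The Run button cannot pass arguments, so running classifyFolderDocs directly leaves folderId undefined.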
  2. Approve the authorisation prompt the first time.
  3. Check a few files’ descriptions to confirm the labels look right.

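To keep new files tidy, add a daily time-driven trigger. Point it at a no-argument wrapper rather than at classifyFolderDocs itself, because a trigger passes an event object as the first argument, not your folder ID. Re-running reclassifies every file, which simply refreshes the description. That is harmless, but see the quota note below.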
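A time-driven trigger calls its function with an event object, and the editor's Run button passes no arguments at all. Neither can hand classifyFolderDocs a folder ID. Here is a minimal sketch of the usual fix, assuming the script above is in the same project. It adds a no-argument wrapper that supplies the ID, plus a one-off function that installs the daily trigger through ScriptApp. NORTHWIND_FOLDER_ID, classifyNorthwindFolder and installDailyTrigger are hypothetical names.

```javascript
// Hypothetical: replace with the ID of the folder to classify.
const NORTHWIND_FOLDER_ID = 'YOUR_FOLDER_ID';

/**
 * No-argument entry point for the Run button and for triggers.
 * Any event object a trigger passes in is ignored.
 */
function classifyNorthwindFolder() {
  classifyFolderDocs(NORTHWIND_FOLDER_ID);
}

/**
 * Run once from the editor to install a daily trigger on the wrapper.
 */
function installDailyTrigger() {
  ScriptApp.newTrigger('classifyNorthwindFolder')
    .timeBased()
    .everyDays(1)
    .create();
}
```

Run installDailyTrigger only once. Each call creates another trigger, so running it twice would classify the folder twice a day.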

Watch out for

  • Re-running classifies every file again, including ones already labelled. For a large folder, skip files whose description already starts with type: to save API calls.
  • Each file is one API call. A few hundred Docs is fine; thousands will be slow and may hit the 6-minute Apps Script execution limit — process in batches with a continuation trigger.
  • Claude can occasionally return something outside TYPES, such as a capitalised label, trailing punctuation, or a full sentence. Validate the reply against the TYPES array and fall back to other if it does not match.
  • Only Google Docs are read. To classify PDFs you would need to extract their text first, which is a different job.
  • setDescription needs edit access. Files you can only view will throw — wrap the call in a try/catch if the folder is mixed-permission.
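The first, third and last points can be folded into the per-file step. Here is a minimal sketch, assuming it lives in the same Apps Script project as the script above, since it reuses TYPES, SAMPLE_CHARS and callClaude. normalizeLabel, isAlreadyLabelled and classifyOneFile are hypothetical names. The cleanup in normalizeLabel (case plus leading and trailing punctuation) is one reasonable choice, not the only one.

```javascript
/**
 * Hypothetical helper: maps Claude's reply onto an allowed label.
 * Lower-cases it, strips leading/trailing non-letters (quotes, full
 * stops, spaces) and falls back to 'other' if nothing matches.
 * @param {string} reply The raw reply text.
 * @param {string[]} types The allowed labels.
 * @return {string} A member of types, or 'other'.
 */
function normalizeLabel(reply, types) {
  const cleaned = String(reply).toLowerCase().replace(/^[^a-z]+|[^a-z]+$/g, '');
  return types.includes(cleaned) ? cleaned : 'other';
}

/**
 * Hypothetical helper: true if a Drive description already holds a label
 * written by this script. getDescription() can return null when unset.
 */
function isAlreadyLabelled(description) {
  return /^type: /.test(description || '');
}

/**
 * Apps Script only. Classifies one Doc, skipping files already labelled
 * and catching errors (for example setDescription on a view-only file)
 * so one failure does not end the run.
 * @param {GoogleAppsScript.Drive.File} file A Google Doc.
 * @return {boolean} true if the file was labelled in this call.
 */
function classifyOneFile(file) {
  if (isAlreadyLabelled(file.getDescription())) return false;
  try {
    const text = DocumentApp.openById(file.getId())
      .getBody()
      .getText()
      .slice(0, SAMPLE_CHARS);
    const reply = callClaude(
      'Classify this Northwind document as one of: ' + TYPES.join(', ') +
      '. Return only the label.\n\n' + text
    );
    file.setDescription('type: ' + normalizeLabel(reply, TYPES));
    return true;
  } catch (e) {
    Logger.log('Skipped ' + file.getName() + ': ' + e.message);
    return false;
  }
}
```

To use it, keep the MIME-type check in classifyFolderDocs and replace the rest of the loop body with if (classifyOneFile(f)) classified++;.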
