appscript.dev
Automation Advanced Docs Sheets Drive

Collect highlights and quotes into a research sheet

Gather highlighted passages from a folder of Docs into a Quotes sheet.

Published Dec 28, 2025

When Northwind’s research team reads source material, they highlight the lines worth keeping in yellow as they go. By the end of a project that signal is scattered across a dozen Docs, and pulling it together for the research deck means scrolling each one and copying every highlighted line out by hand.

This script does that collection pass for you. It walks every Doc in a folder, checks each paragraph’s background colour, and copies the ones highlighted in your chosen colour into a Quotes sheet — each with the source Doc name and a link back. What was a tedious copy-and-paste job becomes a single run, and the sheet is ready to drop straight into the deck.

What you’ll need

  • A Drive folder of Google Docs with highlighted passages, and its folder ID.
  • A Google Sheet to collect the quotes into, and its sheet ID. The script clears the first tab, so use a dedicated sheet.
  • Highlights applied as a paragraph background colour. The default target is yellow (#FFFF00); pass a different hex code to collect another colour.

The script

// The Drive folder of Docs to scan for highlights.
const FOLDER_ID = '1abcResearchFolderId';

// The Sheet the collected quotes are written into.
const QUOTES_SHEET_ID = '1abcQuotesSheetId';

// The default highlight colour to collect, as an uppercase hex code.
const DEFAULT_COLOUR = '#FFFF00';

// The header row written above the quotes.
const HEADERS = ['Doc', 'Quote', 'Link'];

/**
 * Collects every highlighted paragraph from a folder of Docs into a sheet.
 *
 * @param {string} folderId The folder of Docs to scan.
 * @param {string} sheetId The Sheet to write the quotes into.
 * @param {string} colour The highlight colour to collect, as a hex code.
 */
function gatherHighlights(folderId, sheetId, colour = DEFAULT_COLOUR) {
  // 1. Normalise the target colour so the comparison is case-insensitive.
  const target = colour.toUpperCase();

  // 2. Open the folder and grab an iterator of its Google Docs.
  const files = DriveApp.getFolderById(folderId)
    .getFilesByType(MimeType.GOOGLE_DOCS);

  const rows = [];

  // 3. Walk every Doc in the folder.
  while (files.hasNext()) {
    const file = files.next();
    const paragraphs = DocumentApp.openById(file.getId())
      .getBody()
      .getParagraphs();

    // 4. Walk every paragraph, skipping empty lines.
    for (const p of paragraphs) {
      const text = p.getText();
      if (!text) continue;

      // 5. Read the background colour of the paragraph's first character.
      const bg = p.editAsText().getBackgroundColor(0);

      // 6. Keep the paragraph if its highlight matches the target colour.
      if (bg && bg.toUpperCase() === target) {
        rows.push([file.getName(), text, file.getUrl()]);
      }
    }
  }

  // 7. Bail out if no highlighted lines were found anywhere.
  if (!rows.length) {
    Logger.log('No ' + target + ' highlights found — nothing to collect.');
    return;
  }

  // 8. Rebuild the quotes sheet from scratch and write every quote.
  const sheet = SpreadsheetApp.openById(sheetId).getSheets()[0];
  sheet.clear();
  sheet.getRange(1, 1, 1, HEADERS.length).setValues([HEADERS]);
  sheet.getRange(2, 1, rows.length, HEADERS.length).setValues(rows);
  Logger.log('Collected ' + rows.length + ' highlighted quotes.');
}

How it works

  1. gatherHighlights uppercases the target colour up front. Apps Script can return colours in mixed case, so normalising both sides keeps the match reliable.
  2. It opens the folder and asks for an iterator filtered to Google Docs, so other file types in the folder are ignored.
  3. It loops over every Doc, opening each one and reading its paragraphs.
  4. For each paragraph it skips empty lines — a blank highlighted line is not a quote worth collecting.
  5. editAsText().getBackgroundColor(0) reads the highlight colour at the first character of the paragraph. This is a deliberate simplification: it assumes a highlighted line is highlighted from the start.
  6. If that colour matches the target, the paragraph is recorded as a row: the Doc name, the quote text, and file.getUrl() for a link back to the source.
  7. If nothing matched across the whole folder it logs and stops, rather than clearing the sheet for an empty result.
  8. It clears the first tab, writes the header row, and writes every collected quote in a single setValues call.

Example run

Say a research folder has two Docs. “Market notes” has two yellow-highlighted lines; “Interview — retailer” has one. After a run the Quotes sheet holds:

DocQuoteLink
Market notesDemand for next-day delivery rose 40% year on year.docs.google.com/…
Market notesSmaller retailers cite stock visibility as the top pain.docs.google.com/…
Interview — retailer”We’d switch supplier tomorrow for a live stock feed.”docs.google.com/…

That sheet is the raw material for the research deck — every kept passage in one place, traceable to its source.

Run it

Collecting highlights is a once-per-project job, so run it by hand when the reading is done:

  1. In the Apps Script editor, add a wrapper that passes the config values:
/**
 * Run entry point — collects yellow highlights into the quotes sheet.
 */
function gatherHighlightsNow() {
  gatherHighlights(FOLDER_ID, QUOTES_SHEET_ID);
}
  1. Select gatherHighlightsNow and click Run.
  2. Approve the authorisation prompt the first time.
  3. Open the Quotes sheet to read the result. To collect a different colour, call gatherHighlights(FOLDER_ID, QUOTES_SHEET_ID, '#00FF00').

Watch out for

  • The colour check reads character 0 only. A paragraph highlighted from the second word, or with a highlighted run mid-line, will be missed. Scan every character’s background colour if your highlights are not whole-line.
  • Highlight colours must match exactly. A “yellow” applied from a custom palette may be #FFF275 rather than #FFFF00 — copy the precise hex from the Docs highlight picker and pass it in.
  • Each Doc is opened in full. A folder of large Docs can be slow; index in batches if you approach the six-minute execution limit.
  • The script reads the folder’s direct children only. Docs in sub-folders are skipped — recurse explicitly if your research is nested.

Related