appscript.dev
Automation Intermediate Sheets

Fetch link previews and metadata

Pull titles, descriptions, and og:images for a column of Northwind links.

Published Nov 4, 2025

Northwind’s content team keeps a spreadsheet of links — articles to cite, competitor pages to watch, resources to share in the newsletter. A bare URL tells you nothing, so someone ends up opening each one, copying the headline and pasting it back into the next column. For a list of forty links that is an hour of tab-juggling.

This is a custom spreadsheet function. Put =LINK_PREVIEW(A2) next to a URL and it fetches the page, scrapes the title, meta description and Open Graph image, and spills them across three cells — the same preview card a chat app or social network would generate, but inside the sheet.

What you’ll need

  • A Google Sheet with URLs in a column.
  • Nothing to install. LINK_PREVIEW is a custom function in the bound script, called like any built-in formula.
  • Public, fetchable pages. Links behind a login or a bot wall will not return usable HTML — see “Watch out for”.

The script

// Cap on how much of each page we download. og: and <title> tags live in
// the <head>, so the first slice of HTML is all we need.
const PREVIEW_FETCH_OPTIONS = {
  muteHttpExceptions: true,
  followRedirects: true,
};

/**
 * Pulls the title, description and Open Graph image for a web page.
 * Use it in a cell as =LINK_PREVIEW(A2).
 *
 * @param {string} url  The page URL to preview.
 * @returns {string[][]} A three-cell row: [title, description, image].
 * @customfunction
 */
function LINK_PREVIEW(url) {
  // Blank cell — return empty so the formula does not error when filled down.
  if (!url) return '';

  try {
    // 1. Fetch the page. muteHttpExceptions stops a 404 throwing.
    const response = UrlFetchApp.fetch(url, PREVIEW_FETCH_OPTIONS);

    // 2. A non-2xx page has no useful metadata — bail out cleanly.
    const code = response.getResponseCode();
    if (code < 200 || code >= 300) return [['', '', '']];

    const html = response.getContentText();

    // 3. Scrape the three fields. Each helper returns '' if not found.
    const title = extractTitle(html);
    const description = extractMeta(html, 'description');
    const image = extractOpenGraph(html, 'og:image');

    // Returned as a row so the values spill into three columns.
    return [[title, description, image]];
  } catch (e) {
    // Network failure, bad URL, timeout — return blanks, never an error.
    return [['', '', '']];
  }
}

/**
 * Pulls the text inside the <title> tag.
 *
 * @param {string} html  The page HTML.
 * @returns {string}     The title, or '' if there is none.
 */
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? decodeEntities(match[1].trim()) : '';
}

/**
 * Pulls a named <meta name="..."> tag's content, tolerating either
 * attribute order (name-before-content or content-before-name).
 *
 * @param {string} html  The page HTML.
 * @param {string} name  The meta name to look for, e.g. 'description'.
 * @returns {string}     The content, or '' if not present.
 */
function extractMeta(html, name) {
  const a = html.match(
    new RegExp('<meta[^>]+name=["\']' + name + '["\'][^>]+content=["\']([^"\']*)', 'i')
  );
  const b = html.match(
    new RegExp('<meta[^>]+content=["\']([^"\']*)["\'][^>]+name=["\']' + name + '["\']', 'i')
  );
  const match = a || b;
  return match ? decodeEntities(match[1].trim()) : '';
}

/**
 * Pulls an Open Graph <meta property="og:..."> tag's content.
 *
 * @param {string} html      The page HTML.
 * @param {string} property  The og property, e.g. 'og:image'.
 * @returns {string}         The content, or '' if not present.
 */
function extractOpenGraph(html, property) {
  const match = html.match(
    new RegExp('<meta[^>]+property=["\']' + property + '["\'][^>]+content=["\']([^"\']*)', 'i')
  );
  return match ? match[1].trim() : '';
}

/**
 * Turns the handful of HTML entities that show up in titles back into
 * plain characters, so "Tips &amp; tricks" reads as "Tips & tricks".
 *
 * @param {string} text  Raw scraped text.
 * @returns {string}     The decoded text.
 */
function decodeEntities(text) {
  return text
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'");
}

How it works

  1. LINK_PREVIEW is tagged @customfunction, so it appears in formula autocomplete and runs once per cell.
  2. A blank input returns an empty string — needed when the formula is filled past the last URL.
  3. It fetches the page with muteHttpExceptions on, so a 404 or 500 returns a response object instead of throwing. A non-2xx page returns three blanks.
  4. Three small helpers scrape the HTML with regular expressions: extractTitle reads the <title> tag, extractMeta reads <meta name="description"> (tolerating either attribute order), and extractOpenGraph reads <meta property="og:image">.
  5. decodeEntities converts the common HTML entities back to plain characters, so a title scraped as Tips &amp; tricks displays as Tips & tricks.
  6. The three values are returned as one row ([[title, description, image]]) so they spill into three columns. Any failure — bad URL, timeout, network error — is caught and returns blanks rather than an error cell.

Example run

With URLs in column A, put =LINK_PREVIEW(A2) in B2 and fill it down:

A (URL)B (Title)C (Description)D (Image)
example.com/guideThe complete guide to widgetsEverything you need to know about widgets.example.com/og/guide.png
example.com/missing
example.com/blog/postFive lessons from a year of shippingWhat a year of weekly releases taught us.example.com/og/post.jpg

Row 3’s page returned a 404, so its cells stay blank rather than breaking the column.

Use it

LINK_PREVIEW is a spreadsheet formula. Type it into a cell:

=LINK_PREVIEW(A2)

It returns three cells — title, description and image — so leave the two columns to its right empty for the result to spill into. To preview a whole list, put the formula in B2 and double-click the fill handle to copy it down.

Watch out for

  • Scraping HTML with regular expressions is fast but fragile. It works on the great majority of pages, but a site with unusual markup, the title split across lines, or metadata injected by JavaScript can return blanks.
  • JavaScript-rendered pages are the main blind spot. UrlFetchApp fetches the raw HTML and does not run scripts, so single-page apps that build their <head> client-side will look empty.
  • Many sites block or cloak requests that lack a browser-like user agent. Expect some links — news sites and social networks especially — to return a consent page or a 403 instead of real content.
  • Custom functions have a 30-second limit per call and cannot use services that need user authorisation. A slow page can time out and return blanks.
  • Recalculating the sheet re-fetches every LINK_PREVIEW cell, and a column of them can fire dozens of requests at once. Once a list is enriched, paste the results as static values so a stray edit does not trigger a fresh crawl.
  • The function does not cache. For a list with repeated URLs, or one you refresh often, wrap the fetch in a cache so identical links are not fetched twice.

Related