Fetch link previews and metadata

Northwind’s content team keeps a spreadsheet of links — articles to cite, competitor pages to watch, resources to share in the newsletter. A bare URL tells you nothing, so someone ends up opening each one, copying the headline and pasting it back into the next column. For a list of forty links that is an hour of tab-juggling.

This is a custom spreadsheet function. Put =LINK_PREVIEW(A2) next to a URL and it fetches the page, scrapes the title, meta description and Open Graph image, and spills them across three cells — the same preview card a chat app or social network would generate, but inside the sheet.

What you’ll need

A Google Sheet with URLs in a column.
Nothing to install. LINK_PREVIEW is a custom function in the bound script, called like any built-in formula.
Public, fetchable pages. Links behind a login or a bot wall will not return usable HTML — see “Watch out for”.

The script

// Cap on how much of each page we download. og: and <title> tags live in
// the <head>, so the first slice of HTML is all we need.
const PREVIEW_FETCH_OPTIONS = {
  muteHttpExceptions: true,
  followRedirects: true,
};

/**
 * Pulls the title, description and Open Graph image for a web page.
 * Use it in a cell as =LINK_PREVIEW(A2).
 *
 * @param {string} url  The page URL to preview.
 * @returns {string[][]} A three-cell row: [title, description, image].
 * @customfunction
 */
function LINK_PREVIEW(url) {
  // Blank cell — return empty so the formula does not error when filled down.
  if (!url) return '';

  try {
    // 1. Fetch the page. muteHttpExceptions stops a 404 throwing.
    const response = UrlFetchApp.fetch(url, PREVIEW_FETCH_OPTIONS);

    // 2. A non-2xx page has no useful metadata — bail out cleanly.
    const code = response.getResponseCode();
    if (code < 200 || code >= 300) return [['', '', '']];

    const html = response.getContentText();

    // 3. Scrape the three fields. Each helper returns '' if not found.
    const title = extractTitle(html);
    const description = extractMeta(html, 'description');
    const image = extractOpenGraph(html, 'og:image');

    // Returned as a row so the values spill into three columns.
    return [[title, description, image]];
  } catch (e) {
    // Network failure, bad URL, timeout — return blanks, never an error.
    return [['', '', '']];
  }
}

/**
 * Pulls the text inside the <title> tag.
 *
 * @param {string} html  The page HTML.
 * @returns {string}     The title, or '' if there is none.
 */
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? decodeEntities(match[1].trim()) : '';
}

/**
 * Pulls a named <meta name="..."> tag's content, tolerating either
 * attribute order (name-before-content or content-before-name).
 *
 * @param {string} html  The page HTML.
 * @param {string} name  The meta name to look for, e.g. 'description'.
 * @returns {string}     The content, or '' if not present.
 */
function extractMeta(html, name) {
  const a = html.match(
    new RegExp('<meta[^>]+name=["\']' + name + '["\'][^>]+content=["\']([^"\']*)', 'i')
  );
  const b = html.match(
    new RegExp('<meta[^>]+content=["\']([^"\']*)["\'][^>]+name=["\']' + name + '["\']', 'i')
  );
  const match = a || b;
  return match ? decodeEntities(match[1].trim()) : '';
}

/**
 * Pulls an Open Graph <meta property="og:..."> tag's content.
 *
 * @param {string} html      The page HTML.
 * @param {string} property  The og property, e.g. 'og:image'.
 * @returns {string}         The content, or '' if not present.
 */
function extractOpenGraph(html, property) {
  const match = html.match(
    new RegExp('<meta[^>]+property=["\']' + property + '["\'][^>]+content=["\']([^"\']*)', 'i')
  );
  return match ? match[1].trim() : '';
}

/**
 * Turns the handful of HTML entities that show up in titles back into
 * plain characters, so "Tips &amp; tricks" reads as "Tips & tricks".
 *
 * @param {string} text  Raw scraped text.
 * @returns {string}     The decoded text.
 */
function decodeEntities(text) {
  return text
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'");
}

How it works

LINK_PREVIEW is tagged @customfunction, so it appears in formula autocomplete and runs once per cell.
A blank input returns an empty string — needed when the formula is filled past the last URL.
It fetches the page with muteHttpExceptions on, so a 404 or 500 returns a response object instead of throwing. A non-2xx page returns three blanks.
Three small helpers scrape the HTML with regular expressions: extractTitle reads the <title> tag, extractMeta reads <meta name="description"> (tolerating either attribute order), and extractOpenGraph reads <meta property="og:image">.
decodeEntities converts the common HTML entities back to plain characters, so a title scraped as Tips & tricks displays as Tips & tricks.
The three values are returned as one row ([[title, description, image]]) so they spill into three columns. Any failure — bad URL, timeout, network error — is caught and returns blanks rather than an error cell.

Example run

With URLs in column A, put =LINK_PREVIEW(A2) in B2 and fill it down:

A (URL)	B (Title)	C (Description)	D (Image)
example.com/guide	The complete guide to widgets	Everything you need to know about widgets.	example.com/og/guide.png
example.com/missing
example.com/blog/post	Five lessons from a year of shipping	What a year of weekly releases taught us.	example.com/og/post.jpg

Row 3’s page returned a 404, so its cells stay blank rather than breaking the column.

Use it

LINK_PREVIEW is a spreadsheet formula. Type it into a cell:

=LINK_PREVIEW(A2)

It returns three cells — title, description and image — so leave the two columns to its right empty for the result to spill into. To preview a whole list, put the formula in B2 and double-click the fill handle to copy it down.

Watch out for

Scraping HTML with regular expressions is fast but fragile. It works on the great majority of pages, but a site with unusual markup, the title split across lines, or metadata injected by JavaScript can return blanks.
JavaScript-rendered pages are the main blind spot. UrlFetchApp fetches the raw HTML and does not run scripts, so single-page apps that build their <head> client-side will look empty.
Many sites block or cloak requests that lack a browser-like user agent. Expect some links — news sites and social networks especially — to return a consent page or a 403 instead of real content.
Custom functions have a 30-second limit per call and cannot use services that need user authorisation. A slow page can time out and return blanks.
Recalculating the sheet re-fetches every LINK_PREVIEW cell, and a column of them can fire dozens of requests at once. Once a list is enriched, paste the results as static values so a stray edit does not trigger a fresh crawl.
The function does not cache. For a list with repeated URLs, or one you refresh often, wrap the fetch in a cache so identical links are not fetched twice.

Fetch link previews and metadata

What you’ll need

The script

How it works

Example run

Use it

Watch out for

Related

Handle streaming responses from an LLM API

Cache API responses to cut quota usage

Build an API-key vault and rotation system

Build a rate-limit-aware API client

Build a generic paginated-API fetcher