Fetch link previews and metadata
Pull titles, descriptions, and og:images for a column of Northwind links.
Published Nov 4, 2025
Northwind’s content team keeps a spreadsheet of links — articles to cite, competitor pages to watch, resources to share in the newsletter. A bare URL tells you nothing, so someone ends up opening each one, copying the headline and pasting it back into the next column. For a list of forty links that is an hour of tab-juggling.
This is a custom spreadsheet function. Put =LINK_PREVIEW(A2) next to a URL and
it fetches the page, scrapes the title, meta description and Open Graph image,
and spills them across three cells — the same preview card a chat app or social
network would generate, but inside the sheet.
What you’ll need
- A Google Sheet with URLs in a column.
- Nothing to install.
LINK_PREVIEWis a custom function in the bound script, called like any built-in formula. - Public, fetchable pages. Links behind a login or a bot wall will not return usable HTML — see “Watch out for”.
The script
// Cap on how much of each page we download. og: and <title> tags live in
// the <head>, so the first slice of HTML is all we need.
const PREVIEW_FETCH_OPTIONS = {
muteHttpExceptions: true,
followRedirects: true,
};
/**
* Pulls the title, description and Open Graph image for a web page.
* Use it in a cell as =LINK_PREVIEW(A2).
*
* @param {string} url The page URL to preview.
* @returns {string[][]} A three-cell row: [title, description, image].
* @customfunction
*/
function LINK_PREVIEW(url) {
// Blank cell — return empty so the formula does not error when filled down.
if (!url) return '';
try {
// 1. Fetch the page. muteHttpExceptions stops a 404 throwing.
const response = UrlFetchApp.fetch(url, PREVIEW_FETCH_OPTIONS);
// 2. A non-2xx page has no useful metadata — bail out cleanly.
const code = response.getResponseCode();
if (code < 200 || code >= 300) return [['', '', '']];
const html = response.getContentText();
// 3. Scrape the three fields. Each helper returns '' if not found.
const title = extractTitle(html);
const description = extractMeta(html, 'description');
const image = extractOpenGraph(html, 'og:image');
// Returned as a row so the values spill into three columns.
return [[title, description, image]];
} catch (e) {
// Network failure, bad URL, timeout — return blanks, never an error.
return [['', '', '']];
}
}
/**
* Pulls the text inside the <title> tag.
*
* @param {string} html The page HTML.
* @returns {string} The title, or '' if there is none.
*/
function extractTitle(html) {
const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
return match ? decodeEntities(match[1].trim()) : '';
}
/**
* Pulls a named <meta name="..."> tag's content, tolerating either
* attribute order (name-before-content or content-before-name).
*
* @param {string} html The page HTML.
* @param {string} name The meta name to look for, e.g. 'description'.
* @returns {string} The content, or '' if not present.
*/
function extractMeta(html, name) {
const a = html.match(
new RegExp('<meta[^>]+name=["\']' + name + '["\'][^>]+content=["\']([^"\']*)', 'i')
);
const b = html.match(
new RegExp('<meta[^>]+content=["\']([^"\']*)["\'][^>]+name=["\']' + name + '["\']', 'i')
);
const match = a || b;
return match ? decodeEntities(match[1].trim()) : '';
}
/**
* Pulls an Open Graph <meta property="og:..."> tag's content.
*
* @param {string} html The page HTML.
* @param {string} property The og property, e.g. 'og:image'.
* @returns {string} The content, or '' if not present.
*/
function extractOpenGraph(html, property) {
const match = html.match(
new RegExp('<meta[^>]+property=["\']' + property + '["\'][^>]+content=["\']([^"\']*)', 'i')
);
return match ? match[1].trim() : '';
}
/**
* Turns the handful of HTML entities that show up in titles back into
* plain characters, so "Tips & tricks" reads as "Tips & tricks".
*
* @param {string} text Raw scraped text.
* @returns {string} The decoded text.
*/
function decodeEntities(text) {
return text
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, "'");
}
How it works
LINK_PREVIEWis tagged@customfunction, so it appears in formula autocomplete and runs once per cell.- A blank input returns an empty string — needed when the formula is filled past the last URL.
- It fetches the page with
muteHttpExceptionson, so a 404 or 500 returns a response object instead of throwing. A non-2xx page returns three blanks. - Three small helpers scrape the HTML with regular expressions:
extractTitlereads the<title>tag,extractMetareads<meta name="description">(tolerating either attribute order), andextractOpenGraphreads<meta property="og:image">. decodeEntitiesconverts the common HTML entities back to plain characters, so a title scraped asTips & tricksdisplays asTips & tricks.- The three values are returned as one row (
[[title, description, image]]) so they spill into three columns. Any failure — bad URL, timeout, network error — is caught and returns blanks rather than an error cell.
Example run
With URLs in column A, put =LINK_PREVIEW(A2) in B2 and fill it down:
| A (URL) | B (Title) | C (Description) | D (Image) |
|---|---|---|---|
| example.com/guide | The complete guide to widgets | Everything you need to know about widgets. | example.com/og/guide.png |
| example.com/missing | |||
| example.com/blog/post | Five lessons from a year of shipping | What a year of weekly releases taught us. | example.com/og/post.jpg |
Row 3’s page returned a 404, so its cells stay blank rather than breaking the column.
Use it
LINK_PREVIEW is a spreadsheet formula. Type it into a cell:
=LINK_PREVIEW(A2)
It returns three cells — title, description and image — so leave the two columns to its right empty for the result to spill into. To preview a whole list, put the formula in B2 and double-click the fill handle to copy it down.
Watch out for
- Scraping HTML with regular expressions is fast but fragile. It works on the great majority of pages, but a site with unusual markup, the title split across lines, or metadata injected by JavaScript can return blanks.
- JavaScript-rendered pages are the main blind spot.
UrlFetchAppfetches the raw HTML and does not run scripts, so single-page apps that build their<head>client-side will look empty. - Many sites block or cloak requests that lack a browser-like user agent. Expect some links — news sites and social networks especially — to return a consent page or a 403 instead of real content.
- Custom functions have a 30-second limit per call and cannot use services that need user authorisation. A slow page can time out and return blanks.
- Recalculating the sheet re-fetches every
LINK_PREVIEWcell, and a column of them can fire dozens of requests at once. Once a list is enriched, paste the results as static values so a stray edit does not trigger a fresh crawl. - The function does not cache. For a list with repeated URLs, or one you refresh often, wrap the fetch in a cache so identical links are not fetched twice.
Related
Handle streaming responses from an LLM API
Manage long Northwind AI outputs reliably — note: Apps Script UrlFetch is synchronous.
Updated Jan 3, 2026
Cache API responses to cut quota usage
Store and reuse Northwind API responses intelligently — sub-second hits, fewer bills.
Updated Dec 26, 2025
Build an API-key vault and rotation system
Manage Northwind credentials securely at scale — centralised storage, scheduled rotation.
Updated Dec 22, 2025
Build a rate-limit-aware API client
Back off and retry gracefully on 429s — Northwind's robust outbound HTTP pattern.
Updated Dec 14, 2025
Build a generic paginated-API fetcher
Handle cursors and pages for any large dataset — Northwind's standard pull pattern.
Updated Dec 6, 2025