Scrape a web table into cells with one formula
Pull HTML tables into Sheets as a custom function — no IMPORTHTML quirks.
Published Jul 30, 2025
Northwind’s pricing team tracks a handful of competitor and supplier pages that
publish their numbers in a plain HTML table. The built-in IMPORTHTML formula
can read those tables, but it is fragile: it counts tables by position, breaks
the moment a site adds a stray layout table, and gives you no control over how
the cells are cleaned.
A custom function fixes that. FETCH_TABLE fetches the page, finds the table
you ask for, strips the HTML out of every cell, and spills the result into the
grid — the same shape IMPORTHTML returns, but as code you own and can adjust
when a site changes.
What you’ll need
- A Google Sheet where you want the scraped table to land.
- The URL of a public page that contains the table — no login, no JavaScript rendering required (see “Watch out for”).
- The index of the table on that page, counting from
0. Most pages have the table you want at index0; if not, try1,2, and so on.
The script
// How long to keep a fetched page in cache, in seconds. Six hours keeps
// re-calculations off the network without the data going badly stale.
const FETCH_TABLE_CACHE_SECONDS = 6 * 60 * 60;
/**
* Fetches an HTML table from a web page and spills it into the grid.
* Returns the same row/column shape as IMPORTHTML, so it can be dropped
* straight into a cell.
*
* @param {string} url The page to fetch.
* @param {number} index Which table on the page to read, counting from 0.
* @return {string[][]} The table cells, or an error marker.
* @customfunction
*/
function FETCH_TABLE(url, index = 0) {
// 1. Bail out early if no URL was passed.
if (!url) return '';
// 2. Reuse a recently fetched page if we have one. Custom functions are
// called once per cell, so caching avoids re-fetching on every edit.
const cache = CacheService.getScriptCache();
const cacheKey = 'table:' + url;
let html = cache.get(cacheKey);
// 3. Fetch the page only on a cache miss.
if (!html) {
html = UrlFetchApp
.fetch(url, { muteHttpExceptions: true })
.getContentText();
cache.put(cacheKey, html, FETCH_TABLE_CACHE_SECONDS);
}
// 4. Pull out every <table> block on the page.
const tables = html.match(/<table[\s\S]*?<\/table>/g) || [];
if (!tables[index]) return '#NO_TABLE';
// 5. Split the chosen table into rows, and each row into cleaned cells.
const rows = (tables[index].match(/<tr[\s\S]*?<\/tr>/g) || []).map((tr) =>
(tr.match(/<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/g) || []).map((cell) =>
cleanCell(cell)
)
);
// 6. Drop fully empty rows (spacer rows some sites add for layout).
return rows.filter((row) => row.some((cell) => cell !== ''));
}
/**
* Strips the HTML out of a single table cell and tidies the whitespace.
* Keeps the text content, throws away tags, and decodes the few entities
* that show up most often in tables.
*
* @param {string} cellHtml The raw <td> or <th> HTML.
* @return {string} The plain-text cell value.
*/
function cleanCell(cellHtml) {
return cellHtml
.replace(/<[^>]+>/g, '') // remove every tag
.replace(/ /g, ' ') // decode the common entities
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/\s+/g, ' ') // collapse runs of whitespace
.trim();
}
How it works
FETCH_TABLEreturns an empty string straight away if the cell passes no URL, so a half-typed formula never throws.- It checks
CacheServicefor a copy of the page. A custom function is re-evaluated on every recalculation, so without a cache a busy sheet would hammer the target site. - On a cache miss it fetches the page with
UrlFetchAppand stores the HTML for six hours (FETCH_TABLE_CACHE_SECONDS). - A regular expression pulls out every
<table>...</table>block. If the requestedindexdoes not exist, the function returns#NO_TABLEso you can see the problem in the cell. - The chosen table is split into
<tr>rows, and each row into<td>/<th>cells.cleanCellstrips the tags, decodes common entities, and collapses whitespace. - Fully empty rows — spacer rows some sites add for layout — are dropped, and the remaining grid is returned. Sheets spills it across the cells below.
Example run
Say https://example.com/prices contains this table:
<table>
<tr><th>SKU</th><th>Product</th><th>Price</th></tr>
<tr><td>NW-1001</td><td>Oak desk</td><td>£420</td></tr>
<tr><td>NW-1002</td><td>Steel shelf</td><td>£180</td></tr>
</table>
Typing =FETCH_TABLE("https://example.com/prices", 0) into A1 spills:
| SKU | Product | Price |
|---|---|---|
| NW-1001 | Oak desk | £420 |
| NW-1002 | Steel shelf | £180 |
Use it
Type the formula into the top-left cell where you want the table to appear:
=FETCH_TABLE("https://example.com/prices", 0)
The result spills down and to the right automatically, so leave those cells empty. Custom functions cache aggressively and Sheets will not re-fetch unless the formula text changes. To force an hourly refresh, fold the current hour into the URL so the formula changes once an hour:
=FETCH_TABLE("https://example.com/prices?v=" & TEXT(NOW(), "HH"), 0)
Watch out for
- This reads the raw HTML the server sends. Pages that build their table with
JavaScript in the browser will look empty to
UrlFetchAppand return#NO_TABLE. - Regex parsing of HTML is good enough for clean, simple tables. Nested tables,
cells that span rows or columns (
rowspan/colspan), and unusual markup will not line up correctly — inspect the result before trusting it. - The six-hour cache means a freshly published change can take up to six hours to appear unless you use the hourly-refresh trick above.
- Scraping a site without permission may breach its terms of service, and a layout change there can break this function without warning. Where a site offers a proper data feed, use it instead — see Sync invoices to QuickBooks or Xero for an example of working against a real API.
Related
Parse messy mixed-format dates
Normalise inconsistently formatted strings into real date values with a single formula.
Updated Aug 2, 2025
Mask sensitive columns for shareable copies
Redact PII with a custom function so you can share a copy of the sheet without exposing names, emails, or numbers.
Updated Jul 26, 2025
Build a sentiment-scoring function without AI
Rate text positive or negative with a tiny built-in lexicon — no API key, no quota.
Updated Jul 23, 2025
Build a unit-conversion function library
Convert between any units with one custom formula — kg/lb, km/mi, °C/°F, and the rest.
Updated Jul 19, 2025
Create a readability-scoring function
Rate text columns for reading difficulty with a Flesch reading-ease score in one formula.
Updated Jul 16, 2025