Build a Doc-to-clean-HTML exporter
Convert formatted Docs into publish-ready HTML — strip the Google CSS, keep the structure.
Published Jul 20, 2025
Northwind writes blog drafts in Google Docs because that is where the editing
and comments happen. The trouble starts at publish time: Docs’ built-in
File → Download → Web Page export wraps everything in pages of inline
<style> rules and class attributes that fight the CMS theme. Paste it in
and the post looks broken.
This script exports a Doc to HTML and then scrubs it down to the structural
tags a CMS actually wants — headings, paragraphs, lists, and links — with all
the Google-generated styling removed. The writer gets a clean .html file they
can paste straight into the editor.
What you’ll need
- A Google Doc to export — Northwind’s blog drafts live in a shared Drive folder.
- The Doc’s file ID, taken from its URL (
/document/d/THIS_PART/edit). - A Drive folder to hold the exported
.htmlfile. This example drops it in the root of My Drive. - No extra setup — the export uses the Doc service’s own authorised token, so there is no API key to manage.
The script
// The Doc to export when running exportCurrentDraft directly.
const BLOG_DRAFT_ID = '1abcBlogDraftId';
// Filename written to Drive for the cleaned export.
const OUTPUT_FILENAME = 'draft.html';
/**
* Exports a Google Doc to HTML and strips out the Google-generated
* styling, returning publish-ready markup.
* @param {string} docId The file ID of the Doc to export.
* @return {string} Cleaned HTML — structural tags only.
*/
function docToCleanHtml(docId) {
// 1. Ask the Docs export endpoint for the HTML version of the file.
const url = `https://docs.google.com/feeds/download/documents/export/Export?id=${docId}&exportFormat=html`;
const raw = UrlFetchApp.fetch(url, {
headers: { Authorization: `Bearer ${ScriptApp.getOAuthToken()}` },
}).getContentText();
// 2. Strip the export down to clean, structural HTML.
return raw
// Remove the inline stylesheet Docs injects.
.replace(/<style[\s\S]*?<\/style>/g, '')
// Remove meta tags and the whole head.
.replace(/<meta[^>]*>/g, '')
.replace(/<head>[\s\S]*?<\/head>/, '')
// Drop class and id attributes that hook into the dropped CSS.
.replace(/ class="[^"]*"/g, '')
.replace(/ id="[^"]*"/g, '')
// Unwrap span tags — they only ever carried styling.
.replace(/<span[^>]*>([\s\S]*?)<\/span>/g, '$1')
// Collapse anchor tags down to a plain href.
.replace(/<a[^>]*href="([^"]+)"[^>]*>/g, '<a href="$1">')
// Delete empty paragraphs (Docs leaves a lot of these).
.replace(/<p[^>]*>(\s|<br>)*<\/p>/g, '');
}
/**
* Exports the configured blog draft and saves the cleaned HTML to Drive.
*/
function exportCurrentDraft() {
const html = docToCleanHtml(BLOG_DRAFT_ID);
const file = DriveApp.createFile(OUTPUT_FILENAME, html, 'text/html');
Logger.log(`Saved cleaned HTML to ${file.getUrl()}`);
}
How it works
docToCleanHtmlbuilds the URL for the Docs export endpoint, asking for thehtmlexport format.- It fetches that URL with
UrlFetchApp, passing the script’s own OAuth token as a bearer header — that is what authorises the download without an API key. - It then runs the raw export through a chain of regular-expression replacements.
First it removes the injected
<style>block, the<meta>tags, and the whole<head>section. - Next it strips every
classandidattribute, since those only existed to target the CSS that was just deleted. - It unwraps
<span>tags — keeping their text content — because Docs uses spans purely as styling hooks. - It simplifies each
<a>tag down to a barehref, dropping the Google redirect wrappers and styling attributes. - Finally it deletes empty paragraphs, which Docs scatters generously through the export.
exportCurrentDraftcalls the cleaner for the configured draft and writes the result to Drive as an.htmlfile.
Example run
A Doc heading and paragraph export from Google as something like:
<p class="c3"><span class="c1">Northwind ships faster</span></p>
<p class="c2"><span class="c0">Our new release cuts setup time in half.</span></p>
<p class="c2"></p>
After docToCleanHtml runs, the same content comes out clean:
<p>Northwind ships faster</p>
<p>Our new release cuts setup time in half.</p>
The classes, spans, and the trailing empty paragraph are gone — what is left pastes cleanly into the CMS.
Run it
This is an on-demand job — run it whenever a draft is ready to publish:
- Set
BLOG_DRAFT_IDto the file ID of the Doc you want to export. - In the Apps Script editor select
exportCurrentDraftand click Run. - Approve the authorisation prompt the first time.
- Open the logged URL to download
draft.html, then paste its contents into the CMS.
Watch out for
- Regex cleaning is a pragmatic tool, not a full HTML parser. It handles the predictable patterns in a Docs export well, but unusual content — nested tables, deeply styled spans — may leave stray tags. Spot-check the output.
- The export keeps heading levels (
<h1>,<h2>) and lists, but inline formatting like bold and italic survives as<b>/<i>tags. If your CMS expects<strong>/<em>, add another replacement. - Images in the Doc export as base64 data URIs or linked Google URLs, not as files. For image-heavy posts, upload images to the CMS separately.
- The export endpoint is unofficial. It has been stable for years, but it is not a documented API — if it ever changes, fall back to the Drive API’s export method.
DriveApp.createFiledrops the file in the root of My Drive. Pass a folder viaDriveApp.getFolderById(...).createFile(...)if you want it filed away.
Related
Build a contract-clause assembly system
Construct Northwind agreements from a library of approved clauses — drag-drop in code.
Updated Feb 1, 2026
Generate personalized study guides from notes
Reformat raw notes into structured study guides — for Northwind's internal training programme.
Updated Feb 8, 2026
Translate and resolve Doc comments
Localise reviewer feedback on a shared Doc so multilingual teams can collaborate.
Updated Jan 25, 2026
Auto-archive finalized Docs to dated folders
File completed Northwind Docs by month so the active folder stays focused on in-flight work.
Updated Jan 18, 2026
Build a fillable intake form inside a Doc
Create structured intake forms with placeholder fields readers can fill — for client briefs.
Updated Jan 11, 2026