Design automations that never fail silently
Error handling, logging, and alerting patterns for Northwind's production scripts.
Published Jun 29, 2025
The most dangerous failure mode for an automation is not a crash — it is a crash nobody sees. A time-based trigger runs unattended, so when it throws an error the error goes into a log panel that no one is looking at. The script is broken, and it stays broken until a person happens to notice that the work stopped getting done.
Designing an automation that never fails silently means every failure produces two things: a permanent record you can debug from later, and an active alert that reaches a human now. This guide builds a small wrapper that guarantees both for every entry point, so a broken job announces itself instead of waiting to be discovered.
The three rules of visible failure
A failure that surfaces properly follows three rules. The wrapper below enforces all of them.
| Rule | Why it matters |
|---|---|
| Log it | Gives you the message and stack to debug later |
| Alert it | Reaches a human while the problem is still fresh |
| Re-throw it | Keeps the run marked failed, not falsely “ok” |
The third rule is the one people get wrong. A catch block that logs and then
returns normally hides the failure — the executions panel shows a green
success, and the automation has lied to you.
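The difference is easy to see in isolation. The sketch below is plain JavaScript, not Apps Script: `brokenJob`, `swallowing`, `rethrowing`, and `runStatus` are hypothetical names, and `runStatus` only approximates how a runtime decides whether a run failed.

```javascript
// Hypothetical job that always fails, standing in for a real entry point.
function brokenJob() {
  throw new Error('Stripe API returned 500');
}

// Anti-pattern: record the error and return normally.
function swallowing(fn, records) {
  return (...args) => {
    try {
      return fn(...args);
    } catch (e) {
      records.push(e.message);
      // No re-throw: whoever called us sees a clean finish.
    }
  };
}

// Rule 3 applied: record the error, then re-throw it.
function rethrowing(fn, records) {
  return (...args) => {
    try {
      return fn(...args);
    } catch (e) {
      records.push(e.message);
      throw e;
    }
  };
}

// Report how a run would look to whatever invoked it.
function runStatus(job) {
  try {
    job();
    return 'completed';
  } catch (e) {
    return 'failed';
  }
}

const records = [];
console.log(runStatus(swallowing(brokenJob, records))); // completed
console.log(runStatus(rethrowing(brokenJob, records))); // failed
```

Both wrappers record the error, but only the second leaves the run marked as failed.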
The pattern
Wrap every entry point in a safe() helper that catches any error, records it,
raises an alert, and then re-throws so the failure stays visible.
// Decorate an entry-point function so any uncaught error is logged,
// alerted, and still surfaced as a failed run.
function safe(fn, name = fn.name) {
  return (...args) => {
    try {
      return fn(...args);
    } catch (e) {
      // Rule 1: write a permanent record with the stack trace.
      log('error', name, e.message, e.stack);
      // Rule 2: push an alert to a channel a human watches.
      alert(name + ': ' + e.message);
      // Rule 3: re-throw so the run is marked failed, not silently swallowed.
      throw e;
    }
  };
}

// Append one row to a log spreadsheet — durable, sortable, readable by anyone.
function log(level, name, message, extra = '') {
  SpreadsheetApp.openById('1abcLogId').getSheets()[0]
    .appendRow([new Date(), level, name, message, extra]);
}

// Send an alert to Slack so a failure reaches a person immediately.
function alert(text) {
  UrlFetchApp.fetch(
    PropertiesService.getScriptProperties().getProperty('SLACK_WEBHOOK'),
    {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify({ text: ':rotating_light: ' + text }),
    }
  );
}
The webhook URL lives in Script Properties, not the code, so the same source runs in any environment by pointing at a different channel.
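A missing property is itself a silent failure waiting to happen, so it can help to read settings through a small guard. This is a minimal sketch, not part of the wrapper above: `requireProperty` and `fakeProperties` are hypothetical helpers, and the example URL is a placeholder. In Apps Script, `props` would be the object returned by `PropertiesService.getScriptProperties()`, whose `getProperty` returns `null` for an unknown key.

```javascript
// Read a required setting and fail loudly if it is missing or empty.
// `props` is anything with a getProperty(key) method.
function requireProperty(props, key) {
  const value = props.getProperty(key);
  if (value === null || value === undefined || value === '') {
    throw new Error('Missing script property: ' + key);
  }
  return value;
}

// Stand-in property store so the sketch runs outside Apps Script.
function fakeProperties(values) {
  return { getProperty: (key) => (key in values ? values[key] : null) };
}

// Each environment supplies its own value for the same key.
const staging = fakeProperties({ SLACK_WEBHOOK: 'https://example.invalid/staging-hook' });
console.log(requireProperty(staging, 'SLACK_WEBHOOK'));
```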
Use it
Wrap every function that runs on its own: triggered functions, time-based jobs, web app handlers. Anything that starts without a human watching needs the wrapper.
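// Triggers take the name of a global function, so expose the wrapped job as
// one. (syncStripeChargesJob and installSyncTrigger are illustrative names.)
function syncStripeChargesJob() {
  return safe(syncStripeCharges, 'syncStripeCharges')();
}
// Run once by hand to install the hourly trigger.
function installSyncTrigger() {
  ScriptApp.newTrigger('syncStripeChargesJob')
    .timeBased().everyHours(1).create();
}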
Functions you only ever run by hand do not strictly need wrapping — you are there to see the error — but wrapping them anyway costs nothing and keeps the logs complete.
Guard against alert fatigue
An automation that alerts on every run of a persistent failure trains people to ignore it. A few guard rails keep alerts meaningful.
- Deduplicate. Store the last alert’s message and timestamp; skip a repeat within a cooldown window.
- Escalate, do not repeat. A first failure is a Slack message; the tenth in a row is an email to an owner.
- Distinguish expected from unexpected. A transient API timeout that a retry will fix is not the same as a code bug — log the first quietly, alert loudly on the second.
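The three guard rails can be sketched as two small helpers. Everything here is an assumption made for illustration: the names `decideAlert`, `clearFailures`, and `withRetry`, the 30-minute cooldown, and the plain `state` object, which in Apps Script would need to be persisted somewhere such as Script Properties to survive between runs.

```javascript
// Illustrative values, not recommendations.
const COOLDOWN_MS = 30 * 60 * 1000;
const ESCALATE_AFTER = 10;

// Decide what to do with a failure: 'alert', 'skip', or 'escalate'.
// `state` remembers the last alert time and failure streak per message.
function decideAlert(state, name, message, now) {
  const key = name + ': ' + message;
  const prev = state[key] || { lastAlertAt: 0, streak: 0 };
  const streak = prev.streak + 1;

  // The Nth failure in a row goes to a louder channel, once.
  if (streak === ESCALATE_AFTER) {
    state[key] = { lastAlertAt: now, streak };
    return 'escalate';
  }
  // A repeat inside the cooldown window is counted but not sent.
  if (prev.streak > 0 && now - prev.lastAlertAt < COOLDOWN_MS) {
    state[key] = { lastAlertAt: prev.lastAlertAt, streak };
    return 'skip';
  }
  state[key] = { lastAlertAt: now, streak };
  return 'alert';
}

// Call after a successful run so the next failure counts as a first failure.
function clearFailures(state, name) {
  for (const key of Object.keys(state)) {
    if (key.startsWith(name + ': ')) delete state[key];
  }
}

// Retry a call while the error looks transient; anything else propagates
// immediately. `isTransient` is a caller-supplied predicate and `onRetry`
// is where a quiet log entry could go.
function withRetry(fn, isTransient, attempts = 3, onRetry = () => {}) {
  for (let i = 1; ; i++) {
    try {
      return fn();
    } catch (e) {
      if (i >= attempts || !isTransient(e)) throw e;
      onRetry(e, i);
    }
  }
}

const state = {};
console.log(decideAlert(state, 'syncStripeCharges', 'timeout', 0));     // alert
console.log(decideAlert(state, 'syncStripeCharges', 'timeout', 60000)); // skip
```

Keying the state on name plus message means a new kind of error from the same job still alerts at once, at the cost of treating messages that differ only in an ID or timestamp as distinct.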
Why this matters
- Without alerts, a broken cron job stays broken until a person stumbles onto the consequences — often a stakeholder, not you.
- Without logs, you cannot reconstruct what happened on a run that finished hours ago.
- Without re-throwing, the executions panel reports success and you lose even the record that anything went wrong.
Common mistakes
- Catching an error, logging it, and returning normally. The run shows green and the failure is hidden — always re-throw.
- Wrapping nothing, on the assumption you will check the executions panel. Nobody checks it until something is already wrong.
- Alerting on every failed run of a persistent fault until the channel is noise — add deduplication and a cooldown.
- Hardcoding the webhook or log ID in the wrapper, so the same code cannot run in a second environment.
- Letting log() or alert() throw inside the catch. If alerting fails, the original error is lost, so keep those helpers simple and resilient.
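One way to guard against that last mistake is to give each reporting step its own try/catch. The variant below is a sketch, not a drop-in replacement: `safeResilient` is a hypothetical name, and it takes the log and alert helpers as arguments only so that it can run outside Apps Script.

```javascript
// Variant of safe() in which a failing log or alert helper cannot block
// the other helper or replace the original error.
function safeResilient(fn, name, log, alert) {
  return (...args) => {
    try {
      return fn(...args);
    } catch (e) {
      try {
        log('error', name, e.message, e.stack);
      } catch (logErr) {
        // Fall back to the built-in console so the logging failure leaves a trace.
        console.error('log() failed: ' + logErr.message);
      }
      try {
        alert(name + ': ' + e.message);
      } catch (alertErr) {
        console.error('alert() failed: ' + alertErr.message);
      }
      // Always re-throw the original error, never a helper's error.
      throw e;
    }
  };
}
```

If the log spreadsheet is unreachable, the Slack alert still goes out and the run still fails with the job's own error.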