Auto-grade open responses against a rubric
Score and rank Northwind submissions with AI — keep grading consistent across reviewers.
Published Sep 3, 2025
Northwind runs an open-call form where applicants write a free-text response, and a panel scores each one. Human scoring drifts — one reviewer is generous on a Friday, another is strict, and the same answer gets a 3 from one person and a 5 from another. The inconsistency is hard to defend when an applicant asks why they missed the cut.
This script gives every submission a consistent first pass. The moment a form is submitted, it sends the response and a fixed rubric to Claude, which scores it 1-5 on relevance, specificity and clarity and returns a total. The scores land in a grades sheet alongside the submission. Reviewers still make the final call, but they start from the same baseline for every applicant.
What you’ll need
- A Google Form with an open-text question titled exactly `Your response`.
- A Google Sheet to collect the grades. The script appends to its first tab.
- An Anthropic API key saved as `ANTHROPIC_API_KEY` in Script Properties (see Store API keys and secrets securely).
- The grades sheet ID, which you paste into the config below.
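`getProperty` returns `null` when a property has not been set, so a missing key only shows up later as a failed API call. Below is a minimal sketch of a fail-fast lookup. `getApiKey` is a hypothetical name, and the idea is to call it in place of the inline lookup in `gradeResponse`:

```javascript
/**
 * Hypothetical helper: reads the Anthropic API key from Script Properties
 * and fails fast with a readable message if it has not been set.
 *
 * @return {string} The trimmed API key.
 */
function getApiKey() {
  const key = PropertiesService.getScriptProperties()
      .getProperty('ANTHROPIC_API_KEY');
  // getProperty returns null for a missing key; treat blank as missing too.
  if (!key || !key.trim()) {
    throw new Error('ANTHROPIC_API_KEY is not set in Script Properties.');
  }
  return key.trim();
}
```

With it in place, `const key = getApiKey();` replaces the two-line lookup, and a missing key stops the run with a message that names the property.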
The script
```javascript
// The spreadsheet that collects the grades.
const GRADES_SHEET_ID = '1abcGradesId';

// Model used for scoring. Haiku is fast and cheap for short rubric work.
const GRADING_MODEL = 'claude-haiku-4-5-20251001';

// The scoring rubric sent with every response.
const RUBRIC = `Score 1-5 each:
- Relevance to the prompt
- Specificity (concrete examples)
- Clarity`;

/**
 * Runs on every form submission. Sends the response and rubric to
 * Claude, then appends the scores to the grades sheet.
 *
 * @param {Object} e The form-submit event object from a form-bound
 *     installable trigger; e.response is the submitted FormResponse.
 */
function onFormSubmit(e) {
  // 1. Pull the open-text answer out of the submission. Form-bound
  //    triggers expose answers through e.response, not e.namedValues
  //    (namedValues only exists on the spreadsheet's form-submit event).
  const item = e.response.getItemResponses()
      .find((r) => r.getItem().getTitle() === 'Your response');
  const text = item ? String(item.getResponse()) : '';
  if (!text.trim()) {
    Logger.log('Empty or missing response, skipping.');
    return;
  }

  // 2. Build a prompt: the rubric, the response, and a strict JSON schema.
  const prompt = `${RUBRIC}
Response:
${text}
Return JSON: {"relevance":N,"specificity":N,"clarity":N,"total":N}`;

  // 3. Score the response with Claude.
  const grade = gradeResponse(prompt);

  // 4. Append the scores to the grades sheet, with a truncated copy of
  //    the response so a reviewer can see what was graded.
  SpreadsheetApp.openById(GRADES_SHEET_ID).getSheets()[0].appendRow([
    new Date(),
    text.slice(0, 100),
    grade.relevance,
    grade.specificity,
    grade.clarity,
    grade.total,
  ]);
}

/**
 * Sends a grading prompt to the Anthropic API and parses the JSON
 * scores out of the reply.
 *
 * @param {string} prompt The rubric-plus-response prompt.
 * @return {{relevance:number, specificity:number, clarity:number, total:number}}
 */
function gradeResponse(prompt) {
  const key = PropertiesService.getScriptProperties()
      .getProperty('ANTHROPIC_API_KEY');
  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: GRADING_MODEL,
      max_tokens: 200,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });
  // muteHttpExceptions returns errors as normal responses, so check the
  // status before trying to read scores out of the body.
  if (res.getResponseCode() !== 200) {
    throw new Error(`Anthropic API error ${res.getResponseCode()}: ${res.getContentText()}`);
  }
  // The model returns JSON text; parse the API envelope, then the scores.
  return JSON.parse(JSON.parse(res.getContentText()).content[0].text);
}
```
How it works
- `onFormSubmit` fires on every submission. It reads the answer to the question titled `Your response`. If the answer is blank, it logs a note and stops; there is no point scoring nothing.
- It builds a prompt that stacks three things: the fixed `RUBRIC`, the applicant’s response, and an explicit instruction to return JSON in a fixed shape. The fixed schema is what lets the script parse the reply.
- It calls `gradeResponse`, which sends the prompt to Claude Haiku. Haiku is fast and cheap, which matters when scoring runs on every submission.
- `gradeResponse` parses twice: once to unwrap the API envelope, and once to turn the model’s JSON text into a real object with four numeric scores.
- Back in `onFormSubmit`, it appends a row to the grades sheet (a timestamp, the first 100 characters of the response, and the four scores) so a reviewer can see the score and the answer side by side.
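The two parses in `gradeResponse` are easier to see, and to test outside Apps Script, as a pure function. Below is a minimal sketch, assuming the reply shape the script already reads (`content[0].text`). The name `parseGrade` is hypothetical, and the 1-5 range check is an extra safeguard that is not in the script above:

```javascript
/**
 * Hypothetical helper: unwraps a Messages API reply body and returns the
 * scores, rejecting any criterion outside the rubric's 1-5 range.
 *
 * @param {string} body Raw response text from the API.
 * @return {{relevance:number, specificity:number, clarity:number, total:number}}
 */
function parseGrade(body) {
  // First parse: the API envelope. The model's reply is in content[0].text.
  const envelope = JSON.parse(body);
  // Second parse: the model's JSON text becomes a real object.
  const grade = JSON.parse(envelope.content[0].text);
  for (const field of ['relevance', 'specificity', 'clarity']) {
    const score = grade[field];
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new Error(`Bad ${field} score: ${JSON.stringify(score)}`);
    }
  }
  return grade;
}

// Example: an envelope trimmed down to the fields the parser reads.
const sampleBody = JSON.stringify({
  content: [{ type: 'text', text: '{"relevance":5,"specificity":4,"clarity":5,"total":14}' }],
});
const sampleGrade = parseGrade(sampleBody);
console.log(sampleGrade.total); // 14
```

To use it, `gradeResponse` could return `parseGrade(res.getContentText())` instead of the nested `JSON.parse` call, so an out-of-range score fails loudly instead of landing in the sheet.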
Example run
An applicant submits this answer to the `Your response` question:
“I’d improve onboarding by adding a guided checklist. At my last role we cut setup time in half with one — new users knew exactly what to do next.”
A few seconds later, a row appears in the grades sheet:
| Timestamp | Response (first 100 characters) | Relevance | Specificity | Clarity | Total |
|---|---|---|---|---|---|
| 2026-05-25 09:14 | I’d improve onboarding by adding a guided checklist. At my last role we cut setup time… | 5 | 4 | 5 | 14 |
Every submission gets the same treatment, so when the panel sorts by Total, they are sorting a consistently graded list rather than a pile of personal opinions.
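The Total column should always equal the sum of the three criteria (5 + 4 + 5 = 14 in the row above), but the script takes the model's own total on trust. Below is a minimal sketch of recomputing it before ranking, assuming rows in the layout `onFormSubmit` appends. `rankByTotal` and the sample rows are made up for illustration:

```javascript
/**
 * Hypothetical helper: recomputes Total from the three criterion scores
 * and returns the rows sorted by that total, highest first.
 *
 * @param {Array<Array>} rows Rows as [timestamp, excerpt, relevance,
 *     specificity, clarity, total], the layout onFormSubmit appends.
 * @return {Array<Array>} New rows with Total recomputed, best first.
 */
function rankByTotal(rows) {
  return rows
    // Replace the model-reported total (index 5) with the actual sum.
    .map((row) => {
      const [stamp, excerpt, relevance, specificity, clarity] = row;
      return [stamp, excerpt, relevance, specificity, clarity,
        relevance + specificity + clarity];
    })
    // Array.prototype.sort is stable, so tied totals keep submission order.
    .sort((a, b) => b[5] - a[5]);
}

const ranked = rankByTotal([
  ['09:14', 'Answer A', 5, 4, 5, 14],
  ['09:20', 'Answer B', 3, 2, 4, 12], // model mis-added: 3 + 2 + 4 is 9
  ['09:31', 'Answer C', 4, 4, 4, 12],
]);
console.log(ranked.map((row) => `${row[1]}: ${row[5]}`).join(', '));
// Answer A: 14, Answer C: 12, Answer B: 9
```

In the sheet itself, a `SUM` formula over the three score columns achieves the same check without any script.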
Trigger it
Form-submit needs an installable trigger — bind it to the form, not the sheet:
- Open the form’s bound Apps Script project (Form → three dots → Script editor).
- Open Triggers (the clock icon) and click Add trigger.
- Choose function `onFormSubmit`, event source From form, event type On form submit.
- Save and approve the authorisation prompt.
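The same trigger can also be created in code from the form's bound project. Below is a minimal sketch using `ScriptApp.newTrigger(...).forForm(...).onFormSubmit().create()`, assuming you run it once from the editor. `installGradingTrigger` is a hypothetical name. It checks existing triggers first, because two `onFormSubmit` triggers would grade, and bill, every submission twice:

```javascript
/**
 * Hypothetical one-off setup: creates the installable form-submit trigger
 * for onFormSubmit, unless the project already has one.
 * Run it once from the editor of the form-bound project.
 */
function installGradingTrigger() {
  // Skip creation if any existing trigger already calls onFormSubmit.
  const exists = ScriptApp.getProjectTriggers()
      .some((t) => t.getHandlerFunction() === 'onFormSubmit');
  if (exists) {
    Logger.log('onFormSubmit trigger already installed, nothing to do.');
    return;
  }
  ScriptApp.newTrigger('onFormSubmit')
      .forForm(FormApp.getActiveForm())
      .onFormSubmit()
      .create();
  Logger.log('Installed onFormSubmit trigger.');
}
```

Running it still triggers the same authorisation prompt as adding the trigger by hand.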
Watch out for
- The script depends on a question titled exactly `Your response`. Rename the question on the form and the script can no longer find the answer, so update the title in the script whenever you change it on the form.
- The double `JSON.parse` will throw if Claude wraps the JSON in a code fence or adds prose around it. Keep the schema instruction strict; if parsing still fails, strip fences from the reply before parsing, or log the raw text to debug.
- AI scores are a baseline, not a verdict. The model applies the same rubric to every answer, but it can still misjudge nuance, so a human reviewer should confirm scores near the cut-off.
- Every submission triggers an API call. A burst of submissions means a burst of calls and cost; for a high-volume form, consider batching scoring on a schedule instead of grading live.
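- With `muteHttpExceptions: true`, `UrlFetchApp.fetch` returns error responses instead of throwing. Unless you check `getResponseCode()` before parsing, a failed call then surfaces as a confusing parse error rather than a clear API error. Either way the trigger run still fails, so wrap the call in a try/catch and log the submission if you need every row scored.
- The rubric is hard-coded. Editing `RUBRIC` changes how every future submission is scored, so grades from before and after the change will not be comparable if you edit it partway through an open call.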
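For the code-fence case above, one defensive option is to parse only the outermost JSON object in the reply. Below is a minimal sketch, assuming the reply contains exactly one JSON object, possibly wrapped in a fence or surrounded by prose. `parseModelJson` is a hypothetical name, meant to replace the inner `JSON.parse` in `gradeResponse`:

```javascript
/**
 * Hypothetical helper: parses the model's JSON even if it arrives inside
 * a Markdown code fence or with a sentence before or after it.
 *
 * @param {string} text The model's reply text (content[0].text).
 * @return {Object} The parsed scores.
 */
function parseModelJson(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end < start) {
    // Nothing that looks like an object: surface the raw text for debugging.
    throw new Error(`No JSON object in model reply: ${text}`);
  }
  // Dropping everything outside the outermost braces removes fences and prose.
  return JSON.parse(text.slice(start, end + 1));
}

// Example: a fenced reply, built with repeat() to keep this snippet readable.
const fence = '`'.repeat(3);
const reply = `${fence}json\n{"relevance":5,"specificity":4,"clarity":5,"total":14}\n${fence}`;
console.log(parseModelJson(reply).total); // 14
```

A reply whose surrounding prose itself contains braces would defeat this simple slice, which is why the error keeps the raw text for debugging.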