appscript.dev

Auto-grade open responses against a rubric

Score and rank Northwind submissions with AI — keep grading consistent across reviewers.

Published Sep 3, 2025

Northwind runs an open-call form where applicants write a free-text response, and a panel scores each one. Human scoring drifts — one reviewer is generous on a Friday, another is strict, and the same answer gets a 3 from one person and a 5 from another. The inconsistency is hard to defend when an applicant asks why they missed the cut.

This script gives every submission a consistent first pass. The moment a form is submitted, it sends the response and a fixed rubric to Claude, which scores it 1-5 on relevance, specificity and clarity and returns a total. The scores land in a grades sheet alongside the submission. Reviewers still make the final call, but they start from the same baseline for every applicant.

What you’ll need

  • A Google Form with an open-text question titled exactly Your response.
  • A Google Sheet to collect the grades — the script appends to its first tab.
  • An Anthropic API key saved as ANTHROPIC_API_KEY in Script Properties — see Store API keys and secrets securely.
  • The grades sheet ID, which you paste into the config below.

The script

// The spreadsheet that collects the grades.
const GRADES_SHEET_ID = '1abcGradesId';

// Model used for scoring. Haiku is fast and cheap for short rubric work.
const GRADING_MODEL = 'claude-haiku-4-5-20251001';

// The scoring rubric sent with every response.
const RUBRIC = `Score 1-5 each:
- Relevance to the prompt
- Specificity (concrete examples)
- Clarity`;

/**
 * Runs on every form submission. Sends the response and rubric to
 * Claude, then appends the scores to the grades sheet.
 *
 * @param {Object} e The form-submit event object.
 */
function onFormSubmit(e) {
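  // 1. Pull the open-text answer out of the submission. A trigger installed
  //    on the form delivers e.response; e.namedValues only exists when the
  //    trigger is installed on the linked response spreadsheet.
  let text;
  if (e.namedValues) {
    text = e.namedValues['Your response'][0];
  } else {
    const answer = e.response.getItemResponses()
      .find((r) => r.getItem().getTitle() === 'Your response');
    text = answer && answer.getResponse();
  }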
  if (!text || !text.trim()) {
    Logger.log('Empty response — skipping.');
    return;
  }

  // 2. Build a prompt: the rubric, the response, and a strict JSON schema.
  const prompt = `${RUBRIC}

Response:
${text}

Return only JSON, with no code fence or extra text: {"relevance":N,"specificity":N,"clarity":N,"total":N}`;

  // 3. Score the response with Claude.
  const grade = gradeResponse(prompt);

  // 4. Append the scores to the grades sheet, with a truncated copy of
  //    the response so a reviewer can see what was graded.
  SpreadsheetApp.openById(GRADES_SHEET_ID).getSheets()[0].appendRow([
    new Date(),
    text.slice(0, 100),
    grade.relevance,
    grade.specificity,
    grade.clarity,
    grade.total,
  ]);
}

/**
 * Sends a grading prompt to the Anthropic API and parses the JSON
 * scores out of the reply.
 *
 * @param {string} prompt The rubric-plus-response prompt.
 * @return {{relevance:number, specificity:number, clarity:number, total:number}}
 */
function gradeResponse(prompt) {
  const key = PropertiesService.getScriptProperties()
    .getProperty('ANTHROPIC_API_KEY');

  const res = UrlFetchApp.fetch('https://api.anthropic.com/v1/messages', {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-api-key': key, 'anthropic-version': '2023-06-01' },
    payload: JSON.stringify({
      model: GRADING_MODEL,
      max_tokens: 200,
      messages: [{ role: 'user', content: prompt }],
    }),
    muteHttpExceptions: true,
  });

  // The model returns JSON text; parse the API envelope, then the scores.
  return JSON.parse(JSON.parse(res.getContentText()).content[0].text);
}

How it works

  1. onFormSubmit fires on every submission. It reads the open-text answer from e.namedValues['Your response'], the column titled Your response. If the answer is blank it logs a note and stops — no point scoring nothing.
  2. It builds a prompt that stacks three things: the fixed RUBRIC, the applicant’s response, and an explicit instruction to return JSON in a fixed shape. The fixed schema is what makes the reply parseable.
  3. It calls gradeResponse, which sends the prompt to Claude Haiku — fast and cheap, which matters when scoring runs on every submission.
  4. gradeResponse parses twice: once to unwrap the API envelope, once to turn the model’s JSON text into a real object holding the three rubric scores and their total.
  5. Back in onFormSubmit, it appends a row to the grades sheet (a timestamp, the first 100 characters of the response, the three rubric scores and the total) so a reviewer can see the score and the answer side by side.
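
To make step 4 concrete, here is a minimal sketch of the double parse run on a hand-written sample reply. The envelope is trimmed to the fields the script reads; a real Messages API reply carries more fields, and the score values are made up for illustration.

```javascript
// Hand-written sample of an API reply body, trimmed to what the script reads.
const sampleBody = JSON.stringify({
  type: 'message',
  role: 'assistant',
  content: [
    { type: 'text', text: '{"relevance":5,"specificity":4,"clarity":5,"total":14}' },
  ],
});

// First parse: unwrap the API envelope. content[0].text is still a string.
const envelope = JSON.parse(sampleBody);
const scoreText = envelope.content[0].text;

// Second parse: turn the model's JSON text into an object with numeric fields.
const sampleGrade = JSON.parse(scoreText);
```

If either layer is not valid JSON, the matching JSON.parse throws, which is why the reply format matters so much.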

Example run

An applicant submits this answer to Your response:

“I’d improve onboarding by adding a guided checklist. At my last role we cut setup time in half with one — new users knew exactly what to do next.”

A few seconds later, a row appears in the grades sheet:

Timestamp | Response (first 100) | Relevance | Specificity | Clarity | Total
2026-05-25 09:14 | I’d improve onboarding by adding a guided checklist. At my last role we cut setup time… | 5 | 4 | 5 | 14

Every submission gets the same treatment, so when the panel sorts by Total they are sorting a consistently graded list, not a pile of personal opinions.
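
The script only appends data rows, so the column headings shown in the example have to be added once, by hand or by code. Below is a minimal sketch of the code route, assuming the headings from the example above. ensureHeaderRow and GRADE_HEADERS are hypothetical names, not part of the original script.

```javascript
// Hypothetical constant: headings matching the row onFormSubmit appends.
const GRADE_HEADERS = [
  'Timestamp', 'Response (first 100)', 'Relevance', 'Specificity', 'Clarity', 'Total',
];

/**
 * Writes the heading row if the sheet is still empty. Safe to run repeatedly.
 *
 * @param {Object} sheet The grades tab (a Spreadsheet Sheet object).
 * @return {boolean} True if a heading row was written.
 */
function ensureHeaderRow(sheet) {
  // getLastRow() is 0 for a sheet with no content.
  if (sheet.getLastRow() > 0) return false;
  sheet.appendRow(GRADE_HEADERS);
  sheet.setFrozenRows(1); // keep the headings visible while scrolling
  return true;
}
```

Run it once from the editor before the form opens, passing SpreadsheetApp.openById(GRADES_SHEET_ID).getSheets()[0] as the sheet.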

Trigger it

Form-submit needs an installable trigger — bind it to the form, not the sheet:

  1. Open the form’s bound Apps Script project (Form → three dots → Script editor).
  2. Open Triggers (the clock icon) and click Add trigger.
  3. Choose function onFormSubmit, event source From form, event type On form submit.
  4. Save and approve the authorisation prompt.
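
The same trigger can also be created from code, which helps when setting up several forms. This is a minimal sketch, assuming it runs inside the form’s bound project so that FormApp.getActiveForm() returns the form. installFormSubmitTrigger is a hypothetical name.

```javascript
/**
 * Creates the installable form-submit trigger for onFormSubmit, unless
 * this project already has a trigger pointing at that handler.
 *
 * @return {boolean} True if a new trigger was created.
 */
function installFormSubmitTrigger() {
  const handler = 'onFormSubmit';

  // Avoid duplicates: two triggers would grade every submission twice.
  const exists = ScriptApp.getProjectTriggers()
    .some((t) => t.getHandlerFunction() === handler);
  if (exists) return false;

  ScriptApp.newTrigger(handler)
    .forForm(FormApp.getActiveForm())
    .onFormSubmit()
    .create();
  return true;
}
```

Run it once from the editor. The authorisation prompt appears on the first run, as with the manual steps.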

Watch out for

  • The script depends on a question titled exactly Your response. Rename the question on the form and e.namedValues['Your response'] returns undefined — update the key if you change the title.
  • The double JSON.parse will throw if Claude wraps the JSON in a code fence or adds prose. Keep the schema instruction strict; if parsing still fails, strip fences from the reply before parsing or log the raw text to debug.
  • AI scores are a baseline, not a verdict. The model is consistent, but it can still misjudge nuance — a human reviewer should confirm scores near the cut-off.
  • Every submission triggers an API call. A burst of submissions means a burst of calls and cost; for a high-volume form, consider batching scoring on a schedule instead of grading live.
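  • muteHttpExceptions makes UrlFetchApp return an error response instead of throwing, but it does not keep the trigger alive: an error body has no content array, so the parsing line throws instead. Check res.getResponseCode(), wrap the call in a try/catch, and log the submission if you need every row scored.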
  • The rubric is hard-coded. Editing RUBRIC changes how every future submission is scored, so grades from before and after the edit will not be comparable if you change it partway through an open call.
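
Several of the points above come down to not trusting the reply blindly. Below is a minimal sketch of a more defensive parser. It assumes the reply is either bare JSON or JSON wrapped in a code fence or a sentence of prose, and that scores are whole numbers. parseScores is a hypothetical helper, not part of the original script, and it recomputes the total locally rather than relying on the model’s arithmetic.

```javascript
/**
 * Hypothetical helper: extracts rubric scores from the model's reply text.
 * Tolerates a surrounding code fence or stray prose, checks that each score
 * is an integer from 1 to 5, and recomputes the total.
 *
 * @param {string} replyText The text of the model's reply.
 * @return {{relevance:number, specificity:number, clarity:number, total:number}}
 */
function parseScores(replyText) {
  // Keep only the outermost {...} span, dropping fences and prose around it.
  const start = replyText.indexOf('{');
  const end = replyText.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object in reply: ' + replyText);
  }
  const parsed = JSON.parse(replyText.slice(start, end + 1));

  const grade = {};
  let total = 0;
  ['relevance', 'specificity', 'clarity'].forEach((field) => {
    const score = parsed[field];
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new Error('Bad score for ' + field + ': ' + JSON.stringify(score));
    }
    grade[field] = score;
    total += score;
  });
  grade.total = total; // ignore any total the model supplied
  return grade;
}
```

In gradeResponse, this would replace the inner JSON.parse on content[0].text. Any error it throws carries the offending text, so a try/catch in onFormSubmit can log exactly what came back.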
