appscript.dev
Automation Intermediate Sheets

Create a fuzzy text-matching function

Match near-duplicate names across two lists with a Levenshtein-based custom function.

Published Jun 23, 2025

Northwind keeps a clean Clients list, but every imported vendor file spells things its own way — “Acme Inc.” here, “Acme, Inc” there, “ACME inc” somewhere else. A plain VLOOKUP matches none of those to each other, so reconciling two lists turns into an afternoon of eyeballing rows and pasting by hand.

This custom function does the eyeballing for you. FUZZYMATCH takes a name and a range of candidates, scores how similar each candidate is using the Levenshtein edit distance, and returns the closest one — but only if it clears a similarity threshold you set. Drag it down a column and the near-duplicates line up automatically, with the genuine non-matches left blank.

What you’ll need

  • Two lists in the same spreadsheet: the names you want to match (for example a Clients column) and the names to match against (for example a Vendors column). Both can be on any tab.
  • A spare column next to the first list for the formula.

The script

// Default similarity needed to count as a match, from 0 (no overlap) to
// 1 (identical). Callers can override this per formula.
const DEFAULT_THRESHOLD = 0.8;

/**
 * Finds the closest fuzzy match for a value within a range of candidates.
 *
 * @param {string} needle The value to look up.
 * @param {Array<Array<string>>} haystackRange The range of candidate
 *   values to match against, e.g. Vendors!A:A.
 * @param {number} [threshold] Minimum similarity (0-1) to accept a
 *   match. Defaults to 0.8.
 * @return {string} The best-matching candidate, or blank if none clears
 *   the threshold.
 * @customfunction
 */
function FUZZYMATCH(needle, haystackRange, threshold = DEFAULT_THRESHOLD) {
  // Nothing to look up — return blank rather than an error.
  if (!needle) return '';

  // Flatten the 2-D range into a plain list and drop empty cells.
  const candidates = [].concat.apply([], haystackRange).filter(Boolean);

  // Track the highest-scoring candidate seen so far.
  let best = { value: '', score: 0 };
  for (const candidate of candidates) {
    const score = similarity(String(needle), String(candidate));
    if (score > best.score) best = { value: candidate, score };
  }

  // Only return a match if it is similar enough to trust.
  return best.score >= threshold ? best.value : '';
}

/**
 * Scores how similar two strings are, from 0 to 1, based on their
 * Levenshtein edit distance.
 *
 * @param {string} a The first string.
 * @param {string} b The second string.
 * @return {number} A similarity score between 0 and 1.
 */
function similarity(a, b) {
  // Compare case-insensitively so "ACME" and "acme" score as identical.
  const distance = levenshtein(a.toLowerCase(), b.toLowerCase());

  // Normalise the edit distance by the longer string's length, then
  // flip it so a smaller distance means a higher score.
  return 1 - distance / Math.max(a.length, b.length);
}

/**
 * Computes the Levenshtein edit distance between two strings: the
 * minimum number of single-character insertions, deletions, or
 * substitutions to turn one into the other.
 *
 * @param {string} a The first string.
 * @param {string} b The second string.
 * @return {number} The edit distance.
 */
function levenshtein(a, b) {
  const m = a.length;
  const n = b.length;

  // dp[i][j] is the edit distance between the first i chars of a and
  // the first j chars of b. Row 0 is seeded with 0..n.
  const dp = Array.from({ length: m + 1 }, (_, i) => [i, ...Array(n).fill(0)]);
  for (let j = 0; j <= n; j++) dp[0][j] = j;

  // Fill the table row by row.
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1]                                    // chars match
        : 1 + Math.min(dp[i - 1][j - 1],   // substitution
                       dp[i - 1][j],       // deletion
                       dp[i][j - 1]);      // insertion
    }
  }
  return dp[m][n];
}

How it works

  1. FUZZYMATCH takes the value to look up, a range of candidates, and an optional threshold. An empty needle returns blank straight away.
  2. The candidate range arrives as a 2-D array (a column is many one-cell rows), so [].concat.apply flattens it into a plain list and filter(Boolean) drops empty cells.
  3. It scores every candidate with similarity and keeps the highest scorer in best.
  4. It only returns that best match if the score clears the threshold — otherwise blank, so a genuine non-match never gets a wrong answer attached.
  5. similarity lower-cases both strings, computes their Levenshtein distance, and converts it to a 0-to-1 score by dividing by the longer string’s length and subtracting from one.
  6. levenshtein fills a dynamic-programming table where each cell is the cheapest way to transform one string prefix into another, using insertions, deletions, and substitutions.

Use it

With the name to match in A2 and the candidate list in column A of a Vendors tab, type the formula into a spare cell:

=FUZZYMATCH(A2, Vendors!A:A, 0.85)

Drag it down the column. Each row shows the closest vendor name if its similarity is 0.85 or higher, and stays blank otherwise. The third argument is optional — =FUZZYMATCH(A2, Vendors!A:A) uses the default threshold of 0.8.

For example, with A2 holding Acme, Inc and the Vendors list containing Acme Inc., the formula returns Acme Inc. — the two strings are only a couple of edits apart, comfortably above the threshold.

Watch out for

  • The threshold is a judgement call. Set it too low and Acme matches Acne; too high and Acme Inc. no longer matches Acme, Inc. Test it against your real data and tune — 0.8 to 0.9 suits most company-name work.
  • Levenshtein measures edits, not meaning. IBM and International Business Machines are the same company but score near zero, while two unrelated short codes can score high. It catches typos and punctuation drift, not synonyms.
  • Short strings are fragile. A single edit in a four-letter word is a 25 percent hit to the score, so abbreviations and codes match unreliably. The function is most dependable on longer names.
  • Every formula scans the whole candidate range. A column of FUZZYMATCH formulas against a long Vendors!A:A is doing a lot of comparisons — if the sheet feels slow, point the second argument at a tighter range.
  • The match is one-directional. FUZZYMATCH finds the best candidate for one needle; it does not flag that two candidates are duplicates of each other. For a full reconciliation, run it from both lists and compare.

Related