Create a fuzzy text-matching function
Match near-duplicate names across two lists with a Levenshtein-based custom function.
Published Jun 23, 2025
Northwind keeps a clean Clients list, but every imported vendor file spells
things its own way — “Acme Inc.” here, “Acme, Inc” there, “ACME inc” somewhere
else. A plain VLOOKUP matches none of those to each other, so reconciling two
lists turns into an afternoon of eyeballing rows and pasting by hand.
This custom function does the eyeballing for you. FUZZYMATCH takes a name and
a range of candidates, scores how similar each candidate is using the
Levenshtein edit distance, and returns the closest one — but only if it clears
a similarity threshold you set. Drag it down a column and the near-duplicates
line up automatically, with the genuine non-matches left blank.
What you’ll need
- Two lists in the same spreadsheet: the names you want to match (for example a
Clientscolumn) and the names to match against (for example aVendorscolumn). Both can be on any tab. - A spare column next to the first list for the formula.
The script
// Default similarity needed to count as a match, from 0 (no overlap) to
// 1 (identical). Callers can override this per formula.
const DEFAULT_THRESHOLD = 0.8;
/**
* Finds the closest fuzzy match for a value within a range of candidates.
*
* @param {string} needle The value to look up.
* @param {Array<Array<string>>} haystackRange The range of candidate
* values to match against, e.g. Vendors!A:A.
* @param {number} [threshold] Minimum similarity (0-1) to accept a
* match. Defaults to 0.8.
* @return {string} The best-matching candidate, or blank if none clears
* the threshold.
* @customfunction
*/
function FUZZYMATCH(needle, haystackRange, threshold = DEFAULT_THRESHOLD) {
// Nothing to look up — return blank rather than an error.
if (!needle) return '';
// Flatten the 2-D range into a plain list and drop empty cells.
const candidates = [].concat.apply([], haystackRange).filter(Boolean);
// Track the highest-scoring candidate seen so far.
let best = { value: '', score: 0 };
for (const candidate of candidates) {
const score = similarity(String(needle), String(candidate));
if (score > best.score) best = { value: candidate, score };
}
// Only return a match if it is similar enough to trust.
return best.score >= threshold ? best.value : '';
}
/**
* Scores how similar two strings are, from 0 to 1, based on their
* Levenshtein edit distance.
*
* @param {string} a The first string.
* @param {string} b The second string.
* @return {number} A similarity score between 0 and 1.
*/
function similarity(a, b) {
// Compare case-insensitively so "ACME" and "acme" score as identical.
const distance = levenshtein(a.toLowerCase(), b.toLowerCase());
// Normalise the edit distance by the longer string's length, then
// flip it so a smaller distance means a higher score.
return 1 - distance / Math.max(a.length, b.length);
}
/**
* Computes the Levenshtein edit distance between two strings: the
* minimum number of single-character insertions, deletions, or
* substitutions to turn one into the other.
*
* @param {string} a The first string.
* @param {string} b The second string.
* @return {number} The edit distance.
*/
function levenshtein(a, b) {
const m = a.length;
const n = b.length;
// dp[i][j] is the edit distance between the first i chars of a and
// the first j chars of b. Row 0 is seeded with 0..n.
const dp = Array.from({ length: m + 1 }, (_, i) => [i, ...Array(n).fill(0)]);
for (let j = 0; j <= n; j++) dp[0][j] = j;
// Fill the table row by row.
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
dp[i][j] = a[i - 1] === b[j - 1]
? dp[i - 1][j - 1] // chars match
: 1 + Math.min(dp[i - 1][j - 1], // substitution
dp[i - 1][j], // deletion
dp[i][j - 1]); // insertion
}
}
return dp[m][n];
}
How it works
FUZZYMATCHtakes the value to look up, a range of candidates, and an optional threshold. An emptyneedlereturns blank straight away.- The candidate range arrives as a 2-D array (a column is many one-cell rows),
so
[].concat.applyflattens it into a plain list andfilter(Boolean)drops empty cells. - It scores every candidate with
similarityand keeps the highest scorer inbest. - It only returns that best match if the score clears the threshold — otherwise blank, so a genuine non-match never gets a wrong answer attached.
similaritylower-cases both strings, computes their Levenshtein distance, and converts it to a 0-to-1 score by dividing by the longer string’s length and subtracting from one.levenshteinfills a dynamic-programming table where each cell is the cheapest way to transform one string prefix into another, using insertions, deletions, and substitutions.
Use it
With the name to match in A2 and the candidate list in column A of a
Vendors tab, type the formula into a spare cell:
=FUZZYMATCH(A2, Vendors!A:A, 0.85)
Drag it down the column. Each row shows the closest vendor name if its
similarity is 0.85 or higher, and stays blank otherwise. The third argument is
optional — =FUZZYMATCH(A2, Vendors!A:A) uses the default threshold of 0.8.
For example, with A2 holding Acme, Inc and the Vendors list containing
Acme Inc., the formula returns Acme Inc. — the two strings are only a couple
of edits apart, comfortably above the threshold.
Watch out for
- The threshold is a judgement call. Set it too low and
AcmematchesAcne; too high andAcme Inc.no longer matchesAcme, Inc. Test it against your real data and tune — 0.8 to 0.9 suits most company-name work. - Levenshtein measures edits, not meaning.
IBMandInternational Business Machinesare the same company but score near zero, while two unrelated short codes can score high. It catches typos and punctuation drift, not synonyms. - Short strings are fragile. A single edit in a four-letter word is a 25 percent hit to the score, so abbreviations and codes match unreliably. The function is most dependable on longer names.
- Every formula scans the whole candidate range. A column of
FUZZYMATCHformulas against a longVendors!A:Ais doing a lot of comparisons — if the sheet feels slow, point the second argument at a tighter range. - The match is one-directional.
FUZZYMATCHfinds the best candidate for one needle; it does not flag that two candidates are duplicates of each other. For a full reconciliation, run it from both lists and compare.
Related
Parse messy mixed-format dates
Normalise inconsistently formatted strings into real date values with a single formula.
Updated Aug 2, 2025
Scrape a web table into cells with one formula
Pull HTML tables into Sheets as a custom function — no IMPORTHTML quirks.
Updated Jul 30, 2025
Mask sensitive columns for shareable copies
Redact PII with a custom function so you can share a copy of the sheet without exposing names, emails, or numbers.
Updated Jul 26, 2025
Build a sentiment-scoring function without AI
Rate text positive or negative with a tiny built-in lexicon — no API key, no quota.
Updated Jul 23, 2025
Build a unit-conversion function library
Convert between any units with one custom formula — kg/lb, km/mi, °C/°F, and the rest.
Updated Jul 19, 2025