Mark As Completed Discussion

Good afternoon! Here's our prompt for today.

The Edit Distance Challenge

What's the Challenge?

Your task is to find the Edit Distance between two strings. The edit distance measures how different two strings are by calculating the minimum number of transformations required to convert one string into another. The allowed transformations are:

  • Insertion: Adding a character.
  • Deletion: Removing a character.
  • Substitution: Replacing one character with another.

For instance, if the two strings are identical, such as cat and cat, the edit distance is 0. That's because no transformations are needed to make the two strings the same.

An Example to Illuminate

Consider two strings mitten and sitting. If we call getEditDistance('mitten', 'sitting'), the function should return 3. Here's how we arrive at this number:

  1. mittensitten: Substitute "s" for "m."
  2. sittensittin: Substitute "i" for "e."
  3. sittinsitting: Insert "g" at the end.

Method Signature

You are to implement a function called getEditDistance(a, b) that will return the edit distance between strings a and b.

JAVASCRIPT
1function getEditDistance(a, b) {
2  // Your code here
3}

Constraints to Keep in Mind

  • The lengths of both given strings a and b will be less than or equal to 1000.
  • The answer will fit within the integer range.
  • You are only allowed to change one of the strings.
  • Let m and n be the lengths of a and b, respectively.
  • Time Complexity: O(n * m)
  • Space Complexity: O(n * m)

Try to solve this here or in Interactive Mode.

How do I practice this challenge?

JAVASCRIPT
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

Here's our guided, illustrated walk-through.

How do I use this guide?

The Essence of Levenshtein Edit Distance: A Mathematical Lens

The Birth of the Concept

The Levenshtein edit distance algorithm was introduced by Vladimir Levenshtein in 1965. It's a mathematical formula that calculates the least number of single-character edits needed to change one string into another.

The Intimidating Equation

Upon first glance, the formula for Levenshtein Distance may seem complex:

Step Two

Breaking Down the Formula

Step Two

The Min Function

The min function is key to understanding the formula. What we're doing is considering three options at each step:

  1. Insertion: This is represented by Levenshtein(a1,b)
  2. Deletion: This is represented by Levenshtein(a,b1)
  3. Substitution: This is represented by Levenshtein(a1,b1)

We take the minimum of these three options because we want to find the least number of edits needed.

The Cost Factor

The "cost" is either 0 or 1, depending on whether the current characters in the strings a and b are equal or not. If they are equal, the cost is 0; otherwise, it's 1.

A More Intuitive View

Think of this as a journey where you start at one corner of a grid. Your destination is the opposite corner. The grid cells represent the characters in the strings you're comparing. Each move from one cell to another corresponds to an edit operation:

  • Moving horizontally represents insertion.
  • Moving vertically represents deletion.
  • Moving diagonally represents substitution.

You're trying to find the shortest path from one corner to another, making the least number of moves (edits). That's your Levenshtein Distance.

Skip this screen if you don't care about the math and want to know how the implementation works.

The mathematical definition is as follows: the Levenshtein distance between two strings a and b can be calculated by:

Mathematical Definition

This makes sense, we're inputting a and b and are arriving an outcome after processing the two values somehow.

So how do we process them?

By using an indicator function, which is a defined as a function defined on a set X that indicates membership of an element in a subset A of X. This is represented by a 1 in all elements of A where there's a membership and a 0 for all elements of X not in A.

So the following:

Mathematical Definition

Is the indicator function equal to 0 when ai = bj (when the letters are the same in the same position) and equal to 1 otherwise (if they differ).

To calculate the actual distance, we use:

Mathematical Definition

The above is the distance between the first i characters of a and the first j characters of b.

Confused? Don't worry, we'll move beyond the math in the next section.

Unlocking the Intuition: The Shopping Analogy for Edit Distance

Shopping for String Transformations: The Essence of Edit Distance

Imagine you're at a shopping mall, but this is no ordinary mall. Instead of clothes and gadgets, you're shopping for string transformations. Your goal is to transform one string into another with the least amount of "money" spent. Each "store" in this mall specializes in one kind of string operation: insertion, deletion, or substitution.

The Cost of String Operations: Understanding the Pricing System

Just like in any shopping spree, understanding the pricing system is crucial:

  • No Purchase Necessary: If both strings have the same letter at a given position, no action is needed. It's like finding an item you already own; you don't need to buy it again. Cost: 0.

  • Make a Purchase: If the letters are different, then it's time to shop! You have three stores to consider:

    • The "Insertion" store
    • The "Deletion" store
    • The "Substitution" store

Your aim is to make the most cost-effective purchase to align the strings.

Why the Matrix? Your Shopping List for Optimal Choices

So, why are we filling out a matrix? Think of the matrix as your shopping list or, better yet, your comparison shopping chart. Each cell in the matrix represents a decision point where you have to choose the most cost-effective operation.

  1. From Above (Insertion Store): Imagine you're considering adding a new letter to string 1. Look at the price from the cell directly above and add 1.

  2. From the Left (Deletion Store): Now imagine you're contemplating deleting a letter from string 1. Check the price from the cell directly to the left and add 1.

  3. From the Diagonal (Substitution Store): Finally, you're thinking about substituting one letter for another. Look at the diagonal price and add the cost, which is 0 if the letters are the same and 1 otherwise.

Completing Your Shopping List: The Bottom-Right Cell

As you fill in the matrix (your shopping list), you're effectively tracking your "spending" at each point, always opting for the cheapest viable operation. When you finally reach the bottom-right cell of the matrix, you've found the least expensive route to make both strings identical. That's your total "spending," or the final edit distance.

The Beauty of This Approach

This approach works beautifully because it allows you to compare every possible pair of prefixes between the two strings. You are essentially exploring all potential shopping routes and jotting down the least expensive "purchases" you can make at each decision point. By the time you reach the bottom-right cell, you've done your due diligence and found the best deal. That's why this method is so effective and why the final number in the bottom-right cell truly represents the minimum edit distance between the two strings.

Are you sure you're getting this? Is this statement true or false?

Suppose we're trying to find the edit distance of two strings, string A = "milk" and string B = "melt".

When comparing the character of string A at index 0, and string B at index 0, no edit is neeed so we should not add 1 to our running distance.

However, at index 1 for the two strings, we want to add 1 to account for I and E.

Press true if you believe the statement is correct, or false otherwise.

So we now know when there is a deletion, addition, or substitution, we'll need to account for that. The key to understanding the levenshtein edit distance is that we'll use an array matrix to account for the n^2 possible transformations.

Here's a nice visual to grok how we'll apply this logic across a matrix:

Using a Matrix

Image credit to https://www.python-course.eu.

Building the Edit Distance Matrix: A Concrete Example

Let's dive into another example and step through the entire thing.

The Core Idea: Cumulative Transformation Distance

To find the smallest number of transformations needed, we will fill in a matrix while keeping track of the cumulative transformation distance. This is the running total of changes needed to make the strings match up to that point.

Here's how we fill in each cell of the matrix:

  1. From Above: Take the value from the cell immediately above and add 1: d[i1,j]+1
  2. From the Left: Take the value from the cell immediately to the left and add 1: d[i,j1]+1
  3. From the Diagonal: Take the value from the cell diagonally above and to the left, and add the cost: d[i1,j1]+cost

Step-by-Step Matrix Building: The Example of best vs test

Let's make this real by constructing the matrix for the strings best and test. First, we initialize a row and a column based on the lengths of our two strings:

best
01234
t1START
e2
s3
t4

Now let's fill it in, guided by our three rules:

  1. From Above: If we just want to insert a letter, we look at the cell above and add 1.
  2. From the Left: If we want to delete a letter, we look at the cell to the left and add 1.
  3. From the Diagonal: If we want to substitute a letter, we look diagonally and add the cost (which is 0 if the letters are the same, and 1 otherwise).

Moving Left to Right: Starting with Column "B"

To understand how to fill in the matrix, let's begin with the column corresponding to the letter B from the string best. Our goal is to move left to right and populate each cell with the least "costly" way to transform the strings up to that point.

The Three Choices for Each Cell

Remember our guidelines for each cell:

  1. From Above: The cell immediately above plus 1.
  2. From the Left: The cell immediately to the left plus 1.
  3. From the Diagonal: The cell diagonally above and to the left, plus the cost (0 if the letters are the same, and 1 otherwise).

Filling In the "B" Column

Starting at the * in the prior table, we see that B and T are different. So, the edit cost is 1. With this in mind, we have three calculations:

  1. From Above: (1 + 1 = 2)
  2. From the Left: (1 + 1 = 2)
  3. From the Diagonal: (0 + 1 = 1)

The minimum of these is 1, so we place it in the cell where B intersects with T.

Here's how our matrix looks after filling in the "B" column:

Best
01234
T11
e22
s33
t44

We've successfully filled in the first column for the letter B. We'd continue this process for each of the remaining columns to complete the matrix and find the total edit distance.

The Column for Letter "E"

Now that we've filled in the column for the letter B, let's move on to the column for the letter E from the string best.

Calculating the Costs

Starting with E and T, we again find that the two letters are different, so the edit cost is 1. Here are the calculations for the cell:

  1. From Above: (1 + 1 = 2)
  2. From the Left: (2 + 1 = 3)
  3. From the Diagonal: (1 + 1 = 2)

The minimum of these is 2, so we place it in the cell where E intersects with T.

Next, we move to E and e. The two letters are the same, so the edit cost is 0. The calculations for this cell are:

  1. From Above: (2 + 1 = 3)
  2. From the Left: (1 + 1 = 2)
  3. From the Diagonal: (0 + 1 = 1)

The minimum is 1, so we place it in the cell where E intersects with e.

The Updated Matrix

Here's how our matrix looks after filling in the "E" column:

bEst
01234
T112
e221
s332
t443

We've successfully filled in the second column for the letter E. As before, we'd continue this process for the remaining columns to complete the matrix and find the total edit distance.

The Column for Letter "S"

Having filled the columns for B and E, let's now move on to the column for the letter S from the string best.

Calculating the Costs for "S"

Starting with S and T, we find that the two letters are different, so the edit cost is 1. The calculations for this cell are:

  1. From Above: (2 + 1 = 3)
  2. From the Left: (2 + 1 = 3)
  3. From the Diagonal: (2 + 1 = 3)

The minimum of these is 3, so we place it in the cell where S intersects with T.

The Updated Matrix with "S" Column Filled

Here's how our matrix looks after filling in the "S" column:

beSt
01234
T1123
e2212
s3321
t4432

With the column for S filled in, we're getting closer to completing our matrix. We would continue this step-by-step approach for the remaining columns to determine the final edit distance between the two strings.

The Column for Letter "T"

Now that we've successfully filled in the columns for B, E, and S, we can wrap up our matrix with the last column for the letter T from the string best.

Calculating the Costs for "T"

Starting with the first row, T and t are the same characters, so the edit cost is 0. The calculations for this cell are:

  1. From Above: (4 + 1 = 5)
  2. From the Left: (3 + 1 = 4)
  3. From the Diagonal: (3 + 0 = 3)

The minimum of these is 3, so we place it in the cell where T intersects with t.

The Final Matrix

Here's how our fully completed matrix looks:

besT
01234
t11233
e22123
s33212
t44321

Congratulations, you've successfully completed the matrix! The value in the bottom-right corner tells us that the edit distance between best and test is 1, confirming that it would take a minimum of 1 edit to transform one string into the other.

Decoding the Matrix: The Final Edit Distance

After filling in our matrix, we look to the bottom-right corner to find our final edit distance. In this example, that value is 1, which confirms that it takes just a single edit to transform "best" into "test". Specifically, we can swap the first b with a t.

Here's the final matrix for quick reference:

besT
01234
t11233
e22123
s33212
T44321 (*)

Complexity Analysis of the Final Solution

Space Complexity

The matrix we've built has dimensions that depend on the lengths of our two strings. Specifically, if m and n are the lengths of the two strings, the matrix is of size m x n. This gives us a space complexity of O(m x n).

Time Complexity

To fill in this matrix, we used two nested loops that iterate through each row and column, performing constant-time operations at each cell. This results in a time complexity of O(m x n) as well.

Both the time and space complexities for this solution are O(m x n), making it a quadratic algorithm.

One Pager Cheat Sheet

  • The Edit Distance between two strings is calculated by finding the minimum number of insertion, deletion, and substitution transformations required to turn the first string into the second.
  • The Levenshtein edit distance is a mathematical formula developed by Vladimir Levenshtein in 1965, which measures the number of edits required to convert one string to another.
  • The Levenshtein distance between two strings a and b can be calculated using an indicator function and a mathematical formula.
  • The cost of transformation between two strings is measured by the distance between them, being 0 for no difference and increasing for any differences.
  • The running distance between two strings should be incremented by 1 if their characters at the same index are different.
  • The Levenshtein edit distance can be calculated using an array matrix to track the n^2 possible transformations.
  • The algorithm uses a table to accumulate costs in order to find the minimum cumulative transformation distance between two strings.
  • We need to move from the * indicated cell to the right, adding up the cost of 1 at each intersection as we go, and selecting the minimum of the three options at every step.
  • By comparing the letters e and t first, and noting that e and e have no cost, we can determine the optimal cost of 1 + 1 for the first 2 columns.
  • We arrive at 3 (2 + 1) by s, the bottom-right corner.
  • The 2x2 matrix with values 1 through 4 results in the best score when diagonally mirrored.
  • Our Final Solution has O(m*n) time and space complexity.

This is our final solution.

To visualize the solution and step through the below code, click Visualize the Solution on the right-side menu or the VISUALIZE button in Interactive Mode.

JAVASCRIPT
OUTPUT
:001 > Cmd/Ctrl-Enter to run, Cmd/Ctrl-/ to comment

You're doing a wonderful job. Keep going!

If you had any problems with this tutorial, check out the main forum thread here.