Skip this screen if you don't care about the math and want to know how the implementation works.
The mathematical definition is as follows: the Levenshtein distance between two strings a and b (of lengths |a| and |b|) can be calculated by:

$$\operatorname{lev}_{a,b}(|a|, |b|)$$
This makes sense: we're inputting a and b and arriving at an outcome after processing the two values somehow.
So how do we process them?
By using an indicator function, which is a function defined on a set X that indicates whether an element belongs to a subset A of X. It returns 1 for every element of X that is in A and 0 for every element of X that is not in A.
So the following:

$$1_{(a_i \neq b_j)}$$

is the indicator function, equal to 0 when $a_i = b_j$ (when the i-th character of a is the same as the j-th character of b) and equal to 1 otherwise (if they differ).
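As a small illustration, the indicator described here can be written as a tiny Python function. This is only a sketch: the name `indicator` and the 1-based indices (chosen to match the subscript notation) are assumptions made for this example, not part of any library.

```python
def indicator(a: str, b: str, i: int, j: int) -> int:
    """Return 0 if the i-th character of a equals the j-th character of b, else 1.

    i and j are 1-based here (an assumption, to mirror the math notation),
    so we subtract 1 to get Python's 0-based string indices.
    """
    return 0 if a[i - 1] == b[j - 1] else 1


print(indicator("kitten", "sitting", 1, 1))  # compares 'k' with 's' -> 1
print(indicator("kitten", "sitting", 2, 2))  # compares 'i' with 'i' -> 0
```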
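To calculate the actual distance, we use:

$$
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i - 1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j - 1) + 1 \\
\operatorname{lev}_{a,b}(i - 1, j - 1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise.}
\end{cases}
$$

The above is the distance between the first i characters of a and the first j characters of b. The three options inside the minimum correspond to a deletion, an insertion, and a match or substitution.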
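If it helps to see the recursive definition as code, here is a minimal Python sketch that follows it case by case. The function names `levenshtein` and `lev` are chosen for this example, and the memoization with `functools.lru_cache` is an addition to keep the recursion from recomputing the same prefixes. It is not necessarily how the implementation in the next section works.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed from the recursive definition."""

    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # Base case: one of the prefixes is empty, so the distance is
        # the length of the other prefix.
        if min(i, j) == 0:
            return max(i, j)
        # Indicator: 0 if the i-th char of a equals the j-th char of b, else 1.
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            lev(i - 1, j) + 1,         # deletion
            lev(i, j - 1) + 1,         # insertion
            lev(i - 1, j - 1) + cost,  # match or substitution
        )

    # The distance between the full strings is lev(|a|, |b|).
    return lev(len(a), len(b))


print(levenshtein("kitten", "sitting"))  # 3
```

Because this version recurses once per character, very long strings can hit Python's recursion limit. An iterative, table-based version avoids that.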
Confused? Don't worry, we'll move beyond the math in the next section.