Core Function Levenshtein

From Sputnik Wiki
Jump to: navigation, search
Levenshtein( <str1>, <str2>, <cost_ins>, <cost_rep>, <cost_de> )

Contents

Description

Calculate Levenshtein distance between two strings.

Parameters

string

The input string.

Return Value

This function returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters.

Remarks

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2.

The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).

In its simplest form the function will take only the two strings as parameter and will calculate just the number of insert, replace and delete operations needed to transform str1 into str2.

A second variant will take three additional parameters that define the cost of insert, replace and delete operations. This is more general and adaptive than variant one, but not as efficient.

Example

printf("%s\n", Levenshtein("Dog", "Cat"));
// Prints 3
echo Levenshtein("Hello World","ello World",10,20,30);
// Prints 30

Practical example

// input misspelled word
my $input = 'carrrot';
 
// array of words to check against
my $words  = array('apple','pineapple','banana','orange',
                'radish','carrot','pea','bean','potato');
 
// no shortest distance found, yet
my $shortest = -1;
 
// no closest found, yet
my $closest;
 
// loop through words to find the closest
foreach ($words as $word) 
{
    // calculate the distance between the input word,
    // and the current word
    my $lev = levenshtein($input, $word);
 
    // check for an exact match
    if ($lev == 0) 
    {
        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;
        // break out of the loop; we've found an exact match
        break;
    }
 
    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0)
    {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}
 
echo "Input word: $input\n";
if ($shortest == 0)
    echo "Exact match found: $closest\n";
else 
    echo "Did you mean: $closest?\n";
Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox