Saturday, September 12, 2020
How to find similarity between two strings in Oracle (UTL_MATCH) - Levenshtein Distance & Jaro-Winkler algorithm
In this post, we have given a basic details about an useful Oracle utility named UTL_MATCH. This utility can be used to find the similarity of two strings with two algorithms given below.
Algorithm |
Oracle Utilities |
Levenshtein Distance |
UTL_MATCH.EDIT_DISTANCE(<string1>,<string2>) |
Jaro-Winkler |
UTL_MATCH.JARO_WINKLER(<string1>,<string2>) |
Oracle Document Explanation for quick Byte:
Levenshtein Distance:
"Edit Distance" also known as "Levenshtein Distance "(named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965), is a measure of Similarity between two strings, s1 and s2. The distance is the number of insertions, deletions or substitutions required to transform s1 to s2
Jaro-Winkler algorithm:
The "Jaro-Winkler algorithm" is another way of calculating Edit distance between two strings. This method, developed at the U.S. Census, is a String Comparator measure that gives values of partial agreement between two strings. The string comparator accounts for length of strings and partially accounts for typical human errors made in alphanumeric strings.
Lets do some practical:
Just to understand the functionality of the functions, I tried executing the comparison for below strings.
Scenario |
STRING 1 |
STRING 2 |
One letter not found in String 2 comparing to String 1 |
TEST |
TAST |
Two letter not found in String 2 comparing to String 1 |
TEST |
TT |
Matching Strings |
TEST |
TEST |
No characters from first string found in second string |
TEST |
ABCDFG |
Results:
Oracle Reference:
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm
0 Responses to “How to find similarity between two strings in Oracle (UTL_MATCH) - Levenshtein Distance & Jaro-Winkler algorithm”
Post a Comment