Saturday, September 12, 2020

How to find similarity between two strings in Oracle (UTL_MATCH) - Levenshtein Distance & Jaro-Winkler algorithm

 

 

 

 

 

 

                In this post, we have given a basic details about an useful Oracle utility named UTL_MATCH. This utility can be used to find the similarity of two strings with two algorithms given below.

Algorithm

Oracle Utilities

Levenshtein Distance

UTL_MATCH.EDIT_DISTANCE(<string1>,<string2>)
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(<string1>,<string2>)

Jaro-Winkler

UTL_MATCH.JARO_WINKLER(<string1>,<string2>)
UTL_MATCH.JARO_WINKLER_SIMILARITY(<string1>,<string2>)

 

Oracle Document Explanation for quick Byte:

Levenshtein Distance:

"Edit Distance" also known as "Levenshtein Distance "(named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965), is a measure of Similarity between two strings, s1 and s2. The distance is the number of insertions, deletions or substitutions required to transform s1 to s2

Jaro-Winkler algorithm:

The "Jaro-Winkler algorithm" is another way of calculating Edit distance between two strings. This method, developed at the U.S. Census, is a String Comparator measure that gives values of partial agreement between two strings. The string comparator accounts for length of strings and partially accounts for typical human errors made in alphanumeric strings.

Lets do some practical:

Just to understand the functionality of the functions, I tried executing the comparison for below strings.

Scenario

STRING 1

STRING 2

One letter not found in String 2 comparing to String 1

TEST

TAST

Two letter not found in String 2 comparing to String 1

TEST

TT

Matching Strings

TEST

TEST

No characters from first string found in second string

TEST

ABCDFG

 Results: 









 

 

 

 

 

 

Oracle Reference:

https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm

0 Responses to “How to find similarity between two strings in Oracle (UTL_MATCH) - Levenshtein Distance & Jaro-Winkler algorithm”

Post a Comment

Disclaimer

The ideas, thoughts and concepts expressed here are my own. They, in no way reflect those of my employer or any other organization/client that I am associated. The articles presented doesn't imply to any particular organization or client and are meant only for knowledge Sharing purpose. The articles can't be reproduced or copied without the Owner's knowledge or permission.