Home > DBMS_PACKEGE, UTL_MATCH > utl_match package for string matching

utl_match package for string matching

There are four functions included in the package utl_match use different methods to compare a source string and destination string, and return an assessment of what it would take to turn the source string into the destination string.

Source: $ORACLE_HOME/rdbms/admin/utlmatch.sql

Oracle added the UTL_MATCH package in Version 10gRelease 2 to compare strings. The four functions included in the package use different methods to compare a source string and destination string, and return an assessment of what it would take to turn the source into the destination string. The functions are broken down into two categories. The categories are actually the algorithms employed to analyze the strings.

Levenshtein Distance

The Levenshtein Distance (LD) algorithm, commonly called the Edit Distance (ED) algorithm, is the older of the two supported methods. It measures the distance between the source and destination strings. By distance, we’re referring to the number of changes required to turn the source string into the destination string.

Jaro-Winkler

The Jaro-Winkler algorithm is the second category of algorithms used in UTL_ MATCH. These functions take the same two arguments, but instead of simply calculating the number of steps required to change the source string to the destination string, it determines how closely the two strings agree with each other. The algorithm also tries to take into account the possibility of a data entry error when determining similarity.

Procedures/Functions

edit_distance

Returns the number of changes required to turn the source string into the destination string using the Levenshtein Distance algorithm.

function edit_distance returns binary_integer (
 s1    in        varchar2,
 s2    in        varchar2       
);

edit_distance_similarity

Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.

function edit_distance_similarity returns binary_integer (
 s1    in        varchar2,
 s2    in        varchar2       
);

jaro_winkler

Instead of simply calculating the number of steps required to change the source string to the destination string returns similarity based on Jaro-Winkler distance algorithm, determines how closely the two strings agree with each other and tries to take into account the possibility of a data entry error.

function jaro_winkler returns binary_double (
 s1    in       varchar2,
 s2    in       varchar2       
);

jaro_winkler_similarity

Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match but tries to take into account possible data entry errors.

function jaro_winkler_similarity returns binary_integer (
 s1    in                 varchar2,
 s2    in                 varchar2       
);

Example 1:

SQL> SELECT UTL_MATCH.EDIT_DISTANCE('espresso', 'expresso')
  2    AS DISTANCE
  3  FROM DUAL;

  DISTANCE
----------
         1

SQL> SELECT UTL_MATCH.EDIT_DISTANCE_SIMILARITY('espresso', 'expresso')
  2    AS DISTANCE
  3  FROM DUAL;

  DISTANCE
----------
        88

SQL> SELECT UTL_MATCH.JARO_WINKLER('espresso', 'expresso')
  2    AS DISTANCE
  3  FROM DUAL;

  DISTANCE
----------
 9.25E-001

SQL> SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY ('espresso', 'expresso')
  2    AS DISTANCE
  3  FROM DUAL;

  DISTANCE
----------
        92

Example 2:

CREATE TABLE TEST
( 
        test_id        number primary key, 
        test_name      varchar2(20), 
        test_date      date
);  

INSERT INTO TEST values(1, 'Sony CD Player', '20-FEB-2010');
INSERT INTO TEST values(2, 'Sony CD Player', '24-FEB-2010');
INSERT INTO TEST values(3, 'Pioneer DVD Player', '25-FEB-2010');
INSERT INTO TEST values(4, 'Sony CD Player', '25-FEB-2010');
INSERT INTO TEST values(5, 'Bose Speaker', '22-FEB-2010');
INSERT INTO TEST values(6, 'Tascam CD Burner', '25-FEB-2010');
INSERT INTO TEST values(7, 'Nikon digital camera', '22-FEB-2010');
INSERT INTO TEST values(8, 'Canon digital camera', '26-FEB-2010');

Commit;

Select TEST_ID,TEST_NAME,TEST_DATE
FROM TEST
WHERE  UTL_MATCH.JARO_WINKLER_SIMILARITY(test_name,'dogotal') > 60;

TEST_ID                TEST_NAME            TEST_DATE
---------------------- -------------------- -------------------------
7                      Nikon digital camera 22-FEB-00
8                      Canon digital camera 26-FEB-00

Advertisements
Categories: DBMS_PACKEGE, UTL_MATCH Tags:
  1. January 25, 2011 at 11:11 am

    ;** I am very thankful to this topic because it really gives up to date information “;,

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: