Soundex

Soundex is a phonetic string-matching algorithm, developed by telephone companies before the invention of electronic computers to help find names and addresses. Each word is reduced to a letter followed by three digits: the first letter is kept, vowels are ignored, and the remaining consonants are grouped into classes of related sounds, with one digit per class. Although not a perfect algorithm, it has stood the test of time and is relatively easy to implement. In other words, its ratio of effectiveness to complexity is remarkably high. (Fancier versions with more "parts and dials" do exist.)
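
The consonant grouping can be sketched with Perl's tr/// operator. This is only an illustration of the digit classes, not the full algorithm; the helper name letter_classes is made up for this example, and using '0' to mark letters that carry no code (vowels plus h, w and y) is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every letter of a word by its digit class.
# '0' marks letters without a class (vowels plus h, w and y).
sub letter_classes {
    my ($word) = @_;
    my $classes = lc $word;
    $classes =~ tr/a-z//cd;    # delete everything that is not an ASCII letter
    $classes =~ tr/abcdefghijklmnopqrstuvwxyz/01230120022455012623010202/;
    return $classes;
}

print letter_classes('Robert'), "\n";    # prints 601063
```

Keeping the first letter, dropping the zeros and collapsing adjacent repeats in the rest (01063) leaves the digits 1, 6 and 3, which is where the familiar code R163 for "Robert" comes from.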

Here is some PerlLanguage code to return the Soundex code for a given word.

 # return the soundex code of a word
 sub soundex {
   my $theword = shift;
   my $result  = '0000';

   $theword =~ s/\s//g;                          # remove all whitespace
   return $result unless length($theword) > 0;

   $theword = lc $theword;
   my @chars = $theword =~ /./g;                 # split word into individual chars
   $result = shift @chars;                       # result starts with first letter

   # convert remaining chars to numbers & append to result
   my $prev = '';
   foreach my $thechar (@chars) {
     my $code;
     # index() is used rather than a regex match so that punctuation such
     # as '.' or '(' in the input is never treated as a pattern
     SWITCH: {
       if (index('bfpv',     $thechar) >= 0) { $code = '1'; last SWITCH; }
       if (index('cgjkqsxz', $thechar) >= 0) { $code = '2'; last SWITCH; }
       if (index('dt',       $thechar) >= 0) { $code = '3'; last SWITCH; }
       if ('l' eq $thechar)                  { $code = '4'; last SWITCH; }
       if (index('mn',       $thechar) >= 0) { $code = '5'; last SWITCH; }
       if ('r' eq $thechar)                  { $code = '6'; last SWITCH; }
       $code = '';                               # default when $thechar is vowel-like
     }
     $result .= $code unless $code eq $prev;     # append to result, ignoring dups
     $prev = $code;
   }

   $result .= '0000';                            # pad to at least four symbols
   return substr($result, 0, 4);                 # return only first four symbols
 }

The Soundex algorithm works best when it is tweaked to the language (and surnames) in use.
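
One way to make such tweaking easy is to pass the letter classes in as data instead of hard-coding them. The following is a minimal sketch under that assumption; make_table and soundex_with are hypothetical names, the routine follows the same steps as the soundex() code on this page (including the lowercase first letter), and the tweak of filing 'w' with 'v' is purely illustrative.

```perl
use strict;
use warnings;

# Expand (letters => digit) pairs into a per-letter lookup table.
sub make_table {
    my %groups = @_;
    my %table;
    for my $letters (keys %groups) {
        $table{$_} = $groups{$letters} for split //, $letters;
    }
    return \%table;
}

# Hypothetical variant of soundex() that takes the class table as an argument.
sub soundex_with {
    my ($table, $word) = @_;
    (my $letters = lc $word) =~ tr/a-z//cd;       # keep ASCII letters only
    return '0000' unless length $letters;

    my ($first, @rest) = split //, $letters;
    my $result = $first;                          # result starts with first letter
    my $prev   = '';
    for my $char (@rest) {
        my $code = exists $table->{$char} ? $table->{$char} : '';
        $result .= $code unless $code eq $prev;   # skip adjacent duplicates
        $prev = $code;
    }
    return substr($result . '0000', 0, 4);        # pad, then cut to four symbols
}

# The classes used by the code on this page.
my $english = make_table('bfpv' => 1, 'cgjkqsxz' => 2, 'dt' => 3,
                         'l' => 4, 'mn' => 5, 'r' => 6);
# Illustrative tweak only: file 'w' in the same class as 'v'.
my $tweaked = make_table('bfpvw' => 1, 'cgjkqsxz' => 2, 'dt' => 3,
                         'l' => 4, 'mn' => 5, 'r' => 6);

print soundex_with($english, 'Iwanow'), "\n";    # prints i500
print soundex_with($tweaked, 'Iwanow'), "\n";    # prints i151
print soundex_with($english, 'Ivanov'), "\n";    # prints i151
```

With the default table the two spellings "Iwanow" and "Ivanov" receive different codes; with the tweaked table they meet at i151. Whether a given tweak helps depends on the names actually being matched.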


See also EveryWordCanBeAbbreviatedToFourLetters


CategoryAlgorithm


Last edited June 16, 2009