Soundex in Phonics

Thursday December 24, 2015

data science • linguistics • mathematics • Metaphone • phonetics • phonics • R • scientific computing • software • source code • systems science • text analysis

It’s Christmas Eve and I have about eleventy things I need to get done, so naturally I checked in on how the yak was doing. I just pushed v0.5.1 of Phonics in R, which includes LEIN (implemented months ago) and both the Soundex and the Apache refined Soundex algorithms, which I wrote this morning.

LEIN, like NYSIIS and the others, is implemented as a series of regular expression replacements. Soundex and refined Soundex, however, are implemented in C++, which makes them quite fast. I am not terribly impressed with the implementation, but it is correct. The Soundex implementation is very loosely based on the Apache Commons implementation.
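For anyone who has not seen Soundex before, here is a minimal sketch of the classic algorithm in C++. To be clear, this is not the code in the package. The names `soundex_code` and `soundex` are made up for illustration, and it assumes the common American variant: H and W are skipped without breaking a run of identical codes, vowels do break a run, and the result is padded with zeros to four characters.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: map an uppercase letter to its Soundex digit.
// '0' marks vowels (and Y), which reset the duplicate check.
// '-' marks H and W, which are skipped without resetting it.
static char soundex_code(char c) {
    switch (c) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        case 'H': case 'W': return '-';
        default: return '0';
    }
}

// Hypothetical function: encode one word as a four-character Soundex code.
// Returns an empty string if the input contains no letters.
std::string soundex(const std::string& word) {
    std::string out;
    char last = '0';  // code of the most recent letter that was not H or W
    for (unsigned char ch : word) {
        if (!std::isalpha(ch)) continue;  // ignore digits, spaces, punctuation
        char c = static_cast<char>(std::toupper(ch));
        char code = soundex_code(c);
        if (out.empty()) {
            out.push_back(c);  // the first letter is kept as-is
            last = code;       // but its code still suppresses a duplicate
            continue;
        }
        if (code == '-') continue;  // H and W do not separate duplicates
        if (code != '0' && code != last) out.push_back(code);
        last = code;
        if (out.size() == 4) break;
    }
    if (out.empty()) return out;
    out.append(4 - out.size(), '0');  // pad short codes with zeros
    return out;
}
```

With these rules, `soundex("Robert")` and `soundex("Rupert")` both come out as `R163`, which is the whole point: names that sound alike collapse to the same code.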

Image by Dennis Jarvis / Wikimedia.