Soundex in Phonics

It’s Christmas Eve, I have about eleventy things I need to get done, so I checked out how the yak was doing. I just pushed v0.5.1 of Phonics in R, which includes LEIN (implemented months ago) along with both the Soundex and Apache refined Soundex algorithms, which I wrote this morning.

LEIN, like NYSIIS and others, is implemented as a series of regular expression replacements. Soundex and refined Soundex, however, are implemented in C++, which makes them quite fast. I am not terribly impressed with the implementation, but it is correct. The Soundex implementation is very loosely based on the Apache Commons implementation.
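For a rough idea of what a C++ Soundex looks like, here is a minimal sketch of the common American Soundex rules. It is not the code that ships in the package: the names `soundex_code` and `soundex_sketch` are made up for illustration, and it assumes plain ASCII input and the usual convention that H and W are skipped without separating duplicate codes.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: map an uppercase letter to its Soundex digit.
// '0' marks vowels (and Y), which separate repeated codes;
// '-' marks H and W, which are skipped without separating them.
static char soundex_code(char c) {
    switch (c) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        case 'H': case 'W': return '-';
        default: return '0';
    }
}

// Hypothetical sketch: encode one word as a four-character Soundex key.
// Non-alphabetic characters are ignored; an empty result means no letters.
std::string soundex_sketch(const std::string& word) {
    std::string out;
    char last = '0';
    for (unsigned char raw : word) {
        if (!std::isalpha(raw)) continue;
        char c = static_cast<char>(std::toupper(raw));
        char code = soundex_code(c);
        if (out.empty()) {
            // The first letter is kept as-is, but its code still counts
            // when checking the next letter for a duplicate.
            out.push_back(c);
            last = code;
            continue;
        }
        if (code == '-') continue;  // H/W: skip and leave `last` alone
        if (code != '0' && code != last) out.push_back(code);
        last = code;
        if (out.size() == 4) break;
    }
    if (out.empty()) return out;
    out.resize(4, '0');  // pad short keys with zeros
    return out;
}
```

With this sketch, `soundex_sketch("Robert")` and `soundex_sketch("Rupert")` both come out as `R163`, which is the whole point: names that sound alike collapse to the same key. It is a single pass over the string with no allocation beyond the four-character result, which is roughly why a compiled version beats a chain of regular expression replacements.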

Image by Dennis Jarvis / Wikimedia.