Metaphone in R
20 Sep 2015
I was working on a data merge this weekend with some county-level data related to the NFIP (the National Flood Insurance Program). But one of the datasets did not include FIPS codes; it had only county names. Well, there are plenty of rational ways one could deal with this. But I saw before me a glorious yak who desperately needed a shave. So I did the obvious thing and decided to encode the county names with Metaphone, so that spelling variants of the same name would normalize to the same key.
This, of course, required writing an implementation of Metaphone in R. And before that, I had to decide which Metaphone to implement, because Metaphone is a family of three algorithms. The first is the original Metaphone, which is widely regarded as flawed but is still one of the best options for phonetic spelling. The second is Double Metaphone, which can return two encodings, a primary and an alternate, for the same word. The third is Metaphone 3, which is patented and therefore essentially unusable for a few more years.
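To make the merge idea concrete, here is a minimal sketch in base R of joining two tables on a phonetic-style key instead of on the raw county name. It is not the package's code: `phonetic_key()` is a hypothetical and deliberately crude stand-in for Metaphone (it only strips a trailing "County", drops vowels and a few weak consonants after the first letter, and collapses repeated letters), and the two small data frames are made up for illustration.

```r
# Hypothetical stand-in for a real phonetic encoder. This is NOT Metaphone;
# it is only crude enough to show how a normalized key helps a merge.
phonetic_key <- function(x) {
  x <- toupper(x)
  x <- sub("\\s+COUNTY$", "", x, perl = TRUE)   # drop a trailing "County"
  x <- gsub("[^A-Z]", "", x, perl = TRUE)       # keep letters only
  first <- substr(x, 1, 1)                      # always keep the first letter
  rest <- gsub("[AEIOUYHW]", "", substr(x, 2, nchar(x)), perl = TRUE)
  gsub("([A-Z])\\1+", "\\1", paste0(first, rest), perl = TRUE)  # collapse runs
}

# Illustrative lookup table that carries the FIPS codes.
fips <- data.frame(
  county = c("Anne Arundel County", "Howard County", "Prince George's County"),
  fips   = c("24003", "24027", "24033"),
  stringsAsFactors = FALSE
)

# Illustrative dataset with inconsistently spelled names and no FIPS codes.
records <- data.frame(
  county = c("Ann Arundell", "HOWARD", "Prince Georges"),
  n      = c(12, 7, 30),
  stringsAsFactors = FALSE
)

# Build the key on both sides, then merge on the key rather than the name.
fips$key    <- phonetic_key(fips$county)
records$key <- phonetic_key(records$county)
merged <- merge(records, fips[, c("key", "fips")], by = "key", all.x = TRUE)
```

Any key this aggressive can also collide, mapping two different counties to the same code, so it is worth checking for duplicated keys within each state before trusting the merge.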
The package is available for download from GitHub and I have set up a dedicated page here.
Over the next few weeks, I hope to add other phonetic algorithms. Of course, contributions are welcome.
Image by Arian Zwegers / Flickr.