The phonics package for R provides a variety of phonetic indexing algorithms, both common and less common. The algorithms generally reduce a string to a symbolic representation approximating the sound made by pronouncing the string. They can be used to match names and other strings, and as a proxy for assorted string distance algorithms.

phonics(word, method, clean = TRUE)

Arguments

word

string or vector of strings to encode

method

vector of method names to use

clean

if TRUE, return NA for inputs containing unknown alphabetical characters

Value

Returns a data frame containing the phonetic spellings of the input for each method applied.

Details

The variable word is a character string or a vector of character strings to be encoded.

Each phonetic algorithm is defined only for inputs over a limited alphabet. Non-alphabetical characters, including spaces, hyphens, and numbers, are removed from the string in a locale-dependent fashion. For inputs outside an algorithm's known range, the output is undefined; NA is returned and a warning is thrown. If clean is FALSE, phonics attempts to process the strings anyway. The default is TRUE.
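The effect of clean can be explored with a short sketch. The input vector below is hypothetical, and no particular encodings are claimed here; per the description above, the entry containing a letter outside A-Z would be expected to come back as NA with a warning under the default setting.

```r
library(phonics)

# Hypothetical inputs: one hyphenated name and one with a letter outside A-Z
x <- c("Smith-Jones", "Pe\u00f1a")

# Default behaviour (clean = TRUE); warnings are collected instead of printed
seen <- character(0)
strict <- withCallingHandlers(
  phonics(x, "soundex"),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# clean = FALSE asks phonics to attempt the encoding anyway
lenient <- suppressWarnings(phonics(x, "soundex", clean = FALSE))

print(strict)   # codes under the default setting
print(seen)     # any warning messages raised along the way
print(lenient)  # codes when cleaning is disabled
```

Comparing strict and lenient row by row shows which inputs the selected algorithm considers out of range.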

The method parameter should be a character vector naming one or more methods to apply. The available methods are "caverphone", "caverphone.modified", "cologne", "lein", "metaphone", "nysiis", "nysiis.modified", "onca", "onca.modified", "onca.refined", "onca.modified.refined", "phonex", "rogerroot", "soundex", "soundex.refined", and "statcan".
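As a minimal sketch of passing several methods at once, the call below requests three of the method names listed above. The variable names are illustrative only; the result should have one row per input word and one column per requested method, alongside the word column.

```r
library(phonics)

# Illustrative inputs and a selection of method names from the list above
words   <- c("Peter", "Peady")
methods <- c("soundex", "metaphone", "nysiis")

codes <- phonics(words, methods)

# Inspect the structure: a data frame with a word column plus one per method
str(codes)
```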

References

James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1-21, <doi:10.18637/jss.v095.i08>.

See also

Other phonics: caverphone(), cologne(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonex(), rogerroot(), soundex(), statcan()

Examples

phonics(c("Peter", "Peady"), c("soundex", "soundex.refined"))
#>    word soundex soundex.refined
#> 1 Peter    P360          P10609
#> 2 Peady    P300           P1060
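
The description mentions matching names as a use case. One possible way to do that, sketched here with a hypothetical list of surnames, is to group the inputs by their shared Soundex code using base R's split(); names that land in the same group are candidate phonetic matches.

```r
library(phonics)

# Hypothetical surnames; spelling variants are expected to share a code
surnames <- c("Robert", "Rupert", "Smith", "Smyth")

codes <- phonics(surnames, "soundex")

# Group the original spellings by their Soundex code
groups <- split(codes$word, codes$soundex)
print(groups)
```

Grouping by code is only a coarse filter; candidates in the same group may still warrant a closer comparison.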