The Phonex name coding procedure.

phonex(word, maxCodeLen = 4, clean = TRUE)

Arguments

word

string or vector of strings to encode

maxCodeLen

maximum length of the resulting encodings, in characters

clean

if TRUE, return NA (with a warning) for inputs that contain unknown alphabetical characters

Value

the Phonex encoded character vector

Details

The word argument is the name, or vector of names, to be encoded. The maxCodeLen argument limits the length of each returned name code. The default is 4.

The phonex algorithm is only defined for inputs over the standard English alphabet, i.e., "A-Z." Non-alphabetical characters are removed from the string in a locale-dependent fashion. This strips spaces, hyphens, and numbers. Other letters, such as "ç," may be permissible in the current locale but are unknown to phonex. For inputs outside of its known range, the output is undefined, so NA is returned and a warning is thrown. If clean is FALSE, phonex attempts to process such strings anyway. The default is TRUE.
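
The cleaning rules above can be illustrated with a small base R sketch. It is not the package's implementation: check_phonex_input() is a hypothetical helper, and it assumes the known range is exactly the 26 letters in LETTERS. It only reports which inputs would fall outside that range after non-alphabetical characters are stripped.

```r
# Hypothetical helper, not part of the phonics package. It mimics the
# cleaning steps described in Details using base R only.
check_phonex_input <- function(word) {
  # Drop every character the current locale does not classify as a letter
  # (spaces, hyphens, digits, punctuation).
  letters_only <- gsub("[^[:alpha:]]", "", word)

  # Upper-case the remainder; for non-ASCII letters this is locale-dependent.
  upper <- toupper(letters_only)

  # Assumption: a string is "known" when every remaining letter is in A-Z.
  known <- vapply(
    strsplit(upper, ""),
    function(chars) all(chars %in% LETTERS),
    logical(1)
  )

  data.frame(input = word, cleaned = upper, known = known,
             stringsAsFactors = FALSE)
}

check_phonex_input(c("O'Brien-Smith 3rd", "Fran\u00e7ois"))
```

For the first input, the cleaned value is "OBRIENSMITHRD" and known is TRUE. The result for the second input depends on whether the current locale treats "ç" as a letter, which is why no output is shown for it here.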

References

James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1-21, doi:10.18637/jss.v095.i08.

A.J. Lait and Brian Randell, "An assessment of name matching algorithms," Technical Report Series, University of Newcastle upon Tyne Computing Science, (1996).

See also

Other phonics: caverphone(), cologne(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonics(), rogerroot(), soundex(), statcan()

Examples

phonex("William")
#> [1] "W450"
phonex(c("Peter", "Peady"))
#> [1] "B360" "B300"
phonex("Stevenson", maxCodeLen = 8)
#> [1] "S3152500"