The Caverphone family of phonetic algorithms

caverphone(word, maxCodeLen = NULL, modified = FALSE, clean = TRUE)

Arguments

word

string or vector of strings to encode

maxCodeLen

maximum length of the resulting encodings, in characters

modified

if TRUE, use the Caverphone 2 algorithm

clean

if TRUE, return NA for unknown alphabetical characters

Value

the Caverphone encoded character vector

Details

The maxCodeLen argument limits the length of the returned Caverphone code. The default is 6, or 10 when modified is TRUE.

The modified argument directs caverphone to use the Caverphone 2 method instead of the original.
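The length defaults described above can be checked directly. This is a small sketch that assumes the phonics package is installed; the variable names are illustrative only, and the inputs are the same ones used in the Examples section.

```r
library(phonics)

# Original Caverphone: maxCodeLen defaults to 6.
len_original <- nchar(caverphone("William"))

# Caverphone 2 (modified = TRUE): maxCodeLen defaults to 10.
len_modified <- nchar(caverphone("Peter", modified = TRUE))

# An explicit maxCodeLen overrides the default.
len_explicit <- nchar(caverphone("Stevenson", maxCodeLen = 4))

c(original = len_original, modified = len_modified, explicit = len_explicit)
```

Based on the outputs listed under Examples, the three lengths should be 6, 10, and 4.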

The Caverphone algorithm is only defined for inputs over the standard English alphabet, i.e., "A-Z". Non-alphabetical characters are removed from the string in a locale-dependent fashion. This strips spaces, hyphens, and numbers. Other letters, such as "Ü", may be permissible in the current locale but are unknown to caverphone. For inputs outside of its known range, the output is undefined: NA is returned and a warning is thrown. If clean is FALSE, caverphone attempts to process the strings anyway. The default is TRUE.
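The following sketch shows one way to observe the clean behaviour described above. It assumes the phonics package is installed; the input word and variable names are illustrative. No output is shown, because the result for non-English letters may depend on the locale and is documented as undefined.

```r
library(phonics)

# "M\u00fcller" contains the letter u-umlaut, which lies outside A-Z.
word <- "M\u00fcller"

# clean = TRUE (the default): the documentation says to expect NA and a
# warning. Capture the warning text rather than letting it propagate.
strict <- withCallingHandlers(
  caverphone(word),
  warning = function(w) {
    message("caverphone warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
is.na(strict)

# clean = FALSE: caverphone attempts an encoding anyway. Treat the
# result as unreliable for letters outside A-Z.
lenient <- suppressWarnings(caverphone(word, clean = FALSE))
lenient
```

Wrapping the call in withCallingHandlers() keeps the warning visible as a message while still returning a value, which can be convenient when encoding a long vector of names with occasional non-English letters.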

References

David Hood, "Caverphone: Phonetic matching algorithm," Technical Paper CTP060902, University of Otago, New Zealand, 2002.

David Hood, "Caverphone Revisited," Technical Paper CTP150804, University of Otago, New Zealand, 2004.

James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1-21, <10.18637/jss.v095.i08>.

See also

Other phonics: cologne(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonex(), phonics(), rogerroot(), soundex(), statcan()

Examples

caverphone("William")
#> [1] "WLM111"
caverphone(c("Peter", "Peady"), modified = TRUE)
#> [1] "PTA1111111" "PTA1111111"
caverphone("Stevenson", maxCodeLen = 4)
#> [1] "STFN"