The Caverphone family of phonetic algorithms
caverphone(word, maxCodeLen = NULL, modified = FALSE, clean = TRUE)
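Argument | Description |
---|---|
word | string or vector of strings to encode |
maxCodeLen | maximum length of the resulting encodings, in characters |
modified | if TRUE, use the Caverphone2 method instead of the original |
clean | if TRUE, return NA with a warning for inputs containing unknown characters; if FALSE, attempt to process them |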
the Caverphone encoded character vector
The variable maxCodeLen is the limit on how long the returned Caverphone code should be. The default is 6, unless modified is set to TRUE, in which case the default is 10.
The variable modified directs caverphone to use the Caverphone2 method instead of the original.
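The interaction between maxCodeLen and modified can be checked by measuring the length of the returned codes. The following is a minimal sketch, assuming the function is loaded from the phonics package; the variable names are illustrative only.

```r
library(phonics)  # assumption: caverphone() is provided by the phonics package

# Original Caverphone: codes default to 6 characters.
len_original <- nchar(caverphone("William"))

# Caverphone2 (modified = TRUE): codes default to 10 characters.
len_modified <- nchar(caverphone("Peter", modified = TRUE))

# An explicit maxCodeLen overrides the default.
len_limited <- nchar(caverphone("Stevenson", maxCodeLen = 4))

print(c(original = len_original, modified = len_modified, limited = len_limited))
```

Codes produced with different settings are not directly comparable, so the same maxCodeLen and modified values should be used for every string in a matching task.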
The caverphone algorithm is only defined for inputs over the standard English alphabet, i.e., "A-Z". Non-alphabetical characters are removed from the string in a locale-dependent fashion. This strips spaces, hyphens, and numbers. Other letters, such as "Ü", may be permissible in the current locale but are unknown to caverphone. For inputs outside of its known range, the output is undefined: NA is returned and a warning is thrown.
If clean is FALSE, caverphone attempts to process the strings anyway rather than returning NA. The default is TRUE.
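The effect of clean on input outside "A-Z" can be explored as follows. This is a hedged sketch, assuming the phonics package is installed; the input "Müller" and the variable names are illustrative, and no particular output is claimed because the result for such input is described above as undefined.

```r
library(phonics)  # assumption: caverphone() is provided by the phonics package

# "M\u00fcller" contains a letter outside A-Z.
word <- "M\u00fcller"

# With clean = TRUE (the default), the description above says to expect NA
# plus a warning. Any warning is reported as a message so the script continues.
res_clean <- withCallingHandlers(
  caverphone(word),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(is.na(res_clean))

# With clean = FALSE, caverphone attempts the encoding anyway.
# The result should not be relied on for matching.
res_raw <- suppressWarnings(caverphone(word, clean = FALSE))
print(res_raw)
```

Transliterating names to plain A-Z before encoding avoids depending on either behavior.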
David Hood, "Caverphone: Phonetic matching algorithm," Technical Paper CTP060902, University of Otago, New Zealand, 2002.
David Hood, "Caverphone Revisited," Technical Paper CTP150804, University of Otago, New Zealand, 2004.
James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1-21, doi:10.18637/jss.v095.i08.
Other phonics: cologne(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonex(), phonics(), rogerroot(), soundex(), statcan()
caverphone("William")
#> [1] "WLM111"
caverphone(c("Peter", "Peady"), modified = TRUE)
#> [1] "PTA1111111" "PTA1111111"
caverphone("Stevenson", maxCodeLen = 4)
#> [1] "STFN"