The Cologne phonetic name coding procedure.
cologne(word, maxCodeLen = NULL, clean = TRUE)
Argument | Description |
---|---|
word | string or vector of strings to encode |
maxCodeLen | maximum length of the resulting encodings, in characters |
clean | if TRUE, return NA for inputs containing unknown alphabetical characters |
the Cologne encoded character vector
The variable word is the name to be encoded. The variable maxCodeLen is the limit on how long the returned name code should be. The default, as shown in the usage above, is NULL.
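The effect of a length limit can be sketched in base R. The helper cap_code() below is hypothetical and not part of the package; treating NULL as "no limit" is an assumption of this sketch, made only because the usage signature lists NULL as the default.

```r
# Hypothetical helper, not part of the phonics package.
# Assumption: a NULL limit leaves the code untouched.
cap_code <- function(code, maxCodeLen = NULL) {
  if (is.null(maxCodeLen)) {
    return(code)
  }
  # Keep at most maxCodeLen leading characters of each code
  substr(code, 1, maxCodeLen)
}

cap_code("823686", maxCodeLen = 4)
#> [1] "8236"
cap_code("823686", maxCodeLen = 8)
#> [1] "823686"
```

A limit longer than the code changes nothing, which matches the "Stevenson" example with maxCodeLen = 8 returning a six-character code.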
The cologne algorithm is only defined for inputs over the standard German alphabet, i.e., "A-Z," "Ä," "Ö," "Ü," and "ß." Non-alphabetical characters are removed from the string in a locale-dependent fashion. This strips spaces, hyphens, and numbers. Other letters, such as "ç," may be permissible in the current locale but are unknown to cologne. For inputs outside of its known range, the output is undefined, so NA is returned and a warning is thrown. If clean is FALSE, cologne attempts to process the strings anyway. The default is TRUE.
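The cleaning and range check described above can be approximated in base R. This is a minimal sketch, not the package's implementation: check_cologne_input() and known_letters are hypothetical names, and the use of the [:alpha:] class for the locale-dependent stripping is an assumption.

```r
# Hypothetical helper, not part of the phonics package.
# "\u00c4\u00d6\u00dc\u00df" spells the letters Ä, Ö, Ü and ß.
known_letters <- "A-Z\u00c4\u00d6\u00dc\u00df"

check_cologne_input <- function(word) {
  # Remove everything the current locale does not treat as a letter
  # (spaces, hyphens, digits, punctuation).
  cleaned <- gsub("[^[:alpha:]]", "", word)
  # TRUE where a remaining letter falls outside the known range.
  unknown <- grepl(paste0("[^", known_letters, "]"), toupper(cleaned))
  data.frame(input = word, cleaned = cleaned, unknown = unknown,
             stringsAsFactors = FALSE)
}

check_cologne_input(c("Mc-Donald 3", "Fran\u00e7ois"))
```

Whether "ç" survives the first step depends on the current locale, so the result for the second input may differ between systems. Rows flagged as unknown are the ones for which cologne would be expected to return NA when clean is TRUE.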
James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1--21, <10.18637/jss.v095.i08>.
Hans Joachim Postel. "Die Koelner Phonetik. Ein Verfahren zur Identifizierung von Personennamen auf der Grundlage der Gestaltanalyse." IBM-Nachrichten 19. Jahrgang, 1969, pp. 925-931.
Other phonics: caverphone(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonex(), phonics(), rogerroot(), soundex(), statcan()
cologne("William")
#> [1] "356"
cologne(c("Peter", "Peady"))
#> [1] "127" "12"
cologne("Stevenson", maxCodeLen = 8)
#> [1] "823686"