The NYSIIS phonetic algorithm
nysiis(word, maxCodeLen = 6, modified = FALSE, clean = TRUE)
word | string or vector of strings to encode |
---|---|
maxCodeLen | maximum length of the resulting encodings, in characters |
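modified | if TRUE, use the modified NYSIIS method instead of the original |
clean | if TRUE, return NA with a warning for inputs containing unknown characters; if FALSE, attempt to process them |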
the NYSIIS encoded character vector
The nysiis function phonetically encodes the given string using the
New York State Identification and Intelligence System (NYSIIS)
algorithm. The implementation is based on the description of the
algorithm provided by Wikipedia and is written in pure R using
regular expressions.
The variable maxCodeLen is the limit on how long the returned
NYSIIS code should be. The default is 6.
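The effect of a length cap can be sketched in base R. This is only an illustration of truncating a code to a fixed number of characters; truncate_code is a hypothetical helper, the input strings are placeholders rather than real NYSIIS codes, and the package's internals may differ.

```r
# Hypothetical helper, not part of the package: keep at most
# max_len characters of each code. Shorter codes are left unchanged.
truncate_code <- function(code, max_len = 6) {
  substr(code, 1, max_len)
}

# Placeholder strings, not real NYSIIS output.
truncate_code(c("ABCDEFGH", "ABC"))
truncate_code("ABCDEFGH", max_len = 4)
```

Because substr is vectorized, the same cap applies to every element of a character vector.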
The variable modified directs nysiis to use the modified method
instead of the original.
The nysiis algorithm is only defined for inputs over the standard
English alphabet, i.e., "A-Z". Non-alphabetical characters are
removed from the string in a locale-dependent fashion. This strips
spaces, hyphens, and numbers. Other letters, such as "Ü", may be
permissible in the current locale but are unknown to nysiis. For
inputs outside of its known range, the output is undefined: NA is
returned and a warning is thrown.
If clean is FALSE, nysiis attempts to process the strings anyway.
The default is TRUE.
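The input handling described above can be approximated in base R. The sketch below is an assumption about the general shape of such a check, not the package's actual code; check_input is a hypothetical name, and the warning text is invented for illustration.

```r
# Hypothetical helper illustrating the input check described above.
# It is not the package's internal implementation.
check_input <- function(word, clean = TRUE) {
  word <- toupper(word)
  # Drop characters the current locale does not treat as letters
  # (spaces, hyphens, digits, punctuation).
  word <- gsub("[^[:alpha:]]", "", word)
  # Letters that survive but fall outside A-Z are unknown to NYSIIS.
  unknown <- !is.na(word) & grepl("[^A-Z]", word, perl = TRUE)
  if (clean && any(unknown)) {
    warning("unknown characters found; returning NA for those inputs")
    word[unknown] <- NA_character_
  }
  word
}

check_input("Mary-Ann 2")
```

With clean = FALSE the helper skips the NA substitution and passes the string through, which mirrors the documented behavior of attempting to process the input. How a letter such as "Ü" is treated depends on the locale R is running in.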
James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1-21, <doi:10.18637/jss.v095.i08>.
Robert L. Taft, Name search techniques, Bureau of Systems Development, Albany, New York, 1970.
Other phonics: caverphone(), cologne(), lein(), metaphone(), mra_encode(), onca(), phonex(), phonics(), rogerroot(), soundex(), statcan()
nysiis("Robert")
#> [1] "RABAD"
nysiis("rupert")
#> [1] "RAPAD"
nysiis(c("Alabama", "Alaska"), modified = TRUE)
#> [1] "ALABAN" "ALASC"
nysiis("mississippi", 4)
#> [1] "MASA"