
NaN versus NA in R
18 Jul 2016
R has two different ways of representing missing data, and understanding
each is important for the user. NaN means “not a number”: a calculation
produced a result, but that result cannot be represented as a number in
the computer. The second, NA, indicates that the data is simply missing
for unknown reasons. The two appear at different times when working with
R, and each has different implications.
NaN is distinct from NA. NaN indicates a result that cannot be
calculated, or one that is not a valid floating point number. Some
calculations that lead to NaN, other than 0 / 0, are attempting to take
the square root of a negative number or performing calculations with
infinities that lead to undefined results:
> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> Inf - Inf
[1] NaN
However, adding two infinities produces infinity:
> Inf + Inf
[1] Inf
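As a small sketch of how these results might be checked in a script rather than by eye, base R's is.nan() and is.finite() can be applied to a vector. The vector x and the variable names below are only illustrative:

```r
# A few values, some of them produced by undefined operations
x <- c(0 / 0, Inf - Inf, Inf + Inf, 1 / 0, 1)

# is.nan() is TRUE only for the NaN entries; Inf is a valid value
nan_flags <- is.nan(x)
print(nan_flags)      # TRUE TRUE FALSE FALSE FALSE

# is.finite() is FALSE for NaN and for both infinities
finite_flags <- is.finite(x)
print(finite_flags)   # FALSE FALSE FALSE FALSE TRUE

# Comparing against NaN with == does not detect it; the result is NA
cmp <- NaN == NaN
print(cmp)            # NA
```

Because the equality comparison itself comes back as NA, is.nan() is the safer test.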
NA is different from NaN in that NA is not part of the IEEE standard
for floating point numbers. NA is a construct of R, used as a
placeholder for a value that is not known. NA says no result was
available or the result is missing. It can stand in for a missing
element of a vector or a matrix:
> c(1, 2, 3, 4, NA, 5, 6)
[1]  1  2  3  4 NA  5  6
> matrix(c(1, 2, NA, 4, NA, 6, NA, 8, 9), 3)
     [,1] [,2] [,3]
[1,]    1    4   NA
[2,]    2   NA    8
[3,]   NA    6    9
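Once NA values are in a vector or matrix, they usually need to be counted or skipped. The following is a minimal sketch using the same data as above and only base R functions; the variable names are illustrative:

```r
v <- c(1, 2, 3, 4, NA, 5, 6)

# Count the missing entries
n_missing <- sum(is.na(v))
print(n_missing)      # 1

# A summary over data containing NA is itself NA ...
plain_mean <- mean(v)
print(plain_mean)     # NA

# ... unless the missing values are dropped explicitly
clean_mean <- mean(v, na.rm = TRUE)
print(clean_mean)     # 3.5

# The same idea applies to a matrix: missing values per column
m <- matrix(c(1, 2, NA, 4, NA, 6, NA, 8, 9), 3)
missing_per_col <- colSums(is.na(m))
print(missing_per_col)  # 1 1 1
```

Whether dropping missing values with na.rm = TRUE is appropriate depends on why the data is missing, so treat it as a convenience rather than a default.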
Almost any operation involving NA results in NA:
> 1 + NA
[1] NA
> sqrt(NA)
[1] NA
> NA + NaN
[1] NA
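To tell the two values apart in practice, a short sketch with is.na() and is.nan() may help. It also shows a few cases where the answer does not depend on the unknown value, so R returns an ordinary result instead of NA. The variable names are illustrative:

```r
# is.na() is TRUE for both NA and NaN
na_is_na  <- is.na(NA)     # TRUE
nan_is_na <- is.na(NaN)    # TRUE

# is.nan() is TRUE only for NaN
na_is_nan  <- is.nan(NA)   # FALSE
nan_is_nan <- is.nan(NaN)  # TRUE

# Cases where the result is the same whatever the missing value is
pow_zero  <- NA^0          # 1, since anything to the power 0 is 1
and_false <- NA & FALSE    # FALSE
or_true   <- NA | TRUE     # TRUE

print(c(na_is_na, nan_is_na, na_is_nan, nan_is_nan))
print(pow_zero)
print(c(and_false, or_true))
```

Since is.na() catches both values, a check for NA alone needs something like is.na(x) & !is.nan(x).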
Update: David Smith at Microsoft covered NAs in a blog post today, too.
Image by Torindkflt / Wikimedia Commons.