R has two different ways of representing missing data, and understanding each is important. The first, NaN, means “not a number”: there is a result, but it cannot be represented in the computer. The second, NA, indicates that the data is simply missing for unknown reasons. The two appear at different times when working with R, and each has different implications.
NaN is distinct from NA. NaN implies a result that cannot be calculated for whatever reason, or is not a floating point number. Calculations that lead to NaN include taking the square root of a negative number and performing calculations with infinities that lead to undefined results:
sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
Inf - Inf
[1] NaN
However, adding two infinities produces infinity:
Inf + Inf
[1] Inf
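To check for NaN in practice, R provides is.nan(). The short sketch below collects a few undefined results (the expressions 0 / 0, Inf / Inf and 0 * Inf are added here for illustration and are not from the examples above) and tests them. The variable names are made up for this example.

```r
# A few calculations that are undefined in floating point arithmetic
undefined <- c(0 / 0, Inf - Inf, Inf / Inf, 0 * Inf)
print(undefined)

# is.nan() tests each element for NaN
all_nan <- all(is.nan(undefined))

# Comparing with == is not a usable test: a comparison
# involving NaN yields NA rather than TRUE or FALSE
self_compare <- NaN == NaN

# Adding two infinities is well defined and stays infinite
still_infinite <- is.infinite(Inf + Inf)
```

Using is.nan() rather than == avoids the trap that a comparison against NaN never returns TRUE.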
NA is different from NaN in that NA is not a part of the IEEE standard for floating point numbers. NA is a construction of R used as a placeholder for a value that is not known. NA says no result was available or the result is missing. It can fill in a missing value in a vector or a matrix:
c(1, 2, 3, 4, NA, 5, 6)
[1]  1  2  3  4 NA  5  6
matrix(c(1, 2, NA, 4, NA, 6, NA, 8, 9), 3)
     [,1] [,2] [,3]
[1,]    1    4   NA
[2,]    2   NA    8
[3,]   NA    6    9
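Once NA is in a vector or matrix, the usual next step is to locate it or skip over it. The sketch below reuses the vector and matrix from above and relies on is.na() and the na.rm argument of mean(). The variable names are chosen only for this illustration.

```r
x <- c(1, 2, 3, 4, NA, 5, 6)

# Position of the missing value in the vector
missing_at <- which(is.na(x))

# Summaries return NA unless missing values are dropped explicitly
mean_all <- mean(x)
mean_known <- mean(x, na.rm = TRUE)

m <- matrix(c(1, 2, NA, 4, NA, 6, NA, 8, 9), 3)

# is.na() works element by element, so colSums() counts NAs per column
na_per_column <- colSums(is.na(m))
```

Dropping missing values with na.rm = TRUE is a choice about the data, not a default, which is why R leaves it off unless asked.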
Arithmetic with NA generally results in NA:
1 + NA
NA + NaN
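Because the two values can end up side by side, it helps to know how the test functions treat them: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. The sketch below shows one way to separate them, along with a few cases where the result does not depend on the unknown value and so is not NA. The variable names are illustrative.

```r
vals <- c(1, NA, NaN)

# is.na() flags both kinds of missing value
na_or_nan <- is.na(vals)

# is.nan() flags only NaN
nan_only <- is.nan(vals)

# Combining the two isolates a "true" NA
na_only <- is.na(vals) & !is.nan(vals)

# Cases where the answer is the same whatever the missing value is
power_zero <- NA^0       # anything to the power 0 is 1
or_true <- NA || TRUE    # TRUE regardless of the left side
and_false <- NA && FALSE # FALSE regardless of the left side
```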
Update: David Smith at Microsoft covered NAs in a blog post today, too.
Image by Torindkflt / Wikimedia Commons.