14 changes: 7 additions & 7 deletions _episodes/02-numpy.md
@@ -429,20 +429,20 @@ Wave height standard deviation: 1.1440155050316319
> for example: `help(numpy.cumprod)`.
{: .callout}

-> ## What about NaNs?
+> ## What about `nan`s?
>
> In real datasets, particularly ones which come from observational data, it's quite common
> for some values to be missing. There are various strategies to deal with missing values; one of which is to
> give them a value that would be clearly wrong (e.g. -1 for a temperature column with units in
> Kelvin, or 999 for a missing latitude or longitude value). However, the issue with this is that
> we would need to check for these values before calculating any summary statistic.
>
-> Instead, we can use NumPy's `NaN` ("not a number") value, which will tell NumPy that these are
-> values that need to be dealt with in a special manner. NumPy also provides various functions to help deal with NaNs.
-> However, we can't use NumPy's normal statistical functions on any array that contains a NaN, as this returns a NaN:
+> Instead, we can use NumPy's `nan` ("not a number") value, which will tell NumPy that these are
+> values that need to be dealt with in a special manner. NumPy also provides various functions to help deal with nans.
+> However, we can't use NumPy's normal statistical functions on any array that contains a nan, as this returns a nan:
>
> ~~~
-> data = numpy.array([[1,2,3],[1,numpy.NaN,3],[1,2,3]])
+> data = numpy.array([[1,2,3],[1,numpy.nan,3],[1,2,3]])
> numpy.mean(data)
> ~~~
> {: .language-python}
@@ -455,7 +455,7 @@ Wave height standard deviation: 1.1440155050316319
> Instead, we need to use the NumPy function `nanmean`:
>
> ~~~
-> data = numpy.array([[1,2,3],[1,numpy.NaN,3],[1,2,3]])
+> data = numpy.array([[1,2,3],[1,numpy.nan,3],[1,2,3]])
> numpy.nanmean(data)
> ~~~
> {: .language-python}
@@ -465,7 +465,7 @@ Wave height standard deviation: 1.1440155050316319
> ~~~
> {: .output}
>
-> If, at a later date, we'd like to replace all the NaNs with a sensible numerical value
+> If, at a later date, we'd like to replace all the `nan`s with a sensible numerical value
> (e.g. the mean of the column), NumPy also provides functions that can help with this
{: .callout}
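
The closing note in the callout mentions replacing `nan`s with a value such as the column mean but does not show how. Below is a minimal sketch of one way this could be done, assuming the same 3x3 `data` array used in the callout. The names `column_means`, `missing`, and `filled` are illustrative only and are not part of the lesson. It uses the lowercase `numpy.nan` spelling that this change adopts (the `numpy.NaN` alias was removed in NumPy 2.0).

```python
import numpy

# Same array as in the callout: one missing value in the middle.
data = numpy.array([[1, 2, 3], [1, numpy.nan, 3], [1, 2, 3]])

# Mean of each column, ignoring any nan entries.
column_means = numpy.nanmean(data, axis=0)

# Boolean mask that is True wherever a value is missing.
missing = numpy.isnan(data)

# Where the mask is True take the column mean, otherwise keep the original value.
# column_means has shape (3,) and is broadcast across the rows of data.
filled = numpy.where(missing, column_means, data)

print(filled)
```

Once the missing value is filled in, the ordinary functions such as `numpy.mean(filled)` return a number instead of `nan`. Whether the column mean is a sensible replacement depends on the dataset, so this is one option rather than a general recommendation.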
