Add pipeline value operations tests #234

edoyango · 2026-01-08T23:32:31Z

The tests cover all the operation classes in operations/{dask,numpy,xarray}/values.py. Some important changes:

- Refactored xarray fillna to behave like numpy fillna.
  - Before, xarray fillna only filled NaNs, whereas numpy fillna also filled positive and negative infinities.
- Completed the implementation of dask fillna.
  - This was raising a NotImplementedError; it just needed a missing arg added to da.nan_to_num to make it work.
- Used the Replace class instead of the replace_value class from pyearthtools.data.transforms.mask, as replace_value doesn't exist anymore.
- Renamed the ForceNormalised operation to Clip, since that matches the functions in dask/numpy/xarray.
  - Also refactored the classes to use the corresponding clip function.
  - This is probably the most opinionated change here; happy to change back if it's a problem.
- Fixed a problem in xarray.Derive where Derive.apply_func was also updating the input dataset.
- Fixed a warning in pyearthtools.data.transforms.derive by replacing the deprecated Dataset.drop with Dataset.drop_vars.

Clip is more appropriate as there is an equivalent dask/numpy/xarray function. Additionally, no normalisation is occurring.
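To illustrate the naming argument, here is a minimal numpy-only sketch contrasting clipping with min-max normalisation. The bounds and sample values are made up for illustration and are not taken from the PR.

```python
import numpy as np

# Hypothetical bounds and data, chosen only for illustration.
lower, upper = 0.0, 1.0
sample = np.array([-0.5, 0.25, 0.75, 1.5])

# Clipping pins out-of-range values to the bounds and leaves the rest untouched.
clipped = np.clip(sample, lower, upper)

# Min-max normalisation rescales every value, including the in-range ones.
normalised = (sample - sample.min()) / (sample.max() - sample.min())

print(clipped)      # in-range values 0.25 and 0.75 are unchanged
print(normalised)   # every value has been rescaled into [0, 1]
```

Only the second operation changes values that were already inside the bounds, which is why "Clip" describes the first one more accurately.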

This aligns xarray FillNan with numpy FillNan
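A small numpy-only sketch of the difference being removed. The NaN-only behaviour is emulated here with `np.where` as a stand-in for the old xarray path, and the fill values are arbitrary; neither is taken from the PR's code.

```python
import numpy as np

sample = np.array([1.0, np.nan, np.inf, -np.inf])

# NaN-only filling (stand-in for the previous xarray behaviour):
# the infinities pass through untouched.
nan_only = np.where(np.isnan(sample), 0.0, sample)

# np.nan_to_num replaces NaN and both infinities in one call.
# The fill values below are arbitrary choices for this example.
filled = np.nan_to_num(sample, nan=0.0, posinf=999.0, neginf=-999.0)

print(nan_only)
print(filled)
```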

Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.

When creating a new variable with Derive, the new variable was being added to the input dataset. This commit fixes that by making a shallow copy of the input and adding the variables to that shallow copy before returning it.
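The pattern can be sketched with plain xarray as follows. The function names and the derived variable `double` are hypothetical and are not the actual Derive implementation; the point is only that `Dataset.copy(deep=False)` gives a new container, so adding a variable to it leaves the caller's dataset alone.

```python
import numpy as np
import xarray as xr


def derive_in_place(ds: xr.Dataset) -> xr.Dataset:
    # Problematic pattern: assigning into the argument mutates the caller's dataset.
    ds["double"] = ds["a"] * 2
    return ds


def derive_on_copy(ds: xr.Dataset) -> xr.Dataset:
    # Shallow copy: a new Dataset object that reuses the existing arrays,
    # so no extra memory is spent duplicating the data.
    out = ds.copy(deep=False)
    out["double"] = out["a"] * 2
    return out


original = xr.Dataset({"a": ("x", np.arange(3.0))})
result = derive_on_copy(original)

print("double" in result)    # the derived variable is on the returned dataset
print("double" in original)  # the input is left without it
```

A deep copy would also protect the input, but it duplicates every array, which matters for memory-hungry pipelines.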

coveralls · 2026-01-08T23:40:18Z

Pull Request Test Coverage Report for Build 20835553598

Details

151 of 152 (99.34%) changed or added relevant lines in 7 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.9%) to 65.91%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| packages/data/src/pyearthtools/data/transforms/derive.py | 7 | 8 | 87.5% |

Totals
Change from base Build 20305027296:	0.9%
Covered Lines:	10711
Relevant Lines:	15893

💛 - Coveralls

packages/pipeline/src/pyearthtools/pipeline/operations/dask/values.py

nikeethr · 2026-01-12T22:07:31Z

Thanks for the changes. Just had a minor comment (see above). Otherwise, it looks good.

I appreciate the comments around deep copy and what they affect, given that a lot of applications of the pipeline can be memory hungry.

tennlee · 2026-01-16T07:44:25Z

packages/pipeline/src/pyearthtools/pipeline/operations/dask/values.py



-class ForceNormalised(DaskOperation):
+class Clip(DaskOperation):


I'm fine with this change, and there are no issues with dependencies on it at the moment. However, the docs will need to be updated (e.g. docs/api/pipeline/pipeline_api.md) and the notebooks (I think maybe only docs/notebooks/pipeline/Operations.ipynb).

tennlee · 2026-01-16T07:45:54Z

packages/pipeline/src/pyearthtools/pipeline/operations/xarray/values.py

    def apply_func(self, sample: T) -> T:
-        return sample.fillna(self.nan)
+
+        if not (isinstance(sample, xr.DataArray) or isinstance(sample, xr.Dataset)):


I wonder if we should have a general xarray subclass of operation which handles xarray type checking rather than doing it down in apply_func... this is okay, but it might be better to put it up higher.

I did have a similar review comment, but I didn't mention it because I wasn't familiar enough with the codebase to suggest where an ideal place would be, and how much impact putting it at a higher level would have on everything else.

I guess (in hindsight) those would be the considerations, and also if it needs to be in a separate issue. But yes, it does seem like a very common "entrypoint" check that applies to many things.
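One way the "check it higher up" idea could look, as a rough sketch only. `TypeCheckedOperation`, `accepted_types`, `apply` and `FillNanSketch` are all hypothetical names and do not reflect the pyearthtools class hierarchy. numpy is used to keep the example self-contained; for the xarray operations the tuple would presumably be `(xr.DataArray, xr.Dataset)`.

```python
from typing import Any, Tuple, Type

import numpy as np


class TypeCheckedOperation:
    """Hypothetical base class that validates the sample type once, up front."""

    # Subclasses declare what they accept; isinstance takes a tuple directly.
    accepted_types: Tuple[Type, ...] = ()

    def apply(self, sample: Any) -> Any:
        if self.accepted_types and not isinstance(sample, self.accepted_types):
            expected = ", ".join(t.__name__ for t in self.accepted_types)
            raise TypeError(
                f"{type(self).__name__} expects one of ({expected}), "
                f"got {type(sample).__name__}"
            )
        return self.apply_func(sample)

    def apply_func(self, sample: Any) -> Any:
        raise NotImplementedError


class FillNanSketch(TypeCheckedOperation):
    """Hypothetical operation: apply_func no longer needs its own type check."""

    accepted_types = (np.ndarray,)

    def __init__(self, nan: float = 0.0):
        self.nan = nan

    def apply_func(self, sample: np.ndarray) -> np.ndarray:
        return np.nan_to_num(sample, nan=self.nan)


op = FillNanSketch()
print(op.apply(np.array([1.0, np.nan])))
```

How much this would ripple through the existing operations is exactly the open question raised above, so it may well belong in a separate issue.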

tennlee · 2026-01-16T07:49:20Z

packages/pipeline/src/pyearthtools/pipeline/operations/xarray/values.py


    def apply_func(self, sample: T) -> T:
-        return sample.fillna(self.nan)
+


Just reading the docstring above (it won't let me comment on unchanged lines), it says "If no value is passed then positive infinity values will be replaced with a very large number. Defaults to None.". But None is not a very large number. So by default that comment doesn't parse for me.
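For context, the wording reads like numpy's own `np.nan_to_num` documentation, where `None` acts as a sentinel meaning "use the largest finite value for the dtype". The sketch below shows that numpy behaviour; whether the operation forwards its arguments unchanged to `np.nan_to_num` is an assumption, not something confirmed in this thread.

```python
import numpy as np

sample = np.array([np.nan, np.inf, -np.inf])

# With posinf=None and neginf=None (the defaults), numpy substitutes the
# largest and most negative finite values representable in the dtype.
default_filled = np.nan_to_num(sample)

limits = np.finfo(sample.dtype)
print(default_filled[0] == 0.0)          # NaN becomes 0.0 by default
print(default_filled[1] == limits.max)   # +inf becomes the largest finite float
print(default_filled[2] == limits.min)   # -inf becomes the most negative finite float
```

If that is the intended behaviour, the docstring could say that `None` means "let numpy pick the largest finite value" rather than implying `None` is itself the replacement.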

Weird, I'm guilty of commenting on unchanged code a lot. I usually just hover over the vertical separator until it gives me a blue "+" (the color or style may differ depending on your theme).

Hopefully that helps.

edoyango added 12 commits January 9, 2026 10:32

use Replace class instead of replace_values

9157a6a

rename numpy ForceNormalised to Clip

f22c67a

Clip is more appropriate as there is an equivalent dask/numpy/xarray function. Additionally, no normalisation is occurring.

add numpy values operations tests

23f91f1

add pos/neg inf replacement in xarray FillNan

c1a6a9c

This aligns xarray FillNan with numpy FillNan

add xarray MaskValue op tests

e4cceb6

rename xarray ForceNormalised to Clip

578b9fd

Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.

prevent derive from adding vars to input

5ef7a6e

When creating a new variable with Derive, the new variable was being added to the input dataset. This commit fixes that by making a shallow copy of the input and adding the variables to that shallow copy before returning it.

add test for xarray derive op

eba6df0

complete implementation of dask FillNan

f8f50c2

add dask maskvalue tests

085abea

rename dask ForceNormalised to Clip

898786c

Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.

use xr drop_vars instead of deprecated drop

241c73b

edoyango force-pushed the values-tests branch from 3149c8f to 241c73b Compare January 8, 2026 23:33

nikeethr reviewed Jan 12, 2026


tennlee reviewed Jan 16, 2026

4 participants


