Skip to content

Conversation

@edoyango
Copy link
Collaborator

@edoyango edoyango commented Jan 8, 2026

The tests covers all the operation classes in operations/{dask,numpy,xarray}/values.py. Some important changes:

  • refactored xarray fillna to behave like numpy fillna
    • before, xarray fillna only filled nans, whereas numpy fillna also filled in positive and negative infinities.
  • completed the implementation of dask fillna
    • This was raising a NotImplementedError, and just needed to add a missing arg in da.nan_to_num to make it work.
  • use the Replace class instead of replace_value class from pyearthtool.data.transforms.mask as replace_value doesn't exist anymore.
  • Renamed ForceNormalised operation to Clip since that matches the functions in dask/numpy/xarray
    • also refactored the classes to use the corresponding clip function
    • This is probably the most opinionated change here, happy to change back if it's a problem.
  • fixed a problem in xarray.Derive where Derive.apply_func was also updating the input dataset.
  • fixed a warning in pyearthtools.data.transforms.derive by replacing the deprecated Dataset.drop with Dataset.drop_vars

Clip is more appropriate as there is an equivalent dask/numpy/xarray
function. Additionally, no normalisation is occuring.
This aligns xarray FillNan with numpy FillNan
Clip is more appropriate as there is an equivalent dask/numpy/xarray
operation.
when creating a new variable with Derive, the new variable
was being added to the input dataset. This commit fixes
that by making a shallow copy of the input and adding the
variables to that shallow copy before returning it.
Clip is more appropriate as there is an equivalent dask/numpy/xarray
operation.
@coveralls
Copy link

coveralls commented Jan 8, 2026

Pull Request Test Coverage Report for Build 20835553598

Details

  • 151 of 152 (99.34%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.9%) to 65.91%

Changes Missing Coverage Covered Lines Changed/Added Lines %
packages/data/src/pyearthtools/data/transforms/derive.py 7 8 87.5%
Totals Coverage Status
Change from base Build 20305027296: 0.9%
Covered Lines: 10711
Relevant Lines: 15893

💛 - Coveralls

@nikeethr
Copy link
Collaborator

nikeethr commented Jan 12, 2026

Thanks for the changes. Just had a minor comment (see above). Otherwise, it looks good.

appreciate the comments around deep copy and what they affect. Given that lot of applications of the pipeline can be memory hungry



class ForceNormalised(DaskOperation):
class Clip(DaskOperation):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this change, and there are no issues with dependencies on it at the moment. However, the docs will need to be updated (e.g. docs/api/pipeline/pipeline_api.md) and the notebooks (I think maybe only docs/notebooks/pipeline/Operations.ipynb).

def apply_func(self, sample: T) -> T:
return sample.fillna(self.nan)

if not (isinstance(sample, xr.DataArray) or isinstance(sample, xr.Dataset)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have a general xarray subclass of operation which handles xarray type checking rather than doing it down in apply_func... this is okay, but it might be better to put it up higher.

Copy link
Collaborator

@nikeethr nikeethr Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have a similar review comment, but I didn't mention it because I wasn't familiar enough with the codebase to suggest where an ideal place would be, and how much impact putting it at a higher level would have on everything else.

I guess (in hindsight) those would be the considerations, and also if it needs to be in a separate issue. But yes, it does seem like a very common "entrypoint" check that applies to many things.


def apply_func(self, sample: T) -> T:
return sample.fillna(self.nan)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading the docstring above (it won't let me comment on unchanged lines), it says "If no value is passed then positive infinity values will be replaced with a very large number. Defaults to None.". But None is not a very large number. So by default that comment doesn't parse for me.

Copy link
Collaborator

@nikeethr nikeethr Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird, I'm guilty of commenting on unchanged code a lot. I usually just hover over the vertical separator until it gives me a blue "+" - maybe different color/style based on your theme. See:

image

Hopefully that helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants