Add pipeline value operations tests #234
base: develop
Conversation
Clip is more appropriate as there is an equivalent dask/numpy/xarray function. Additionally, no normalisation is occurring.
This aligns xarray FillNan with numpy FillNan
Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.
When creating a new variable with Derive, the new variable was being added to the input dataset. This commit fixes that by making a shallow copy of the input and adding the variables to that shallow copy before returning it (a minimal sketch follows below).
Clip is more appropriate as there is an equivalent dask/numpy/xarray operation.
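A minimal sketch of that shallow-copy fix, assuming hypothetical names (`Derive`, `apply_func`, and a name-to-function mapping of derived variables) rather than the project's actual classes:

```python
from typing import Callable

import xarray as xr


class Derive:
    """Sketch of an operation that adds derived variables to a dataset."""

    def __init__(self, variables: dict[str, Callable[[xr.Dataset], xr.DataArray]]):
        # Mapping of new variable name -> function computing it from the input.
        self.variables = variables

    def apply_func(self, sample: xr.Dataset) -> xr.Dataset:
        # Shallow copy: a new Dataset container whose underlying arrays are
        # shared, so the caller's dataset is left untouched without duplicating data.
        out = sample.copy(deep=False)
        for name, func in self.variables.items():
            out[name] = func(sample)
        return out
```

With `deep=False` only the container and its variable mapping are copied, not the data itself, which keeps the memory cost low for the memory-hungry pipelines mentioned in the review.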
Pull Request Test Coverage Report for Build 20835553598
💛 - Coveralls
Thanks for the changes. Just had a minor comment (see above); otherwise, it looks good. I appreciate the comments around deep copy and what it affects, given that a lot of applications of the pipeline can be memory hungry.
- class ForceNormalised(DaskOperation):
+ class Clip(DaskOperation):
I'm fine with this change, and there are no issues with dependencies on it at the moment. However, the docs will need to be updated (e.g. docs/api/pipeline/pipeline_api.md) and the notebooks (I think maybe only docs/notebooks/pipeline/Operations.ipynb).
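For context on the rename: each backend already exposes an equivalent clip function, so the operation can map straight onto it. A quick illustration (generic library usage, not code from this PR):

```python
import dask.array as da
import numpy as np
import xarray as xr

arr = np.array([-5.0, 0.5, 10.0])

np.clip(arr, 0.0, 1.0)                            # array([0. , 0.5, 1. ])
da.clip(da.from_array(arr), 0.0, 1.0).compute()   # same values, computed lazily
xr.DataArray(arr).clip(min=0.0, max=1.0)          # same values, labels preserved
```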
def apply_func(self, sample: T) -> T:
    return sample.fillna(self.nan)
if not (isinstance(sample, xr.DataArray) or isinstance(sample, xr.Dataset)):
I wonder if we should have a general xarray subclass of operation which handles xarray type checking rather than doing it down in apply_func... this is okay, but it might be better to put it up higher.
I did have a similar review comment, but I didn't mention it because I wasn't familiar enough with the codebase to suggest where an ideal place would be, and how much impact putting it at a higher level would have on everything else.
I guess (in hindsight) those would be the considerations, and also if it needs to be in a separate issue. But yes, it does seem like a very common "entrypoint" check that applies to many things.
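To make that idea concrete, here is a rough sketch of what a shared xarray base operation could look like. The `XarrayOperation` name, the `apply_func` hook, and the `__call__` entry point are all assumptions for illustration, not the project's actual API:

```python
from typing import TypeVar

import xarray as xr

XrType = TypeVar("XrType", xr.Dataset, xr.DataArray)


class XarrayOperation:
    """Hypothetical base class that centralises the xarray type check."""

    def __call__(self, sample: XrType) -> XrType:
        # One entry-point check, so subclasses don't repeat isinstance guards
        # inside every apply_func implementation.
        if not isinstance(sample, (xr.Dataset, xr.DataArray)):
            raise TypeError(
                f"Expected an xarray Dataset or DataArray, got {type(sample).__name__}"
            )
        return self.apply_func(sample)

    def apply_func(self, sample: XrType) -> XrType:
        raise NotImplementedError


class FillNan(XarrayOperation):
    """Example subclass: only the actual transformation is implemented."""

    def __init__(self, nan: float = 0.0):
        self.nan = nan

    def apply_func(self, sample: XrType) -> XrType:
        return sample.fillna(self.nan)
```

Whether the check belongs in the pipeline base class or a per-backend subclass is exactly the open question above; this only shows the shape such a class could take.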
def apply_func(self, sample: T) -> T:
    return sample.fillna(self.nan)
Just reading the docstring above (it won't let me comment on unchanged lines), it says "If no value is passed then positive infinity values will be replaced with a very large number. Defaults to None.". But None is not a very large number. So by default that comment doesn't parse for me.
The tests cover all the operation classes in `operations/{dask,numpy,xarray}/values.py`. Some important changes:

- … `NotImplementedError`, and just needed to add a missing arg in `da.nan_to_num` to make it work.
- Used the `Replace` class instead of the `replace_value` class from `pyearthtool.data.transforms.mask`, as `replace_value` doesn't exist anymore.
- Fixed a bug in `xarray.Derive` where `Derive.apply_func` was also updating the input dataset.
- Fixed `pyearthtools.data.transforms.derive` by replacing the deprecated `Dataset.drop` with `Dataset.drop_vars` (see the snippet after this list).
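For reference on that last point, `Dataset.drop` was deprecated in xarray in favour of the more specific `drop_vars`/`drop_sel`, so dropping variables now looks like this (generic xarray usage with made-up variable names, not the project's transform):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "t2m": ("time", np.arange(3.0)),
        "u10": ("time", 2 * np.arange(3.0)),
    }
)

# Deprecated spelling (may warn or error depending on xarray version):
#   ds.drop("u10")
# Current spelling:
trimmed = ds.drop_vars("u10")  # returns a new Dataset containing only "t2m"
```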
operations/{dask,numpy,xarray}/values.py. Some important changes:NotImplementedError, and just needed to add a missing arg inda.nan_to_numto make it work.Replaceclass instead ofreplace_valueclass frompyearthtool.data.transforms.maskasreplace_valuedoesn't exist anymore.xarray.DerivewhereDerive.apply_funcwas also updating the input dataset.pyearthtools.data.transforms.deriveby replacing the deprecatedDataset.dropwithDataset.drop_vars