Use UVParameter forms in UVData.__add__ and UVData.fast_concat to prevent missing parameters.#1606
Use UVParameter forms in UVData.__add__ and UVData.fast_concat to prevent missing parameters.#1606
Conversation
e0ef552 to
829fab9
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1606 +/- ##
==========================================
- Coverage 99.93% 99.93% -0.01%
==========================================
Files 67 67
Lines 22688 22670 -18
==========================================
- Hits 22674 22656 -18
Misses 14 14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@kartographer this is a slightly different approach than you took in the |
f6b1427 to
33b50d9
Compare
89f562f to
c604805
Compare
945b3a6 to
34449d2
Compare
a1f87dc to
26ed06f
Compare
steven-murray
left a comment
There was a problem hiding this comment.
Thanks @bhazelton and sorry for the very slow review! I have a few comments. Mostly I think we can make some of these functions a bit more clear in what they are doing. They seem like they will become quite central in how UVxxx objects operate, so we should make them as clear as possible for our own future reference.
It might also be useful to do a quick profiling for these functions, as I expect they might be bottleneck functions in some applications (especially in terms of memory)
src/pyuvdata/uvbase.py
Outdated
|
|
||
| def _get_param_axis(self, axis_name: str, single_named_axis: bool = False): | ||
| """ | ||
| Get a mapping of parameters that have a given axis to the axis number. |
There was a problem hiding this comment.
Can we use "attributes" instead of "parameters" here? I was confused what this function was doing until I realized you were talking about attributes.
Probably this will be even more easily cleared up by including a single simple example.
There was a problem hiding this comment.
There is a problem here in that the attributes are UVParameter objects. I'll work on the wording and an example.
There was a problem hiding this comment.
Ok, I updated the name and the docstring. Let me know if it's more clear.
| new_array = np.concatenate( | ||
| [ | ||
| getattr(self, param), | ||
| getattr(other, "_" + param).get_from_form(other_form_dict), |
There was a problem hiding this comment.
Is there a reason we couldn't just use np.take(getattr(other, param), other_inds, axis) here?
There was a problem hiding this comment.
Maybe. I didn't write get_from_form, it looks like it tries to do a slicing if possible and if it can't it just uses np.take, so I think it's already somewhat optimized? But probably worth doing some performance testing on.
| if param not in multi_axis_params: | ||
| continue |
There was a problem hiding this comment.
Why are we not also padding single-axis objects? This is confusing
There was a problem hiding this comment.
It is confusing! Among other things, this is handling the case where we have data divided into chunks along more than one axis and we're trying to combine it. If you combine first along one axis and then start adding in the first chunk along the next axis you have to pad out the multi-dimensional arrays with zeros (and flags) for the corners you don't have actual information yet. That doesn't come up for single axis parameters.
Imagine a 2D array split into quadrants. The first two (along any axis) add fine. When you add in the third, you need to pad the fourth quadrant with zeros (and flags). Then when we add in the fourth, if all the data are zero and flagged we allow it to overwrite that quadrant. Does that make sense? Happy to add some more comments or docstrings if you think it'd be helpful.
| order_dict : dict | ||
| dict giving the final sort indices for each axis (keys are axes, values | ||
| are index arrays for sorting). |
There was a problem hiding this comment.
Since this sorting occurs in multiple functions, might it be better to just have a standalone "arbitrary axis sorting helper"?
There was a problem hiding this comment.
hmm, possibly. I think the reason I did it this way is that each UVParmeter is only affected by one of these functions, so the sorting is currently only done only once per parameter. It's just done in a different function depending on whether it's a single axis or multiaxis array. For the single axis array, the sorting can be done at the same time as the indexing with the np.take call, so I think that's why I did it this way. But open to refactoring if it's not a performance hit.
26ed06f to
347cfdd
Compare
|
@steven-murray I finally got memray up and running. I started with a 9 GB MWA uvfits file, which I split along the frequency axis into two objects which I saved out to uvh5 files (each file was ~3 GB -- not sure why that's so much less than half the uvfits size). Then I used memray to run two scripts on this branch and on main. The two scripts just read in the two files into a UVData object using I'm attaching the memray flame graph html files for all 4 runs as well as the pngs of the memory graph for all 4 runs (the png is just made from the html, use the html to get tooltip readings). My take away is that this refactor causes a small improvement on the memory usage for the memray-flamegraph-output_add_main.html I also did a similar test but with adding/concatenating over the blt axis (I split the input data in half by time) and saw very similar results. |
|
So it's good that this does not reduce memory performance, and maybe improves it a little. It is weird that reading and concatenating two 3GB files results in 15 GB of peak memory usage at best (with |
|
It looks like we're allocating enough to read the two files first (~6GB), then soon after allocating another amount equivalent to that (so up to ~12GB), and then a little later another ~3-4GB (not sure where that comes from). Since peak memory can be a huge limitation, we should think about how we might get away with the peak memory being close to the final size of the arrays. |
|
@steven-murray I totally agree, now that I have memray working I am digging into memory usage more and we should definitely work to reduce it. But for this PR I am content that it doesn't make it worse. |
|
Cool. I think you answered my other points -- though this is still in draft so I can't be approving yet :-) |




Description
This is following some of the ideas in UVBase._select_along_axis. This could probably be generalized to UVBase, but I wanted to get UVData's version done to work out the ideas before moving it up UVBase. And I wanted to at least get a draft up for @rlbyrne to try out.
Motivation and Context
fixes #1595
Types of changes
Checklist:
Bug fix checklist: