
Use UVParameter forms in UVData.__add__ and UVData.fast_concat to prevent missing parameters. #1606

Draft

bhazelton wants to merge 17 commits into main from add_concat_use_axis

Conversation

@bhazelton
Member

Description

This is following some of the ideas in UVBase._select_along_axis. This could probably be generalized to UVBase, but I wanted to get UVData's version done to work out the ideas before moving it up to UVBase. And I wanted to at least get a draft up for @rlbyrne to try out.

Motivation and Context

fixes #1595

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation change (documentation changes only)
  • Version change
  • Build or continuous integration change
  • Other

Checklist:

Bug fix checklist:

  • My fix includes a new test that breaks as a result of the bug (if possible).
  • I have updated the CHANGELOG.

@codecov

codecov bot commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.93%. Comparing base (65efcce) to head (347cfdd).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1606      +/-   ##
==========================================
- Coverage   99.93%   99.93%   -0.01%     
==========================================
  Files          67       67              
  Lines       22688    22670      -18     
==========================================
- Hits        22674    22656      -18     
  Misses         14       14              


@bhazelton
Member Author

bhazelton commented Aug 18, 2025

@kartographer this is a slightly different approach than you took in the Telescope.__add__ method. Before I start thinking about moving it up to UVBase I'd like to get your thoughts about whether there is something that could be done better.

@bhazelton force-pushed the add_concat_use_axis branch 3 times, most recently from f6b1427 to 33b50d9 on August 19, 2025
@bhazelton requested a review from kartographer on August 25, 2025
@bhazelton force-pushed the add_concat_use_axis branch 5 times, most recently from 945b3a6 to 34449d2 on September 24, 2025
@bhazelton force-pushed the add_concat_use_axis branch 2 times, most recently from a1f87dc to 26ed06f on December 11, 2025
@steven-murray
Contributor


Thanks @bhazelton and sorry for the very slow review! I have a few comments. Mostly I think we can make some of these functions a bit clearer about what they are doing. They seem like they will become quite central to how UVxxx objects operate, so we should make them as clear as possible for our own future reference.

It might also be useful to do some quick profiling of these functions, as I expect they might be bottlenecks in some applications (especially in terms of memory).


def _get_param_axis(self, axis_name: str, single_named_axis: bool = False):
    """
    Get a mapping of parameters that have a given axis to the axis number.
Contributor

Can we use "attributes" instead of "parameters" here? I was confused what this function was doing until I realized you were talking about attributes.

Probably this will be even more easily cleared up by including a single simple example.

Member Author

There is a problem here in that the attributes are UVParameter objects. I'll work on the wording and an example.

Member Author

Ok, I updated the name and the docstring. Let me know if it's more clear.
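
As an illustration of the idea (this is not the actual implementation; the helper name axis_map and the exact return structure are assumptions), a mapping from the UVParameter attributes that carry a given named axis to that axis's position could look roughly like this:

from pyuvdata import UVData


def axis_map(obj, axis_name):
    """Map attribute names to the position of ``axis_name`` in each parameter's form.

    Hypothetical sketch: array-like UVParameter objects carry a ``form`` attribute
    that is a tuple of named axes such as ("Nblts", "Nfreqs", "Npols").
    """
    mapping = {}
    for attr in obj:  # UVBase iteration yields UVParameter attribute names like "_data_array"
        param = getattr(obj, attr)
        form = getattr(param, "form", None)
        if isinstance(form, tuple) and axis_name in form:
            mapping[attr.lstrip("_")] = form.index(axis_name)
    return mapping


uvd = UVData()
# Something like {"data_array": 0, "flag_array": 0, "time_array": 0, ...}
print(axis_map(uvd, "Nblts"))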

new_array = np.concatenate(
    [
        getattr(self, param),
        getattr(other, "_" + param).get_from_form(other_form_dict),
Contributor

Is there a reason we couldn't just use np.take(getattr(other, param), other_inds, axis) here?

Member Author

Maybe. I didn't write get_from_form; it looks like it tries to do slicing if possible, and if it can't it falls back to np.take, so I think it's already somewhat optimized? But it's probably worth doing some performance testing.
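
To make the slice-versus-take trade-off concrete, here is a rough sketch of that kind of optimization (this is not the get_from_form source, just an illustration under the assumption that it prefers a slice when the requested indices are evenly spaced):

import numpy as np


def take_or_slice(arr, inds, axis):
    """Index ``arr`` along ``axis``, preferring a plain slice when possible.

    If the indices are evenly spaced and increasing, basic slicing is used
    (a view, no fancy-indexing copy); otherwise this falls back to np.take.
    """
    inds = np.asarray(inds)
    if inds.size > 1:
        steps = np.diff(inds)
        if steps[0] > 0 and np.all(steps == steps[0]):
            slicer = [slice(None)] * arr.ndim
            slicer[axis] = slice(inds[0], inds[-1] + 1, steps[0])
            return arr[tuple(slicer)]
    return np.take(arr, inds, axis=axis)


rng = np.random.default_rng(0)
arr = rng.standard_normal((100, 64, 4))
evenly_spaced = take_or_slice(arr, np.arange(0, 100, 2), axis=0)  # slice path (view)
arbitrary = take_or_slice(arr, np.array([3, 1, 7]), axis=0)       # np.take path (copy)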

Comment on lines +923 to +924
if param not in multi_axis_params:
    continue
Contributor

Why are we not also padding single-axis objects? This is confusing.

Member Author

It is confusing! Among other things, this is handling the case where we have data divided into chunks along more than one axis and we're trying to combine it. If you combine first along one axis and then start adding in the first chunk along the next axis, you have to pad out the multi-dimensional arrays with zeros (and flags) for the corners where you don't have actual information yet. That doesn't come up for single-axis parameters.

Imagine a 2D array split into quadrants. The first two (along either axis) add fine. When you add in the third, you need to pad the fourth quadrant with zeros (and flags). Then when we add in the fourth, if all the data there are zero and flagged we allow it to overwrite that quadrant. Does that make sense? Happy to add some more comments or docstrings if you think it'd be helpful.
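
To make that concrete, a toy numpy sketch of the quadrant picture (simplified shapes, not the actual UVData arrays or helper code):

import numpy as np

# Pretend a 2D parameter (e.g. blts x freqs) arrives split into four quadrants.
q1 = np.full((2, 3), 1.0)  # rows 0-1, cols 0-2
q2 = np.full((2, 3), 2.0)  # rows 0-1, cols 3-5
q3 = np.full((2, 3), 3.0)  # rows 2-3, cols 0-2
q4 = np.full((2, 3), 4.0)  # rows 2-3, cols 3-5

# The first two chunks combine cleanly along the column axis.
top = np.concatenate([q1, q2], axis=1)       # shape (2, 6)
top_flags = np.zeros_like(top, dtype=bool)   # all unflagged

# Adding q3 along the row axis: there is no data yet for the bottom-right
# corner, so pad it with zeros and mark those samples as flagged.
bottom = np.concatenate([q3, np.zeros((2, 3))], axis=1)
bottom_flags = np.concatenate(
    [np.zeros((2, 3), dtype=bool), np.ones((2, 3), dtype=bool)], axis=1
)
data = np.concatenate([top, bottom], axis=0)   # shape (4, 6)
flags = np.concatenate([top_flags, bottom_flags], axis=0)

# When q4 finally arrives, the padded corner is all zero and fully flagged,
# so it is safe to let the real values overwrite it.
assert np.all(data[2:, 3:] == 0) and np.all(flags[2:, 3:])
data[2:, 3:] = q4
flags[2:, 3:] = False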

Comment on lines +951 to +953
order_dict : dict
    dict giving the final sort indices for each axis (keys are axes, values
    are index arrays for sorting).
Contributor

Since this sorting occurs in multiple functions, might it be better to just have a standalone "arbitrary axis sorting helper"?

Member Author

Hmm, possibly. The reason I did it this way is that each UVParameter is only affected by one of these functions, so the sorting is currently only done once per parameter. It's just done in a different function depending on whether it's a single-axis or multi-axis array. For a single-axis array, the sorting can be done at the same time as the indexing with the np.take call, which is why I structured it this way. But I'm open to refactoring if it's not a performance hit.
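
A small illustration of that single-axis path (toy arrays, not the actual code): one argsort per axis goes into the order_dict-style mapping, and a single np.take then both selects and sorts every parameter that shares that axis.

import numpy as np

# Toy single-axis parameter: times from two chunks, concatenated out of order.
times = np.concatenate([np.array([2.0, 4.0, 6.0]), np.array([1.0, 3.0, 5.0])])

# One argsort per axis; the resulting indices play the role of an order_dict entry.
order = np.argsort(times)

# np.take indexes and sorts in one pass, so no separate sorting step is needed.
sorted_times = np.take(times, order, axis=0)

# Any other parameter sharing the same axis reuses the same index array.
data = np.arange(12.0).reshape(6, 2)
sorted_data = np.take(data, order, axis=0)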

@bhazelton force-pushed the add_concat_use_axis branch from 26ed06f to 347cfdd on January 27, 2026
@bhazelton
Member Author

bhazelton commented Feb 20, 2026

@steven-murray I finally got memray up and running. I started with a 9 GB MWA uvfits file, which I split along the frequency axis into two objects that I saved out to uvh5 files (each file was ~3 GB -- not sure why that's so much less than half the uvfits size). Then I used memray to run two scripts on this branch and on main. The two scripts just read the two files into a UVData object using from_file, passing a list containing the two file names; the only difference was that one script specified axis="freq" so it used the fast_concat method rather than the __add__ method.

I'm attaching the memray flame graph html files for all 4 runs as well as the pngs of the memory graph for all 4 runs (the png is just made from the html; use the html to get tooltip readings). My takeaway is that this refactor gives a small improvement in memory usage for the __add__ method. There is very little difference for fast_concat, though this refactor might be using very slightly more memory. I'm interested in your read of these:

memray-flamegraph-output_add_main.html
memray-flamegraph-output_add_refactor.html
memray-flamegraph-output_concat_main.html
memray-flamegraph-output_concat_refactor.html

main __add__: memray_add_main

this branch __add__: memray_add_refactor

main fast_concat: memray_concat_main

this branch fast_concat: memray_concat_refactor

I also did a similar test but with adding/concatenating over the blt axis (I split the input data in half by time) and saw very similar results.
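
For reference, the two scripts were essentially of this form (the file names are placeholders; the only difference between them is the axis keyword, which routes the read through fast_concat instead of __add__):

from pyuvdata import UVData

files = ["mwa_low_freqs.uvh5", "mwa_high_freqs.uvh5"]  # placeholder file names

# Combined via UVData.__add__ (the default when reading a list of files).
uvd_add = UVData.from_file(files)

# Combined via UVData.fast_concat by naming the concatenation axis.
uvd_concat = UVData.from_file(files, axis="freq")

Each script was then run under memray (e.g. memray run read_script.py followed by memray flamegraph on the resulting output file) to produce the flame graphs attached above.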

@steven-murray
Contributor

So it's good that this does not hurt memory performance, and maybe improves it a little.

It is weird that reading and concatenating two 3GB files results in 15 GB of peak memory usage at best (with fast_concat). I think we should have a good look at that as a team, but not in this PR.

@steven-murray
Contributor

It looks like we're allocating enough to read the two files first (~6GB), then soon after allocating another amount equivalent to that (so up to ~12GB), and then a little later another ~3-4GB (not sure where that comes from).

Since peak memory can be a huge limitation, we should think about how we might get away with the peak memory being close to the final size of the arrays.

@bhazelton
Member Author

@steven-murray I totally agree, now that I have memray working I am digging into memory usage more and we should definitely work to reduce it. But for this PR I am content that it doesn't make it worse.

@steven-murray
Contributor

Cool. I think you answered my other points -- though this is still in draft so I can't approve it yet :-)



Development

Successfully merging this pull request may close these issues.

Parameter scan_number_array is not updated during concatenation
