Skip to content

Close leaked CPS HDF stores#694

Open
MaxGhenis wants to merge 3 commits intomainfrom
codex/fix-hdf5-warnings
Open

Close leaked CPS HDF stores#694
MaxGhenis wants to merge 3 commits intomainfrom
codex/fix-hdf5-warnings

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

  • close raw CPS HDF stores after previous-year income preprocessing and after the auto-loan/net-worth preprocessing path
  • close the initial raw CPS store as soon as the base CPS entity tables are materialized
  • add a focused regression test that verifies the previous-year CPS handles are closed

Testing

  • UV_PYTHON=3.14 uv run pytest tests/unit/datasets/test_cps_file_handles.py tests/unit/datasets/test_cps_takeup.py tests/unit/test_extended_cps.py -q
  • python3 -m py_compile policyengine_us_data/datasets/cps/cps.py tests/unit/datasets/test_cps_file_handles.py
  • UV_PYTHON=3.14 uv run ruff format --check policyengine_us_data/datasets/cps/cps.py tests/unit/datasets/test_cps_file_handles.py

Copy link
Copy Markdown
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Completely optional, but here's an argument for context managers over try/finally:

For a two-line body the try/finally is a lot of ceremony. pd.HDFStore supports with, so:

  with self.raw_cps(require=True).load() as raw_data:                                                         
      raw_person = raw_data["person"]
      cps["is_married"] = raw_person.A_MARITL.isin([1, 2]).values                                             
                                                                                                              
  Same behavior, less noise. I'd recommend context managers throughout if the PR author wants to close these  
  handles. 

raw_data[entity] for entity in ENTITIES
]
raw_data = self.raw_cps(require=True).load()
try:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the try/finally is arguably overkill as the list comprehension over a fixed ENTITIES constant is unlikely to fail. A context manager (with) would be cleaner if the HDFStore supports it (and Pandas HDFStore does)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants