[codex] Align SPM thresholds and stabilize clone-half priors#696

Closed
MaxGhenis wants to merge 3 commits into PolicyEngine:main from MaxGhenis:codex/spm-threshold-align-data

@MaxGhenis MaxGhenis commented Apr 7, 2026

What changed

  • align data-side SPM threshold helpers with the same reference-threshold and equivalence-scale logic used in policyengine-us
  • make CD geographic adjustments tenure-specific instead of applying a renter-style adjustment to every cloned SPM unit
  • replace the stale hardcoded housing target with a year-specific Census CPS ASEC SPM_CAPHOUSESUB benchmark for spm_unit_capped_housing_subsidy
  • add a separate HUD USER benchmark path for modeled housing_assistance, so Census SPM capped subsidy and HUD spending/assisted-household counts are no longer mixed together
  • stop stage-2 QRF from imputing spm_unit_spm_threshold for the PUF clone half
  • rebuild clone-half spm_unit_spm_threshold deterministically from the donor half's geography and the current threshold formula
  • replace the additive +1 sparse-reweighting prior with deterministic near-zero priors for zero-weight clone households, while keeping donor-half priors close to their survey weights
  • add regression tests for future-year threshold reconstruction, tenure-specific CD GEOADJ values, clone-half threshold rebuilding, and sparse-prior initialization
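The sparse-prior change in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `initial_priors` and the value of `CLONE_PRIOR` are assumptions, and donor-half priors are simply set equal to their survey weights here, whereas the description only says they stay close to them.

```python
import numpy as np

# Hypothetical value: the PR only says clone-half priors are "near zero".
CLONE_PRIOR = 1e-6


def initial_priors(survey_weights, clone_prior=CLONE_PRIOR):
    """Hypothetical helper: starting weights for the sparse optimizer.

    Households with a positive survey weight (donor half) start at that
    weight. Zero-weight households (clone half) start at a small fixed
    constant instead of at weight + 1.
    """
    w = np.asarray(survey_weights, dtype=float)
    return np.where(w > 0, w, clone_prior)


# Two donor households followed by their two zero-weight clones.
weights = np.array([1200.0, 850.0, 0.0, 0.0])

old_priors = weights + 1.0             # previous additive +1 prior
new_priors = initial_priors(weights)   # deterministic near-zero prior for clones
```

Under the additive prior, every clone starts at weight 1 regardless of its donor; under the sketch, clones start effectively at zero and only gain weight if the optimizer moves them.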

Why

The data pipeline had three distinct SPM issues:

  • threshold reconstruction in policyengine-us-data had drifted from the model-side logic in policyengine-us
  • local-area recalculation reused a renter-style GEOADJ across owners and renters
  • clone-half enhanced CPS generation was giving zero-weight synthetic households a meaningful starting prior and letting stage-2 QRF learn spm_unit_spm_threshold, even though thresholds should be derived from donor geography plus composition, not predicted statistically
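To illustrate why the threshold can be derived rather than predicted, here is a minimal sketch that assumes the usual multiplicative SPM structure (reference threshold times equivalence scale times geographic adjustment). The function name `rebuild_clone_thresholds`, its arguments, and the numbers are all hypothetical; the actual formula lives in policyengine-us and may differ in detail.

```python
import numpy as np


def rebuild_clone_thresholds(reference_threshold, equivalence_scale, donor_geoadj):
    """Hypothetical sketch: deterministic threshold for each clone SPM unit.

    Assumes threshold = reference threshold * equivalence scale * GEOADJ,
    with GEOADJ taken from the donor half's geography. No statistical
    model is involved, so identical inputs always give identical outputs.
    """
    ref = np.asarray(reference_threshold, dtype=float)
    scale = np.asarray(equivalence_scale, dtype=float)
    geoadj = np.asarray(donor_geoadj, dtype=float)
    return ref * scale * geoadj


# Illustrative inputs only; none of these numbers come from Census or this PR.
thresholds = rebuild_clone_thresholds(
    reference_threshold=[30_000.0, 30_000.0],
    equivalence_scale=[1.0, 0.5],
    donor_geoadj=[1.1, 0.9],
)
```

Because every input is either a published parameter or a property of the donor record, a QRF prediction of the same quantity can only add noise.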

The housing benchmark cleanup is a separate but related matter of concept hygiene:

  • spm_unit_capped_housing_subsidy is a Census SPM concept and should be benchmarked to CPS ASEC SPM_CAPHOUSESUB
  • housing_assistance is a HUD program/spending concept and should be benchmarked separately to HUD USER assisted-household counts and spending totals
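One way to keep the two concepts apart is to record each benchmark with its variable and source, then check that no variable draws on more than one source. The sketch below is purely illustrative: the `Benchmark` record, the `BENCHMARKS` list, and the `sources_for` helper are hypothetical names, and target values are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical record tying a model variable to one external source."""
    variable: str
    source: str
    measure: str


# Variables and sources are taken from the description above;
# the structure itself is an assumption.
BENCHMARKS = [
    Benchmark("spm_unit_capped_housing_subsidy",
              "Census CPS ASEC SPM_CAPHOUSESUB", "total"),
    Benchmark("housing_assistance", "HUD USER", "spending total"),
    Benchmark("housing_assistance", "HUD USER", "assisted-household count"),
]


def sources_for(variable, benchmarks=BENCHMARKS):
    """Return the set of sources used to benchmark one variable."""
    return {b.source for b in benchmarks if b.variable == variable}
```

A validation step could then require `len(sources_for(v)) == 1` for every benchmarked variable `v`.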

Impact

  • CPS-derived thresholds and local-area cloned thresholds now use the same future-year reference threshold path as policyengine-us
  • CD-based local-area outputs no longer reuse one GEOADJ across owners and renters
  • national housing calibration for the SPM capped subsidy now uses the Census concept instead of a stale mixed HUD/Census hardcoded value
  • validation output now reports the Census capped-subsidy benchmark and HUD USER housing-assistance benchmark as separate rows
  • clone-half enhanced CPS generation now starts zero-weight synthetic households near zero in the sparse optimizer instead of around weight 1
  • clone-half SPM thresholds are rebuilt from donor geography instead of being stage-2 QRF outputs
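The tenure-specific GEOADJ mentioned in the list above can be sketched as below. This assumes the Census-style form, in which a local rent index is applied only to the housing share of the threshold and that share varies by tenure. The function names, tenure labels, and share values are placeholders, not the parameters used in this PR.

```python
def tenure_geoadj(rent_index, housing_share):
    """Hypothetical sketch: GEOADJ for one tenure group in one district.

    Assumes GEOADJ = share * rent_index + (1 - share), where rent_index is
    local median rent relative to the national median and share is the
    housing portion of the threshold for that tenure group.
    """
    if not 0.0 <= housing_share <= 1.0:
        raise ValueError("housing_share must be between 0 and 1")
    return housing_share * rent_index + (1.0 - housing_share)


def district_geoadj(rent_index, housing_shares):
    """Compute one GEOADJ per tenure group instead of one per district."""
    return {
        tenure: tenure_geoadj(rent_index, share)
        for tenure, share in housing_shares.items()
    }


# Placeholder shares for illustration only; real shares come from the
# published SPM threshold parameters.
example = district_geoadj(
    rent_index=1.2,
    housing_shares={"renter": 0.5, "owner_without_mortgage": 0.25},
)
```

With a rent index above 1, the group with the larger housing share gets the larger adjustment, which is why applying the renter value to every tenure misstates owner thresholds in high-rent or low-rent districts.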

Root cause

policyengine-us-data had:

  • a separate threshold forecast path from policyengine-us
  • one CD GEOADJ per district built with renter assumptions and reused for all tenures
  • a housing benchmark path that mixed Census SPM capped subsidy and HUD spending concepts
  • clone-half enhanced CPS generation that treated zero-weight synthetic households like ordinary weighted donors during sparse-prior initialization and allowed spm_unit_spm_threshold into the CPS-only QRF output set
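The last root cause suggests a simple guard: keep formula-derived variables out of the set the QRF is asked to predict. The sketch below is an assumption about how such a guard could look. `DERIVED_NOT_IMPUTED` and `qrf_output_variables` are hypothetical names, and the candidate list is illustrative rather than the real CPS-only output set.

```python
# Variables that are rebuilt by formula and should never be QRF outputs.
DERIVED_NOT_IMPUTED = frozenset({"spm_unit_spm_threshold"})


def qrf_output_variables(candidates, excluded=DERIVED_NOT_IMPUTED):
    """Hypothetical helper: drop derived variables, preserving input order."""
    return [name for name in candidates if name not in excluded]


# Illustrative candidate list, not the actual stage-2 output set.
candidates = [
    "spm_unit_capped_housing_subsidy",
    "spm_unit_spm_threshold",
    "housing_assistance",
]
outputs = qrf_output_variables(candidates)
```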

Validation

  • uv run pytest -q tests/unit/test_extended_cps.py
  • uv run pytest -q tests/unit/calibration/test_calibration_puf_impute.py
  • uv run pytest -q policyengine_us_data/tests/test_local_area_calibration/test_spm_thresholds.py
  • uv run pytest -q tests/integration/test_enhanced_cps.py -k 'household_count or poverty_rate_reasonable'
  • git diff --check

@MaxGhenis MaxGhenis changed the title from "[codex] Align SPM threshold recalculation" to "[codex] Align SPM thresholds and stabilize clone-half priors" on Apr 8, 2026
@MaxGhenis
Contributor Author

Superseded by #702. This draft was opened from a forked branch and is blocked by the workflow gate, so CI cannot run here.

@MaxGhenis MaxGhenis closed this Apr 8, 2026