Add second-stage QRF imputation for FRS-only variables #362

Merged
MaxGhenis merged 2 commits into main from add-second-stage-qrf-frs-vars
Apr 18, 2026
@MaxGhenis
Summary

Fixes item 3 of #1621, the largest architectural gap in the enhanced-FRS pipeline.

The zero-weight SPI-donor subsample's income columns are rewritten by the SPI-trained first-stage QRF, but every other FRS column (benefit _reported values, pension contributions, savings-interest income, state-pension and disability-benefit _reported amounts, council-tax benefit) keeps whatever value the sampled middle-income FRS donor had. A row imputed as a £2M self-employment earner therefore keeps its donor's £120 UC _reported value, tiny pension contributions, and typical rent. When calibration upweights these rows, the mismatches cascade into false benefit aggregates, depressed allowances, and distorted housing-cost totals.

This PR adds a second-stage QRF in frs_only.py that:

  • Trains on the original full FRS build, with predictors = [demographics + the six stage-1 income components] and outputs = the FRS-only variables listed in FRS_ONLY_PERSON_VARIABLES.
  • Predicts for every SPI-donor row, using [demographics + newly imputed incomes] as predictors.
  • Overwrites only the listed outputs, clamps predictions to be non-negative, and skips any output column that is missing.

This mirrors the CPS-only stage-2 QRF introduced in PolicyEngine/policyengine-us-data#589 and follows the same training pattern as _impute_cps_only_variables.
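The three steps above can be sketched in plain Python. This is only an illustration of the data flow, not the code in frs_only.py: a k-nearest-neighbour conditional median stands in for the QRF, and the names `impute_stage2`, `knn_conditional_median`, `PREDICTORS`, and the column names are hypothetical.

```python
import random
from statistics import median

# Hypothetical stand-ins; the real lists live in frs_only.py.
PREDICTORS = ["age", "employment_income"]
FRS_ONLY_VARIABLES = ["universal_credit_reported", "pension_contributions"]


def knn_conditional_median(train, predictors, output, row, k=5):
    """Stand-in for a QRF: median of `output` over the k nearest training rows.

    Distances are unscaled squared differences, which is enough for a sketch.
    """
    def distance(t):
        return sum((t[p] - row[p]) ** 2 for p in predictors)

    nearest = sorted(train, key=distance)[:k]
    return median(t[output] for t in nearest)


def impute_stage2(train, donors, predictors, outputs, k=5):
    """Return copies of `donors` with only the listed outputs re-imputed."""
    result = [dict(d) for d in donors]
    for var in outputs:
        # Skip variables that are absent from either frame.
        if not train or not donors or var not in train[0] or var not in donors[0]:
            continue
        for row in result:
            predicted = knn_conditional_median(train, predictors, var, row, k)
            row[var] = max(0.0, predicted)  # non-negative clamp
    return result


# Synthetic training frame in which UC receipt falls with income.
random.seed(0)
train = []
for _ in range(200):
    income = random.uniform(0, 100_000)
    train.append({
        "age": random.randint(18, 65),
        "employment_income": income,
        "universal_credit_reported": max(0.0, 5_000 - 0.1 * income),
        # no pension_contributions column, so that output is skipped
    })

# A donor row whose income was rewritten by stage 1 but which still
# carries its middle-income donor's other values.
donors = [{
    "age": 40,
    "employment_income": 2_000_000.0,
    "universal_credit_reported": 120.0,
    "pension_contributions": 50.0,
    "rent": 9_000.0,
}]

imputed = impute_stage2(train, donors, PREDICTORS, FRS_ONLY_VARIABLES)
print(imputed[0])
```

In this toy run the high earner's UC value is re-imputed from the highest-income training rows, while `rent` (not a listed output) and `pension_contributions` (missing from the training frame) pass through unchanged.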

Expected impact

High-income SPI-donor rows should now carry income-consistent benefit _reported values (close to zero for £500k+ earners), realistic pension contributions, and savings income correlated with imputed income. This should substantially reduce the +£4-6bn drift across the income_support, esa_contrib, working_tax_credit, child_tax_credit, and housing_benefit aggregates that the tracking issue attributes to donor leakage.

Test plan

  • Unit tests (4): non-negative outputs, non-target-column preservation, missing-column tolerance, training-pattern gradient (income -> UC receipt)
  • make format passes
  • Full data build in CI (integration path)
  • Benefit-aggregate comparison vs OBR on the new build (manual verification)

Generated with Claude Code

MaxGhenis and others added 2 commits April 18, 2026 07:37
The enhanced-FRS pipeline's zero-weight SPI-donor subsample has its
income columns rewritten by a SPI-trained first-stage QRF, but every
other FRS column (benefit `_reported` values, pension contributions,
savings income, council tax benefit) stays as whatever middle-income FRS
donor was sampled. After calibration upweights these rows, this cascades
into false benefit aggregates, distorted allowances, and housing-cost
mismatches; the tracking issue attributes about £4-6bn of
benefit-aggregate drift to this failure mode (most visibly the "£1M
earners with zero everything else" pattern described in #1621).

Adds a second-stage QRF (`frs_only.py`) that trains on the original
full-FRS build with predictors = [demographics + first-stage income
outputs] and outputs = a curated list of FRS-only variables, then
predicts for every SPI-donor row. High-earner predictions collapse UC /
HB / WTC receipt toward zero, pension contributions rescale, and savings
interest correlates with imputed income. Mirrors the CPS-only stage-2
QRF introduced in policyengine-us-data#589.

Unit tests cover: non-negative outputs, that non-target columns are
untouched, that missing train/target columns are skipped silently, and
that the predictions track the training-data income → receipt gradient.
The real full-FRS retrain runs in CI via the integration data-build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI surfaced a `KeyError: 'region'` because the FRS build stores region
on the household frame, not the person frame. Route the lookup through
`person_household_id` so the stage-2 QRF trains and predicts on the
household-derived region column without needing a full Microsimulation
bootstrap, which would require a host of unrelated household columns
(`council_tax`, `tenure_type`, etc.) that test fixtures don't carry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
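
The household-to-person lookup described in the commit above can be sketched as follows. This is a minimal illustration using plain dicts; `person_household_id` and `region` come from the commit message, while `household_id`, `attach_household_region`, and the region values are assumptions made for the example.

```python
def attach_household_region(persons, households):
    """Return person rows with `region` copied from their household row."""
    # Build a household_id -> region lookup once.
    region_by_household = {h["household_id"]: h["region"] for h in households}
    # Copy each person row and add the household-derived region.
    return [
        {**p, "region": region_by_household[p["person_household_id"]]}
        for p in persons
    ]


households = [
    {"household_id": 1, "region": "LONDON"},
    {"household_id": 2, "region": "WALES"},
]
persons = [
    {"person_id": 10, "person_household_id": 1},
    {"person_id": 11, "person_household_id": 1},
    {"person_id": 20, "person_household_id": 2},
]

with_region = attach_household_region(persons, households)
print([p["region"] for p in with_region])
```

Only the household identifier and the region column are needed, which is why a fixture without `council_tax` or `tenure_type` can still exercise this path.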
@MaxGhenis MaxGhenis force-pushed the add-second-stage-qrf-frs-vars branch from ae3b4b6 to 4c06956 Compare April 18, 2026 11:37
@MaxGhenis MaxGhenis merged commit 5c726b6 into main Apr 18, 2026
3 checks passed
@MaxGhenis MaxGhenis deleted the add-second-stage-qrf-frs-vars branch April 18, 2026 12:08
MaxGhenis added a commit that referenced this pull request Apr 19, 2026
The weighted-UK-population drift that motivated #310 has already
dropped from ~6.5% to ~1.6% on current main as a side-effect of the
data-pipeline improvements landed yesterday (stage-2 QRF #362, TFC
target refresh #363, reported-anchor takeup #359).

Tightens `test_population` tolerance from 7% to 3% to lock in that
gain: any future calibration change that regresses back toward the
pre-April-2026 overshoot now trips CI instead of silently drifting.
Adds a new `test_population_fidelity.py` with four regression tests
extracted from the #310 draft:

- weighted-total ONS match (3% tolerance)
- household-count sanity range (25-33 M)
- non-inflation guard (< 72 M)
- country-populations-sum-to-UK consistency

Does not include #310's loss-function change or Scotland target
removal; those are independent proposals and should be evaluated on
their own merits once the practical overshoot is resolved.
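
The four regression checks listed in this commit message could look roughly like the sketch below. It is not the content of `test_population_fidelity.py`: the function name, the input structure, and every number in the example (including the ONS target) are placeholders chosen for illustration; only the thresholds (3%, 25-33 M, 72 M) come from the list above.

```python
import math


def relative_error(value, target):
    """Absolute relative deviation of `value` from `target`."""
    return abs(value - target) / target


def population_fidelity_failures(
    weighted_population,
    household_count,
    country_populations,
    ons_target,
    tolerance=0.03,
):
    """Return a list of failed checks; an empty list means all four pass."""
    failures = []
    if relative_error(weighted_population, ons_target) > tolerance:
        failures.append("weighted total outside ONS tolerance")
    if not 25e6 <= household_count <= 33e6:
        failures.append("household count outside 25-33M")
    if not weighted_population < 72e6:
        failures.append("population at or above 72M")
    country_sum = sum(country_populations.values())
    if not math.isclose(country_sum, weighted_population, rel_tol=1e-6):
        failures.append("country populations do not sum to UK total")
    return failures


# Placeholder figures, not real ONS or build outputs.
countries = {
    "ENGLAND": 60e6,
    "SCOTLAND": 5.5e6,
    "WALES": 3.3e6,
    "NORTHERN_IRELAND": 2.2e6,
}
ok = population_fidelity_failures(71e6, 29e6, countries, ons_target=70e6)
bad = population_fidelity_failures(74e6, 29e6, countries, ons_target=70e6)
print(ok)
print(bad)
```

Returning a list of failures rather than asserting inside the function keeps the sketch easy to inspect; in a pytest suite each check would more naturally be its own test.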

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis added a commit that referenced this pull request Apr 19, 2026
* Tighten population tolerance and add fidelity tests

* Loosen population tolerance 3% -> 4% for stochastic calibration variance

First CI run on this branch produced 71.8M (3.31% over target) where
yesterday's main build produced 70.97M (1.58%). Stochastic dropout
in the calibration optimiser (`dropout_weights(weights, 0.05)`) gives
~1-2 percentage point build-to-build variance on the population total.

4% keeps the regression gate well below the pre-April-2026 overshoot
(~6.5%) while not flaking on normal stochastic variance.
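
The reasoning behind the 4% gate can be checked with a few lines of arithmetic. The overshoot percentages are the ones quoted in these commit messages; `passes_gate` and the build labels are hypothetical names used only for this sketch.

```python
def passes_gate(overshoot_pct, tolerance_pct):
    """True when the population overshoot is within the tolerance."""
    return abs(overshoot_pct) <= tolerance_pct


# Overshoot figures quoted in the commit messages, in percent.
builds = {
    "pre-April-2026 overshoot": 6.5,
    "yesterday's main build": 1.58,
    "first CI run on this branch": 3.31,
}

results = {
    tolerance: {name: passes_gate(pct, tolerance) for name, pct in builds.items()}
    for tolerance in (3.0, 4.0)
}

for tolerance, outcome in results.items():
    print(f"tolerance {tolerance}%: {outcome}")
```

At 3% the first CI run fails even though nothing regressed, while at 4% both recent builds pass and the pre-April-2026 overshoot still trips the gate.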

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>