core's random() function (lines 308-348) is used for 3 variables in policyengine-us and 15 variables in policyengine-uk. Two inputs determine the seed for each person:
- Entity ID (f"{population.entity.key}_id")
- Call count: How many times random() has been called in the simulation
The seed formula:
seed = int(abs(id * 100 + population.simulation.count_random_calls))
Example:
- Person 5, 1st call to random() →
seed = int(abs(5 * 100 + 1)) = 501
- Person 5, 2nd call to random() →
seed = int(abs(5 * 100 + 2)) = 502
- Person 7, 1st call to random() (the call counter is simulation-wide, so this is the 3rd call overall) →
seed = int(abs(7 * 100 + 3)) = 703
For context, the multiplication by 100 has caused integer overflow problems in the past.
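The seed formula can be reproduced in a few lines (a sketch of the formula as quoted above, not core's actual implementation):

```python
def random_seed(entity_id: int, call_count: int) -> int:
    # Seed formula as quoted above:
    # seed = int(abs(id * 100 + count_random_calls))
    # Pure-Python ints cannot overflow; the historical overflow came from
    # fixed-width integer ids, where id * 100 can exceed the int32 range.
    return int(abs(entity_id * 100 + call_count))

# The worked examples from the text:
print(random_seed(5, 1))  # 501
print(random_seed(5, 2))  # 502
print(random_seed(7, 3))  # 703
```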
In the local area calibration case, where donor households must have their state_fips swapped, re-keying with new person_id values is unavoidable. Because the random() function is tied to person_id, the final Microsimulation from local area calibration will never match the matrix times the weights.
For instance, here is how snap in policyengine-us relates to the random function:
snap
└── snap_gross_income
└── snap_unearned_income (uses `adds`)
└── ssi (SSI benefit amount)
└── is_ssi_eligible
└── meets_ssi_resource_test
└── random() ← stochastic eligibility
What this means is that a household with $2,000 in snap, assigned a weight of 150 partly because of that snap value, might end up in the final microsimulation with $1,800 in snap but still a weight of 150. Then when we run Microsimulation.calculate('snap').sum(), we don't match the values from X @ w in the calibration. Whether that is because of a bug in the construction of the very complex X, or because of random snap, is very difficult to tell.
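The drift can be illustrated with toy numbers (all values hypothetical; the arrays stand in for one column of the real calibration matrix X and the weight vector w):

```python
import numpy as np

# One column of X: snap values recorded when the calibration matrix was built.
snap_at_calibration = np.array([2000.0, 0.0, 1500.0])
w = np.array([150.0, 80.0, 120.0])  # calibrated weights
calibration_total = snap_at_calibration @ w

# Re-running with re-keyed person_ids re-seeds random(), so stochastic
# eligibility flips for some households and the recomputed snap differs.
snap_after_rekeying = np.array([1800.0, 0.0, 1500.0])
rerun_total = snap_after_rekeying @ w

print(calibration_total - rerun_total)  # the unexplained gap: 30000.0
```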
Recommendation: Use seeds stored in the microdata like the SNAP take-up seed, and remove random from core.
SNAP's "take-up seed" works quite differently. (policyengine-us/policyengine_us/variables/gov/usda/snap/snap_take_up_seed.py)
Here's the SNAP take-up mechanism:
File 1: snap_take_up_seed.py (lines 1-8)
class snap_take_up_seed(Variable):
    value_type = float
    entity = SPMUnit
    label = "Randomly assigned seed for SNAP take-up"
    definition_period = YEAR
No formula: the value is supplied by the input dataset.
File 2: takes_up_snap_if_eligible.py (lines 10-13)
def formula(spm_unit, period, parameters):
    seed = spm_unit("snap_take_up_seed", period)
    takeup_rate = parameters(period).gov.usda.snap.takeup_rate
    return seed < takeup_rate
Here, the snap_take_up_seed is defined in cps.py (line 230) in policyengine-us-data:
data["snap_take_up_seed"] = generator.random(len(data["spm_unit_id"]))
This approach works better with local area calibration because the seed becomes linked to the household as a sort of property. We could define one seed value per person, household, etc. (really every entity), and anything random could depend on it. Reproducibility would also be much simpler.
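A minimal sketch of this stored-seed pattern, mirroring the cps.py line above (the entity ids, generator seed, and take-up rate here are all illustrative, not the real values):

```python
import numpy as np

# Build-time: draw one uniform [0, 1) seed per SPM unit and store it in the
# dataset, as cps.py does. A fixed generator seed makes the data build
# reproducible end to end.
generator = np.random.default_rng(0)
spm_unit_ids = np.array([101, 102, 103])  # illustrative ids
take_up_seed = generator.random(len(spm_unit_ids))

# Simulation-time: compare the stored seed to the take-up rate, as in
# takes_up_snap_if_eligible. Re-keying the ids leaves take_up_seed untouched,
# so the draw travels with the record instead of depending on call order.
takeup_rate = 0.82  # illustrative; the real value comes from parameters(period)
takes_up = take_up_seed < takeup_rate
```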