feat(0.9.1): Wire 1 — risk-sensitive action annotation (Stage 4)#257
Merged
Lifted from docs/plans/bio_emergent_persona_foundations.md § Wire 1
and docs/plans/release_0_9_1.md § Stage 4. Substrate-acquired outcome
variance reaches the LLM through felt-sensation tool-description
annotations — hybrid bio-system + LLM design preserved (the post-1.0
cleaner pre-filter ranker is documented as the future direction).
## What ships
- NAc._event_outcome_welford — per-(agent_id, event_signature)
Welford online variance state on the binary reward signal across
outcomes. Updated once per outcome in _record_outcome_impl under
self._lock. CC4 per-agent stash discipline (required agent_id
derived from event_context["agent_id"]; outcomes without the tag
silently skip the accumulator — documented contract + regression
test).
- NAc.get_action_risk_profile(event_sig=None, *, agent_id,
min_observations=5) — returns {event_signature → variance}
filtered by agent_id and min observations. Empty agent_id raises
ValueError per the CC4 stash rule.
- OutcomePrediction.uncertainty_interval populated in
_predict_impl from the NAc-level Welford state. Reserved field
contract (PR #216) preserved: (lower, upper) tuple, sentinel
(0.0, 0.0) on cold-start or when context lacks agent_id.
- agent_loop.py tool annotation hook runs after Wire 3's
integrity-band annotation. Felt-sensation phrasing
(feels unpredictable) / (feels predictable) matches Wire 3's
register (Wire-A's bracketed style is reserved for its own
prompt-section surface). Idempotent under repeated observations;
strips stale annotation when variance drifts back into the neutral
band.
- WIRE_1_ANNOTATION sim_log event mirrors Wire 3's WIRE_3_FILTER
shape so Roy-3 can measure annotation effect without behavioral
inference. Carries high_variance_tools / reliable_tools /
annotated_variances for post-hoc analysis.
- MAXIM_DISABLE_VARIANCE_ANNOTATION env-var ablation gate mirrors
Wire-A's pattern (gate unset by default, so annotation is ON). Conftest autouse
scrub fixture in tests/conftest.py per
feedback_opt_in_env_in_hot_paths.md.
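The filtering contract described for get_action_risk_profile can be sketched roughly as below. This is a minimal illustration, not the shipped method: the free function `risk_profile`, the plain-dict state layout, and the `mean` / `m2` field names are assumptions made for the example (only `state["n"]` is named in this PR), and population variance (divide by n) is assumed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state layout: {(agent_id, event_signature): {"n", "mean", "m2"}}.
WelfordMap = Dict[Tuple[str, str], Dict[str, float]]


def risk_profile(
    welford: WelfordMap,
    event_sig: Optional[str] = None,
    *,
    agent_id: str,
    min_observations: int = 5,
) -> Dict[str, float]:
    """Return {event_signature: variance} for one agent."""
    if not agent_id:
        # Mirrors the stated rule: an empty agent_id is an error, not a wildcard.
        raise ValueError("agent_id is required")
    profile: Dict[str, float] = {}
    for (owner, sig), state in welford.items():
        if owner != agent_id:
            continue  # per-agent isolation on a shared map
        if event_sig is not None and sig != event_sig:
            continue  # optional single-signature filter
        n = state["n"]
        if n < min_observations or n < 2:
            continue  # threshold is inclusive: n == min_observations passes
        profile[sig] = state["m2"] / n  # population variance (assumed)
    return profile


example_state: WelfordMap = {
    ("a1", "tool:strike"): {"n": 6.0, "mean": 0.5, "m2": 1.5},
    ("a1", "tool:parry"): {"n": 4.0, "mean": 1.0, "m2": 0.0},
    ("a2", "tool:strike"): {"n": 10.0, "mean": 1.0, "m2": 0.0},
}
```

With the example state, agent `a1` sees only `tool:strike` (six observations); `tool:parry` stays hidden at n=4 under the default threshold, and agent `a2`'s entries never leak into `a1`'s profile.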
## Plan deviation: variance lives on NAc, not CausalLink
The plan originally placed variance_estimate on CausalLink. During
implementation we found _generate_link_id keys on outcome_signature
(which embeds valence), so each (event_sig, outcome_valence) pair
becomes a separate link. Per-link Welford variance on binary
success/failure is then structurally 0 — useless as the "is this tool
reliable" signal Wire 1 needs. Moving the accumulator one level up to
NAc._event_outcome_welford keyed by (agent_id, event_signature)
captures the cross-link outcome heterogeneity that drives meaningful
annotation. The deviation is documented in the CausalLink docstring
and the test file's module docstring.
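The degenerate-statistic effect can be reproduced in a few lines. The sketch below is illustrative only: the event signature, the `pos` / `neg` valence labels, and the grouping dicts are hypothetical stand-ins for the real link-id and accumulator keys.

```python
from collections import defaultdict
from statistics import pvariance

# Alternating binary rewards for a single (hypothetical) event signature.
outcomes = [
    ("tool:strike", 1.0),
    ("tool:strike", 0.0),
    ("tool:strike", 1.0),
    ("tool:strike", 0.0),
]

per_link = defaultdict(list)   # key embeds valence, like a per-link accumulator
per_event = defaultdict(list)  # key is the event signature only

for event_sig, reward in outcomes:
    valence = "pos" if reward > 0.5 else "neg"
    per_link[(event_sig, valence)].append(reward)
    per_event[event_sig].append(reward)

# Every per-link stream is constant-valued, so its variance is 0 by construction.
link_variances = {key: pvariance(vals) for key, vals in per_link.items()}
# Grouping one level up recovers the outcome heterogeneity.
event_variances = {key: pvariance(vals) for key, vals in per_event.items()}
```

Here both per-link streams are constant, so `link_variances` holds only zeros, while `event_variances["tool:strike"]` is 0.25, the maximum for a binary signal.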
## Welford correctness
The online algorithm is numerically stable: it avoids the catastrophic
cancellation of the naive (Σx² − (Σx)²/n) / n formula at low variance and high n. The accumulator
fires exactly once per outcome in _record_outcome_impl (NOT per
eligibility-trace credit-distribution event — that path goes through
distribute_reward which touches _reward_bias, not the Welford state).
Zero-observation and n < 2 return the (0.0, 0.0) uncertainty sentinel
without divide-by-zero. All three pre-merge correctness concerns are
covered by unit tests.
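For reference, a minimal sketch of a Welford-style accumulator follows. It is not the NAc implementation: the `WelfordState` class and its `mean` / `m2` fields are hypothetical names, `n` is kept as a float only to echo the `state["n"]` typing mentioned in this PR, and population variance (divide by n) is assumed because the naive formula above divides by n.

```python
from dataclasses import dataclass


@dataclass
class WelfordState:
    """Running count, mean, and sum of squared deviations (M2)."""

    n: float = 0.0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # One update per observed outcome; no sums of squares are stored,
        # which is what avoids the cancellation in the naive formula.
        self.n += 1.0
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Fewer than two observations carries no spread information,
        # and returning early also avoids any divide-by-zero.
        if self.n < 2:
            return 0.0
        return self.m2 / self.n  # population variance (assumed)


def variance_of(samples) -> float:
    state = WelfordState()
    for x in samples:
        state.update(x)
    return state.variance()
```

Under these assumptions, an alternating binary stream such as `[1.0, 0.0, 1.0, 0.0]` gives a variance of 0.25, and two identical outcomes give 0.0.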
## Honest scope caveat (preserved)
Wire 1's behavioral effect goes through the LLM (it reads the
annotations and adjusts). It is hybrid bio-system + LLM, not pure
substrate-driven. A cleaner post-1.0 design adds a real risk-weighted
action ranker that pre-filters tools before the LLM sees them. The
hybrid version ships in 0.9.1 to keep scope tight. Roy-3's three-arm
results will reveal whether this hybrid wiring carries enough
substrate signal or whether the post-1.0 pre-filter ranker is needed.
## Persistence
_event_outcome_welford round-trips through NAc dump() / load_state().
Composite key joined with \x1f (ASCII unit separator) to handle
event_signatures containing :. Backward-compat: missing field on
pre-0.9.1 dumps loads as empty dict; first new outcome bootstraps
state cleanly. Corrupt entries are skipped without crashing load.
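A rough sketch of the composite-key round trip is shown below. The helper names `dump_welford` / `load_welford`, the `mean` / `m2` field names, and the exact corruption checks are assumptions for illustration; only the `\x1f` separator, the empty-dict fallback, and the skip-on-corrupt behaviour come from the description above.

```python
import json
from typing import Dict, Optional, Tuple

SEP = "\x1f"  # ASCII unit separator; does not collide with ":" in signatures
FIELDS = ("n", "mean", "m2")  # hypothetical per-entry fields

WelfordMap = Dict[Tuple[str, str], Dict[str, float]]


def dump_welford(state: WelfordMap) -> Dict[str, Dict[str, float]]:
    """Flatten tuple keys into JSON-safe strings."""
    return {f"{agent}{SEP}{sig}": dict(entry) for (agent, sig), entry in state.items()}


def load_welford(payload: Optional[dict]) -> WelfordMap:
    """Rebuild the map; a missing field loads as empty, bad entries are skipped."""
    state: WelfordMap = {}
    for joined, entry in (payload or {}).items():
        agent, sep, sig = joined.partition(SEP)
        if not sep or not agent or not isinstance(entry, dict):
            continue  # malformed key or value: skip, do not crash
        try:
            state[(agent, sig)] = {name: float(entry[name]) for name in FIELDS}
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric field
    return state


original: WelfordMap = {("a1", "tool:use:strike"): {"n": 3.0, "mean": 0.5, "m2": 0.6}}
restored = load_welford(json.loads(json.dumps(dump_welford(original))))
```

Splitting on the first separator with `str.partition` keeps any `:` inside the event signature intact, which is the reason a plain `:` join would be ambiguous.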
## Test coverage (49 tests)
- Welford correctness across N (one, two-identical, alternating,
small-n naive cross-check, numerical stability at 1000 samples,
monotone-grows-with-alternation).
- Fires once per outcome (record_outcome increment, distribute_reward
no-op, no-agent_id skip).
- Persistence round-trip (state preserved, backward-compat empty load,
corrupt entries skipped).
- OutcomePrediction.uncertainty_interval (cold-start sentinel, high
variance widens, low variance narrows, no-agent_id sentinel).
- get_action_risk_profile (empty-agent_id ValueError, cold-start
empty, below-threshold filter, threshold-inclusive, per-agent
isolation on shared NAc, event_sig filter, agent_id-less links
skipped).
- agent_loop annotation assembly (high/low/middle band, idempotency,
band transition stripping, Wire 1 / Wire 3 coexistence, tool:use:*
skipped).
- Env ablation gate parser shape.
- End-to-end pin: outcomes → get_action_risk_profile → band
classification.
## Roy-3 measurability
WIRE_1_ANNOTATION sim_log events make post-hoc analysis structurally
possible — distinguishes "annotation reached LLM" from "LLM ignored
annotation" without behavioral inference. Roy-3 can count
annotation-on vs annotation-off divergence on tool-family choice
distributions via the new env-var gate.
## Fast-suite green
6808 passed, 9 skipped, 40 deselected (7m45s). Zero regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two parallel pre-merge reviews (architecture lens + bio-fidelity lens)
raised 22 findings across 4 SEVERE + 10 SIGNIFICANT + 8 NIT bands. The
cross-confirmed findings (called out by both lenses independently)
landed first per `feedback_cross_confirmed_review_findings.md`:
- Honest scope caveat at consumer site: Bio S3 + Exec G6
- WIRE_1_ANNOTATION measurability gap: Bio S4 + Exec N1
- Multi-attribution / agent_id discipline: Bio S2 + Exec G6/G1/N4
## Architecture-lens folds
- **Exec S1**: Fixed orphan `variance_estimate` docstring references
on `CausalLink` and `OutcomePrediction.uncertainty_interval` — both
now point at `NAc._event_outcome_welford[(agent_id, event_sig)]`,
consistent with the moved accumulator.
- **Exec S2**: Bumped `_NAC_FORMAT_VERSION` from "1.1" → "1.2". Wire 2
introduced 1.0→1.1 (percept_valences); Wire 1 adds the
`event_outcome_welford` top-level key so the version ratchets
forward per the CLAUDE.md "Persistence-format contract" rule.
Backward-compat reader handles missing keys on 1.0 and 1.1 payloads
as empty dicts. Updated `test_wire_2_percept_aversion.py` tests to
pin `>= 1.1` (semver-style ratchet) rather than the exact 1.1
string so future bumps don't regress this test.
- **Exec S3 + G6**: Extracted `NAc._uncertainty_for(event_signature,
predicted_value, context)` helper used by both `_predict_impl` AND
`predict_all_outcomes`. The sibling-method silent-no-op (predict
populated uncertainty_interval; predict_all_outcomes did not) is
now structurally impossible. Helper covers the four sentinel
conditions in one place: no agent_id in context, no Welford state
for the pair, n < 2, variance == 0. Added regression tests.
- **Exec G2**: Replaced inlined `_WIRE1_TRUTHY` frozenset with a call
to Wire-A's canonical `annotation_disabled_via_env` parser from
`prompts/cluster_bias_annotation.py`. Single source of truth across
0.9.1's two annotation gates — a future change to the truthy set
flows to both wires.
- **Exec G3**: Documented the `observe()` Welford-skip divergence in
`NAc.observe()` docstring + added regression test
`TestObservePathSkipsWelford` (two tests). The asymmetry is
preserved by design (post-1.0 unification cleanup); pinning the
contract surfaces the divergence at test time for any future
consumer that relies on `get_action_risk_profile` after only
calling `observe()`.
- **Exec G5**: Added concurrent-mutation safety note to the
`_event_outcome_welford` init docstring. The dict value is mutated
in-place under `self._lock`; a future refactor that replaces
`state["n"] += 1.0` with `_event_outcome_welford[key] = state.copy()`
would silently lose updates if the lock isn't held across the full
read-modify-write — the docstring now names the trap.
- **Exec G6** (covered by S3 helper).
- **Exec N1**: Enriched `WIRE_1_ANNOTATION` sim_log payload with
`agent_id` (multi-agent attribution), `felt_phrases` (exact strings
the LLM saw), and `middle_band_variances` (counterfactual surface
for Roy-3 ablation analysis — tools that have substrate variance
but fell in the no-annotation band).
- **Exec N3**: Added exclusive-boundary test for `min_observations`
(n=4 returns empty profile under default min=5).
- **Exec N4**: Added a note to the `record_outcome` docstring that
  outcomes without an `agent_id` are silently skipped.
- **Exec N5**: Documented lifetime-cumulative behaviour and the
per-tick decay hook location in `_event_outcome_welford` init.
- **Exec N6**: Added `test_multiple_tool_use_compound_signatures_skipped`
test pinning the skip loop's behaviour with two tool:use:X entries.
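The single-helper shape described under Exec S3 + G6 might look like the sketch below. It is a hedged illustration only: the free function `uncertainty_for`, the dict-based state, and especially the interval construction (predicted value plus or minus one standard deviation) are assumptions, since this PR does not state how the (lower, upper) bounds are derived from the variance.

```python
import math
from typing import Dict, Optional, Tuple

SENTINEL: Tuple[float, float] = (0.0, 0.0)
WelfordMap = Dict[Tuple[str, str], Dict[str, float]]


def uncertainty_for(
    welford: WelfordMap,
    event_signature: str,
    predicted_value: float,
    context: Optional[dict],
) -> Tuple[float, float]:
    """Return (lower, upper), or the sentinel for any of four conditions."""
    agent_id = (context or {}).get("agent_id")
    if not agent_id:
        return SENTINEL  # 1. no agent_id in context
    state = welford.get((agent_id, event_signature))
    if state is None:
        return SENTINEL  # 2. no Welford state for the pair
    if state["n"] < 2:
        return SENTINEL  # 3. too few observations
    variance = state["m2"] / state["n"]
    if variance <= 0.0:
        return SENTINEL  # 4. zero variance
    spread = math.sqrt(variance)  # assumed: one standard deviation
    return (predicted_value - spread, predicted_value + spread)


demo: WelfordMap = {
    ("a1", "tool:strike"): {"n": 4.0, "mean": 0.5, "m2": 1.0},
    ("a1", "tool:parry"): {"n": 4.0, "mean": 1.0, "m2": 0.0},
}
```

Routing both prediction paths through one function like this is what makes the earlier sibling-method divergence hard to reintroduce: there is no second place for the sentinel logic to drift.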
## Bio-fidelity-lens folds
- **Bio S1 (SEVERE — phrasing register)**: Shifted Wire 1 phrasing
from "(feels unpredictable)" / "(feels predictable)" to
**"(unpredictable from prior experience)"** /
**"(reliable from prior experience)"**. The original "feels X"
phrasing collapsed Wire 1's metacognitive signal into Wire 3's
somatic register — both surfaces used the same "feels X" stem
with the same parenthesization, so the LLM could not separate
"I will fail because the body is broken" (Wire 3, proprioceptive)
from "I will fail because the outcome is stochastic" (Wire 1,
experience-acquired). The new experience-voice phrasing aligns
with Wire-A's "[... from prior experience]" register, keeping the
two experience-acquired signals coherent across wires while Wire 3
owns the somatic surface alone. Updated regex, constants,
annotation block, and all relevant tests.
- **Bio S2 (SEVERE — context-grain caveat)**: Elevated the
context-averaged variance trade-off from a scope caveat to a
**THESIS CAVEAT** in the `_event_outcome_welford` init docstring.
The key is `(agent_id, event_signature)`, NOT
`(agent_id, event_signature, context_hash)`. The substrate-faithful
version would condition variance on context so a tool that's
reliable against straw dummies but erratic against armored knights
surfaces as two distinct entries the LLM can read separately. Wire 1
ships the averaged version to keep 0.9.1 scope tight; the
context-conditioned version is post-1.0 cleanup if Roy-3 finds the
averaged surface insufficient. Caveat is now load-bearing in the
docstring so a future refactor cannot silently entrench the
averaging.
- **Bio S3**: Added scope caveat paragraph to `get_action_risk_profile`
docstring AND `OutcomePrediction.uncertainty_interval` docstring
naming the hybrid bio + LLM design. Future readers inspecting
either surface alone will see the caveat (previously documented
only in the commit body, where it could erode under refactor
pressure per `feedback_interim_contamination.md`).
- **Bio S4**: WIRE_1_ANNOTATION payload now carries `felt_phrases`
(the LLM-visible text per tool) and `middle_band_variances` (the
counterfactual — tools with substrate variance but no annotation).
Roy-3 post-hoc analysis can now distinguish (a) substrate produced
variance, (b) annotation reached prompt, (c) LLM chose differently,
from each other without behavioural inference.
- **Bio S6**: Reframed the CausalLink plan-deviation docstring from
apology to architectural decision. The moved accumulator IS the
root-cause fix per the no-band-aid rule; the new wording leads with
the architecture, not the deviation, and references the new
CLAUDE.md lesson.
- **Bio N1**: Added "Key-embedded values produce structurally-
degenerate statistics" lesson to CLAUDE.md `## Lessons learned`.
The rule generalises: if your statistic accumulator's key embeds
the dimension you want to vary over, the per-key statistic is
structurally 0. Applies beyond variance to bandit per-arm reward
estimates, goal-conditioned success rates, etc. Cites this Wire 1
finding alongside the existing `_context_similarity` denominator
lesson — same family of silent degenerate-statistic shapes.
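The idempotent annotate-and-strip behaviour with the experience-voice phrasing from Bio S1 can be sketched as follows. The function name `annotate`, the regex, and the band thresholds (0.2 and 0.05) are hypothetical; the PR gives the two phrases and the neutral-band stripping rule but not the cut-offs.

```python
import re
from typing import Optional

HIGH_SUFFIX = " (unpredictable from prior experience)"
LOW_SUFFIX = " (reliable from prior experience)"

# Matches either Wire 1 suffix at the end of a tool description.
_SUFFIX_RE = re.compile(r" \((?:unpredictable|reliable) from prior experience\)$")


def annotate(
    description: str,
    variance: Optional[float],
    *,
    high: float = 0.2,   # hypothetical high-variance threshold
    low: float = 0.05,   # hypothetical low-variance threshold
) -> str:
    """Apply, replace, or strip the Wire 1 suffix based on variance."""
    base = _SUFFIX_RE.sub("", description)  # strip first, so re-runs are idempotent
    if variance is None:
        return base  # no substrate signal yet
    if variance >= high:
        return base + HIGH_SUFFIX
    if variance <= low:
        return base + LOW_SUFFIX
    return base  # neutral band: any stale annotation has been removed


once = annotate("Swing the sword", 0.25)
twice = annotate(once, 0.25)
drifted = annotate(twice, 0.1)
```

Stripping before re-applying means repeated observations never stack suffixes, and a tool whose variance drifts into the neutral band returns to its plain description.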
## Findings explicitly deferred to post-1.0
- **Bio S5**: NAc state taxonomy (first-moment vs second-moment).
Conceptual refactor; out of 0.9.1 scope.
- **Bio N2**: Single `felt_suffix.py` module for shared Wire-3 /
Wire-1 regex composition. Cosmetic; flagged in agent_loop.py
comment for post-1.0 cleanup.
- **Exec G1**: Multi-attribution variance asymmetry. Production
callers ship one event per outcome; the punt is documented in the
accumulator init docstring. Pre-emptive fix awaits a real
multi-attribution caller.
- **Exec G4**: Integration test through PredictionContext (planner
surface). Added a basic pin via `test_predict_helper_with_prediction_context`;
the deeper integration awaits the planner-side consumer.
- **Exec N2**: `state["n"]` float-typing rationale. Documented
accurately ("uniform numeric type avoids isinstance branching in
load_state") in the init docstring.
## Validation
- 58 Wire 1 tests pass (+ 9 from this fold round).
- Full Wires + persistence regression suite: 308 tests pass.
- mypy public API surface: clean.
- ruff format + lint: clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- release_0_9_1.md (Stage 4): risk-sensitive action annotation. Substrate-acquired outcome variance reaches the LLM through experience-voice tool-description annotations. Hybrid bio-system + LLM design preserved; a pure substrate-primary pre-filter ranker is the post-1.0 cleaner path.
- feedback_cross_confirmed_review_findings.md.
## Wires
- Per-`(agent_id, event_signature)` Welford online variance over the binary reward signal. Updated once per outcome in `_record_outcome_impl` under `self._lock`. The plan originally placed variance on `CausalLink`; per-link variance is structurally 0 for binary outcomes because `_generate_link_id` keys on `outcome_signature` (which embeds valence), so the accumulator moved up one level. New CLAUDE.md lesson "Key-embedded values produce structurally-degenerate statistics" generalises the pattern.
- `NAc.get_action_risk_profile(event_sig=None, *, agent_id, min_observations=5)` returns `{event_signature → variance}` per agent. Empty agent_id raises `ValueError` (CC4 rule).
- `OutcomePrediction.uncertainty_interval` populated in both `_predict_impl` and `predict_all_outcomes` via the shared `NAc._uncertainty_for` helper (single source of truth; no sibling-method silent-no-op). Sentinel `(0.0, 0.0)` on cold-start, missing agent_id, n<2, or variance=0.
- Annotation phrasing `(unpredictable from prior experience)` / `(reliable from prior experience)`: distinct register from Wire 3's somatic `(feels strained)` / `(feels weakened)`. Idempotent under repeated observations; strips stale annotation when variance drifts to the neutral band.
- `WIRE_1_ANNOTATION` payload: `agent_id`, `high_variance_tools`, `reliable_tools`, `felt_phrases` (exact LLM-visible strings), `annotated_variances` (numeric floats), and `middle_band_variances` (counterfactual: substrate variance present but no annotation reached the prompt).
- Env ablation gate: `annotation_disabled_via_env` parser. Conftest autouse scrub fixture clears it between tests.
## Plan deviation: variance lives on NAc, not CausalLink
The plan placed `variance_estimate` on `CausalLink`. Implementation found per-link variance is structurally 0 for binary outcomes because `_generate_link_id` keys on `outcome_signature`, which embeds valence: each `(event_sig, outcome_valence)` pair allocates a separate link, so the per-link reward stream is constant-valued. The fix moves variance one level up to `NAc._event_outcome_welford` keyed by `(agent_id, event_signature)`, the level where the cross-outcome distribution actually lives. This is the root-cause architecture, not a workaround, per the no-band-aid rule. The new CLAUDE.md lesson generalises the pattern for bandit per-arm estimates, goal-conditioned success rates, and other shapes where the statistic accumulator's key embeds the dimension to vary over.
## Honest scope caveat (preserved)
Wire 1's behavioural effect goes through the LLM (it reads the annotations and adjusts). It is hybrid bio + LLM, not pure substrate-driven. A cleaner post-1.0 design adds a real risk-weighted action ranker that pre-filters tools before the LLM sees them. The hybrid version ships in 0.9.1 to keep scope tight. Roy-3's three-arm comparison will reveal whether the hybrid annotation carries enough substrate signal or whether the post-1.0 pre-filter ranker is needed.

The caveat is documented in: PR body (here), `get_action_risk_profile` docstring, `OutcomePrediction.uncertainty_interval` docstring, agent_loop annotation block comment, and the CausalLink class docstring. Cross-surface documentation per `feedback_interim_contamination.md` so the caveat cannot erode under refactor pressure.
## Context-averaging thesis caveat (pre-merge fold)
The Welford accumulator key is `(agent_id, event_signature)`, NOT `(agent_id, event_signature, context_hash)`. This averages variance across all contexts an agent has used a tool in. A substrate-faithful version would condition variance on context so a tool that is reliable against straw dummies but erratic against armored knights surfaces as two distinct entries the LLM reads separately. Wire 1 ships the averaged version to keep 0.9.1 scope tight; the context-conditioned version is post-1.0 cleanup if Roy-3 finds the averaged surface insufficient. The caveat is elevated to a thesis caveat (not just a scope caveat) in the accumulator's init docstring so a future refactor cannot silently entrench the averaging.
## Welford correctness
The online algorithm is numerically stable: it avoids the catastrophic cancellation of the naive `(Σx² − (Σx)²/n) / n` formula at low variance and high n. The accumulator fires exactly once per outcome in `_record_outcome_impl` (NOT per eligibility-trace credit-distribution event; `distribute_reward` touches `_reward_bias`, not the Welford state). Zero-observation, n<2, and zero-variance cases return the `(0.0, 0.0)` uncertainty sentinel without divide-by-zero. Verified by 58 Wire 1 unit tests covering Welford correctness across N, fire-once-per-outcome, persistence round-trip, OutcomePrediction.uncertainty_interval helper parity, get_action_risk_profile, agent_loop annotation assembly, env ablation gate, observe()-skip divergence, and end-to-end pin.
## Persistence
`_event_outcome_welford` round-trips through NAc `dump()` / `load_state()` with composite key joined by `\x1f` (ASCII unit separator). `_NAC_FORMAT_VERSION` bumped 1.1 → 1.2 (Wire 2 introduced 1.0→1.1). Backward-compat reader handles missing field on pre-Wire-1 dumps as empty dict; first new outcome bootstraps state cleanly. Corrupt entries are skipped without crashing load. Wire 2's version-pin tests updated to semver-style ratchet (`>= 1.1`) so future bumps don't regress them.
## Validation
## Reference
## Test plan
🤖 Generated with Claude Code