
feat(0.9.1): Wire 1 — risk-sensitive action annotation (Stage 4)#257

Merged
dennys246 merged 2 commits into main from feat/0-9-1-wire-1-risk-annotation
May 17, 2026


@dennys246
Owner

Summary

  • Wire 1 of release_0_9_1.md (Stage 4) — risk-sensitive action annotation. Substrate-acquired outcome variance reaches the LLM through experience-voice tool-description annotations. Hybrid bio-system + LLM design preserved; a pure substrate-primary pre-filter ranker is the post-1.0 cleaner path.
  • Two parallel pre-merge reviews (architecture/executor lens + bio-fidelity lens) raised 22 findings, all folded in commit 2; cross-confirmed findings were prioritised per feedback_cross_confirmed_review_findings.md.

Wires

  • NAc._event_outcome_welford: per-(agent_id, event_signature) Welford online variance over the binary reward signal. Updated once per outcome in _record_outcome_impl under self._lock. The plan originally placed variance on CausalLink; per-link variance is structurally 0 for binary outcomes because _generate_link_id keys on outcome_signature (which embeds valence) — so the accumulator moved up one level. New CLAUDE.md lesson "Key-embedded values produce structurally-degenerate statistics" generalises the pattern.
  • NAc.get_action_risk_profile(event_sig=None, *, agent_id, min_observations=5) returns {event_signature → variance} per agent. Empty agent_id raises ValueError (CC4 rule).
  • OutcomePrediction.uncertainty_interval populated in both _predict_impl and predict_all_outcomes via the shared NAc._uncertainty_for helper (single source of truth; no sibling-method silent-no-op). Sentinel (0.0, 0.0) on cold-start, missing agent_id, n<2, or variance=0.
  • agent_loop tool annotation hook runs after Wire 3. Felt-experience phrasing: (unpredictable from prior experience) / (reliable from prior experience) — distinct register from Wire 3's somatic (feels strained) / (feels weakened). Idempotent under repeated observations; strips stale annotation when variance drifts to the neutral band.
  • WIRE_1_ANNOTATION sim_log event mirrors Wire 3's WIRE_3_FILTER for Roy-3 measurability. Carries agent_id, high_variance_tools, reliable_tools, felt_phrases (exact LLM-visible strings), annotated_variances (numeric floats), and middle_band_variances (counterfactual — substrate-variance present but no annotation reached the prompt).
  • MAXIM_DISABLE_VARIANCE_ANNOTATION env-var ablation gate reuses Wire-A's canonical annotation_disabled_via_env parser. Conftest autouse scrub fixture clears it between tests.
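The idempotent suffix handling in the annotation hook can be sketched as follows. This sketch assumes the annotation is a trailing suffix on the tool description. The band cut-offs `HIGH_VARIANCE` / `LOW_VARIANCE` and the helper name `annotate` are hypothetical, because the PR does not publish the real thresholds or the agent_loop code. Only the two phrases come from the PR.

```python
import re
from typing import Optional

# Hypothetical band cut-offs; the PR does not publish the real values.
HIGH_VARIANCE = 0.20
LOW_VARIANCE = 0.05

UNPREDICTABLE = "(unpredictable from prior experience)"
RELIABLE = "(reliable from prior experience)"

# Matches a trailing Wire 1 suffix (and its leading space) so it can be stripped.
_WIRE1_SUFFIX = re.compile(r" \((?:unpredictable|reliable) from prior experience\)$")


def annotate(description: str, variance: Optional[float]) -> str:
    """Return the description carrying at most one Wire 1 suffix."""
    base = _WIRE1_SUFFIX.sub("", description)  # drop any stale annotation first
    if variance is None:
        return base  # no substrate signal for this tool
    if variance >= HIGH_VARIANCE:
        return f"{base} {UNPREDICTABLE}"
    if variance <= LOW_VARIANCE:
        return f"{base} {RELIABLE}"
    return base  # neutral middle band: no annotation
```

The suffix is stripped before it is re-applied. Two consequences follow. Calling `annotate` twice with the same variance returns the same string. A variance that drifts into the middle band removes the old suffix and leaves a Wire 3 suffix such as "(feels strained)" untouched.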

Plan deviation: variance lives on NAc, not CausalLink

The plan placed variance_estimate on CausalLink. Implementation found per-link variance is structurally 0 for binary outcomes because _generate_link_id keys on outcome_signature which embeds valence — each (event_sig, outcome_valence) pair allocates a separate link, so the per-link reward stream is constant-valued. The fix moves variance one level up to NAc._event_outcome_welford keyed by (agent_id, event_signature), the level where the cross-outcome distribution actually lives. This is the root-cause architecture, not a workaround — per the no-band-aid rule. The new CLAUDE.md lesson generalises the pattern for bandit per-arm estimates, goal-conditioned success rates, and other shapes where the statistic accumulator's key embeds the dimension to vary over.
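The following toy example shows the degeneracy in plain Python rather than in the NAc or CausalLink code. When the accumulator key embeds the outcome valence, each key only ever sees one reward value. The event signature `tool:swing` and the reward stream are made up.

```python
from collections import defaultdict
from statistics import pvariance

# Toy outcome stream for one tool: 1.0 = success, 0.0 = failure.
outcomes = [("tool:swing", r) for r in (1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0)]

per_link = defaultdict(list)   # key embeds valence, like a valence-bearing link id
per_event = defaultdict(list)  # key is the event only, one level up

for event_sig, reward in outcomes:
    valence = "success" if reward else "failure"
    per_link[(event_sig, valence)].append(reward)
    per_event[event_sig].append(reward)

# Every valence-keyed bucket is constant-valued, so its variance is 0.
link_variances = {k: pvariance(v) for k, v in per_link.items()}
# The event-keyed bucket sees the success/failure mix.
event_variances = {k: pvariance(v) for k, v in per_event.items()}
```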

Honest scope caveat (preserved)

Wire 1's behavioural effect goes through the LLM (it reads the annotations and adjusts). It is hybrid bio + LLM, not pure substrate-driven. A cleaner post-1.0 design adds a real risk-weighted action ranker that pre-filters tools before the LLM sees them. The hybrid version ships in 0.9.1 to keep scope tight. Roy-3's three-arm comparison will reveal whether the hybrid annotation carries enough substrate signal or whether the post-1.0 pre-filter ranker is needed.

The caveat is documented in: PR body (here), get_action_risk_profile docstring, OutcomePrediction.uncertainty_interval docstring, agent_loop annotation block comment, and the CausalLink class docstring. Cross-surface documentation per feedback_interim_contamination.md so the caveat cannot erode under refactor pressure.

Context-averaging thesis caveat (pre-merge fold)

The Welford accumulator key is (agent_id, event_signature) — NOT (agent_id, event_signature, context_hash). This averages variance across all contexts an agent has used a tool in. A substrate-faithful version would condition variance on context so a tool that is reliable against straw dummies but erratic against armored knights surfaces as two distinct entries the LLM reads separately. Wire 1 ships the averaged version to keep 0.9.1 scope tight; the context-conditioned version is post-1.0 cleanup if Roy-3 finds the averaged surface insufficient. The caveat is elevated to a thesis caveat (not just a scope caveat) in the accumulator's init docstring so a future refactor cannot silently entrench the averaging.
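The next example shows what the averaged key can hide. It uses made-up rewards for one tool in two contexts. The context labels are hypothetical, and population variance is used for simplicity.

```python
from statistics import pvariance

# Toy rewards for one tool in two contexts (hypothetical data).
rewards = {
    "straw_dummy": [1.0, 1.0, 1.0, 1.0],       # always succeeds
    "armored_knight": [1.0, 0.0, 0.0, 1.0],    # coin-flip outcomes
}

# Shipped 0.9.1 key shape, (agent_id, event_signature): every context pooled.
averaged = pvariance(rewards["straw_dummy"] + rewards["armored_knight"])

# Context-conditioned key shape (the post-1.0 idea): one entry per context.
conditioned = {ctx: pvariance(vals) for ctx, vals in rewards.items()}
```

The pooled value (0.1875) sits between the perfectly reliable context (0.0) and the coin-flip context (0.25). Any single band threshold therefore applies to a blend rather than to either context.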

Welford correctness

The online algorithm is numerically stable: it avoids the catastrophic cancellation of the naive (Σx² − (Σx)²/n) / n form at low variance and high n. The accumulator fires exactly once per outcome in _record_outcome_impl (NOT once per eligibility-trace credit-distribution event; distribute_reward touches _reward_bias, not the Welford state). Zero-observation, n<2, and zero-variance cases return the (0.0, 0.0) uncertainty sentinel without divide-by-zero. Verified by 58 Wire 1 unit tests covering Welford correctness across N, fire-once-per-outcome, persistence round-trip, OutcomePrediction.uncertainty_interval helper parity, get_action_risk_profile, agent_loop annotation assembly, the env ablation gate, observe()-skip divergence, and an end-to-end pin.
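Below is a minimal sketch of a Welford accumulator with the n<2 sentinel. The field names `mean` and `m2` and the sample-variance (n − 1) denominator are assumptions. The PR only confirms that `n` is stored as a float.

```python
from dataclasses import dataclass


@dataclass
class WelfordState:
    n: float = 0.0     # stored as float, per the PR's uniform-numeric-type note
    mean: float = 0.0  # running mean of the 0/1 reward (hypothetical field name)
    m2: float = 0.0    # running sum of squared deviations (hypothetical field name)

    def update(self, reward: float) -> None:
        """Fold in one outcome; intended to run once per recorded outcome."""
        self.n += 1.0
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)  # uses the updated mean

    def variance(self) -> float:
        """Sample variance, or the 0.0 sentinel when n < 2."""
        if self.n < 2.0:
            return 0.0
        return self.m2 / (self.n - 1.0)
```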

Persistence

_event_outcome_welford round-trips through NAc dump() / load_state() with composite key joined by \x1f (ASCII unit separator). _NAC_FORMAT_VERSION bumped 1.1 → 1.2 (Wire 2 introduced 1.0→1.1). Backward-compat reader handles missing field on pre-Wire-1 dumps as empty dict; first new outcome bootstraps state cleanly. Corrupt entries are skipped without crashing load. Wire 2's version-pin tests updated to semver-style ratchet (>= 1.1) so future bumps don't regress them.
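The composite-key round-trip can be sketched as follows, assuming a JSON-style payload. `dump_welford`, `load_welford`, and the `mean` / `m2` field names are hypothetical. Only the `\x1f` separator, the `event_outcome_welford` key, and the skip-corrupt and empty-on-missing behaviour come from the PR.

```python
import json

SEP = "\x1f"  # ASCII unit separator; cannot clash with ':' inside event signatures


def dump_welford(state: dict) -> dict:
    # JSON object keys must be strings, so join the (agent_id, event_signature) tuple.
    return {"event_outcome_welford": {SEP.join(k): dict(v) for k, v in state.items()}}


def load_welford(payload: dict) -> dict:
    out = {}
    # Pre-Wire-1 payloads lack the field entirely and load as an empty dict.
    for joined, v in payload.get("event_outcome_welford", {}).items():
        parts = joined.split(SEP)
        try:
            if len(parts) != 2:
                raise ValueError("malformed composite key")
            out[(parts[0], parts[1])] = {
                "n": float(v["n"]),
                "mean": float(v["mean"]),
                "m2": float(v["m2"]),
            }
        except (KeyError, TypeError, ValueError):
            continue  # skip a corrupt entry instead of failing the whole load
    return out
```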

Validation

  • 58 Wire 1 unit tests pass
  • Full fast suite: 6808 passed, 9 skipped, 40 deselected (initial run before fold)
  • Post-fold wires + persistence regression: 308 passed
  • mypy public API surface: clean
  • ruff format + lint: clean

Reference

Test plan

  • 58 Wire 1 unit tests pass
  • Wires + persistence regression suite passes (308 tests)
  • mypy public API clean
  • ruff format + lint clean
  • Full fast suite green post-fold (rerun in flight at time of PR open)

🤖 Generated with Claude Code

dennys246 and others added 2 commits May 17, 2026 11:21
Lifted from docs/plans/bio_emergent_persona_foundations.md § Wire 1
and docs/plans/release_0_9_1.md § Stage 4. Substrate-acquired outcome
variance reaches the LLM through felt-sensation tool-description
annotations — hybrid bio-system + LLM design preserved (the post-1.0
cleaner pre-filter ranker is documented as the future direction).

## What ships

- NAc._event_outcome_welford — per-(agent_id, event_signature)
  Welford online variance state on the binary reward signal across
  outcomes. Updated once per outcome in _record_outcome_impl under
  self._lock. CC4 per-agent stash discipline (required agent_id
  derived from event_context["agent_id"]; outcomes without the tag
  silently skip the accumulator — documented contract + regression
  test).
- NAc.get_action_risk_profile(event_sig=None, *, agent_id,
  min_observations=5) — returns {event_signature → variance}
  filtered by agent_id and min observations. Empty agent_id raises
  ValueError per the CC4 stash rule.
- OutcomePrediction.uncertainty_interval populated in
  _predict_impl from the NAc-level Welford state. Reserved field
  contract (PR #216) preserved: (lower, upper) tuple, sentinel
  (0.0, 0.0) on cold-start or when context lacks agent_id.
- agent_loop.py tool annotation hook runs after Wire 3's
  integrity-band annotation. Felt-sensation phrasing
  (feels unpredictable) / (feels predictable) matches Wire 3's
  register (Wire-A's bracketed style is reserved for its own
  prompt-section surface). Idempotent under repeated observations;
  strips stale annotation when variance drifts back into the neutral
  band.
- WIRE_1_ANNOTATION sim_log event mirrors Wire 3's WIRE_3_FILTER
  shape so Roy-3 can measure annotation effect without behavioral
  inference. Carries high_variance_tools / reliable_tools /
  annotated_variances for post-hoc analysis.
- MAXIM_DISABLE_VARIANCE_ANNOTATION env-var ablation gate mirrors
  Wire-A's pattern (default OFF / annotation ON). Conftest autouse
  scrub fixture in tests/conftest.py per
  feedback_opt_in_env_in_hot_paths.md.
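The get_action_risk_profile filtering contract can be sketched as a free function. It assumes the Welford table is a dict keyed by (agent_id, event_signature), with hypothetical `n` / `m2` fields and a sample-variance denominator. The real method lives on NAc and reads its own state under self._lock.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table shape: (agent_id, event_signature) -> Welford fields.
WelfordTable = Dict[Tuple[str, str], Dict[str, float]]


def get_action_risk_profile(
    table: WelfordTable,
    event_sig: Optional[str] = None,
    *,
    agent_id: str,
    min_observations: int = 5,
) -> Dict[str, float]:
    """Return {event_signature: variance} for one agent's sufficiently observed events."""
    if not agent_id:
        raise ValueError("agent_id is required (CC4 per-agent stash rule)")
    profile: Dict[str, float] = {}
    for (owner, sig), state in table.items():
        if owner != agent_id:
            continue  # per-agent isolation on a shared table
        if event_sig is not None and sig != event_sig:
            continue  # optional single-event filter
        n = state["n"]
        if n < min_observations:
            continue  # inclusive threshold: n == min_observations is kept
        # Sample variance (assumed n - 1 denominator); guard n < 2 for small thresholds.
        profile[sig] = state["m2"] / (n - 1.0) if n >= 2.0 else 0.0
    return profile
```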

## Plan deviation: variance lives on NAc, not CausalLink

The plan originally placed variance_estimate on CausalLink. During
implementation we found _generate_link_id keys on outcome_signature
(which embeds valence), so each (event_sig, outcome_valence) pair
becomes a separate link. Per-link Welford variance on binary
success/failure is then structurally 0 — useless as the "is this tool
reliable" signal Wire 1 needs. Moving the accumulator one level up to
NAc._event_outcome_welford keyed by (agent_id, event_signature)
captures the cross-link outcome heterogeneity that drives meaningful
annotation. The deviation is documented in the CausalLink docstring
and the test file's module docstring.

## Welford correctness

The online algorithm is numerically stable: it avoids the catastrophic
cancellation of the naive (Σx² − (Σx)²/n) / n form at low variance and
high n. The accumulator
fires exactly once per outcome in _record_outcome_impl (NOT per
eligibility-trace credit-distribution event — that path goes through
distribute_reward which touches _reward_bias, not the Welford state).
Zero-observation and n < 2 return the (0.0, 0.0) uncertainty sentinel
without divide-by-zero. All three pre-merge correctness concerns
covered by unit tests.
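The cancellation point can be checked with a quick comparison. The sketch assumes population variance for both formulas. A 0/1 signal is placed on a large offset to make the effect visible. Wire 1's actual binary rewards are small, so this setup exaggerates the risk for its real inputs.

```python
def naive_variance(xs):
    # Textbook one-pass formula (Σx² − (Σx)²/n) / n; prone to cancellation.
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / n


def welford_variance(xs):
    # Same population variance via Welford's online update.
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n if n else 0.0


# Low-variance signal on a large offset; the true population variance is 0.25.
xs = [1e9 + (i % 2) for i in range(1000)]
print(naive_variance(xs), welford_variance(xs))
```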

## Honest scope caveat (preserved)

Wire 1's behavioral effect goes through the LLM (it reads the
annotations and adjusts). It is hybrid bio-system + LLM, not pure
substrate-driven. A cleaner post-1.0 design adds a real risk-weighted
action ranker that pre-filters tools before the LLM sees them. The
hybrid version ships in 0.9.1 to keep scope tight. Roy-3's three-arm
results will reveal whether this hybrid wiring carries enough
substrate signal or whether the post-1.0 pre-filter ranker is needed.

## Persistence

_event_outcome_welford round-trips through NAc dump() / load_state().
Composite key joined with \x1f (ASCII unit separator) to handle
event_signatures containing ':'. Backward-compat: missing field on
pre-0.9.1 dumps loads as empty dict; first new outcome bootstraps
state cleanly. Corrupt entries are skipped without crashing load.

## Test coverage (49 tests)

- Welford correctness across N (one, two-identical, alternating,
  small-n naive cross-check, numerical stability at 1000 samples,
  monotone-grows-with-alternation).
- Fires once per outcome (record_outcome increment, distribute_reward
  no-op, no-agent_id skip).
- Persistence round-trip (state preserved, backward-compat empty load,
  corrupt entries skipped).
- OutcomePrediction.uncertainty_interval (cold-start sentinel, high
  variance widens, low variance narrows, no-agent_id sentinel).
- get_action_risk_profile (empty-agent_id ValueError, cold-start
  empty, below-threshold filter, threshold-inclusive, per-agent
  isolation on shared NAc, event_sig filter, agent_id-less links
  skipped).
- agent_loop annotation assembly (high/low/middle band, idempotency,
  band transition stripping, Wire 1 / Wire 3 coexistence, tool:use:*
  skipped).
- Env ablation gate parser shape.
- End-to-end pin: outcomes → get_action_risk_profile → band
  classification.

## Roy-3 measurability

WIRE_1_ANNOTATION sim_log events make post-hoc analysis structurally
possible — distinguishes "annotation reached LLM" from "LLM ignored
annotation" without behavioral inference. Roy-3 can count
annotation-on vs annotation-off divergence on tool-family choice
distributions via the new env-var gate.
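A WIRE_1_ANNOTATION-shaped payload could be assembled as sketched below, using the field names listed in this PR. The band cut-offs, the `event` key, and the list/dict value shapes are assumptions, not the shipped sim_log schema.

```python
from typing import Dict

# Hypothetical band cut-offs; the PR does not publish the real values.
HIGH, LOW = 0.20, 0.05
PHRASES = {
    "high": "(unpredictable from prior experience)",
    "low": "(reliable from prior experience)",
}


def wire1_annotation_event(agent_id: str, variances: Dict[str, float]) -> dict:
    """Assemble a payload using the WIRE_1_ANNOTATION field names from the PR."""
    event = {
        "event": "WIRE_1_ANNOTATION",  # hypothetical key name
        "agent_id": agent_id,
        "high_variance_tools": [],
        "reliable_tools": [],
        "felt_phrases": {},
        "annotated_variances": {},
        "middle_band_variances": {},
    }
    for tool, var in sorted(variances.items()):
        if var >= HIGH:
            band = "high"
            event["high_variance_tools"].append(tool)
        elif var <= LOW:
            band = "low"
            event["reliable_tools"].append(tool)
        else:
            # Counterfactual surface: variance existed but no annotation reached the prompt.
            event["middle_band_variances"][tool] = var
            continue
        event["felt_phrases"][tool] = PHRASES[band]
        event["annotated_variances"][tool] = var
    return event
```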

## Fast-suite green

6808 passed, 9 skipped, 40 deselected (7m45s). Zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two parallel pre-merge reviews (architecture lens + bio-fidelity lens)
raised 22 findings across 4 SEVERE + 10 SIGNIFICANT + 8 NIT bands. The
cross-confirmed findings (called out by both lenses independently)
landed first per `feedback_cross_confirmed_review_findings.md`:

  - Honest scope caveat at consumer site:   Bio S3 + Exec G6
  - WIRE_1_ANNOTATION measurability gap:    Bio S4 + Exec N1
  - Multi-attribution / agent_id discipline: Bio S2 + Exec G6/G1/N4

## Architecture-lens folds

- **Exec S1**: Fixed orphan `variance_estimate` docstring references
  on `CausalLink` and `OutcomePrediction.uncertainty_interval` — both
  now point at `NAc._event_outcome_welford[(agent_id, event_sig)]`,
  consistent with the moved accumulator.
- **Exec S2**: Bumped `_NAC_FORMAT_VERSION` from "1.1" → "1.2". Wire 2
  introduced 1.0→1.1 (percept_valences); Wire 1 adds the
  `event_outcome_welford` top-level key so the version ratchets
  forward per the CLAUDE.md "Persistence-format contract" rule.
  Backward-compat reader handles missing keys on 1.0 and 1.1 payloads
  as empty dicts. Updated `test_wire_2_percept_aversion.py` tests to
  pin `>= 1.1` (semver-style ratchet) rather than the exact 1.1
  string so future bumps don't regress this test.
- **Exec S3 + G6**: Extracted `NAc._uncertainty_for(event_signature,
  predicted_value, context)` helper used by both `_predict_impl` AND
  `predict_all_outcomes`. The sibling-method silent-no-op (predict
  populated uncertainty_interval; predict_all_outcomes did not) is
  now structurally impossible. Helper covers the four sentinel
  conditions in one place: no agent_id in context, no Welford state
  for the pair, n < 2, variance == 0. Added regression tests.
- **Exec G2**: Replaced inlined `_WIRE1_TRUTHY` frozenset with a call
  to Wire-A's canonical `annotation_disabled_via_env` parser from
  `prompts/cluster_bias_annotation.py`. Single source of truth across
  0.9.1's two annotation gates — a future change to the truthy set
  flows to both wires.
- **Exec G3**: Documented the `observe()` Welford-skip divergence in
  `NAc.observe()` docstring + added regression test
  `TestObservePathSkipsWelford` (two tests). The asymmetry is
  preserved by design (post-1.0 unification cleanup); pinning the
  contract surfaces the divergence at test time for any future
  consumer that relies on `get_action_risk_profile` after only
  calling `observe()`.
- **Exec G5**: Added concurrent-mutation safety note to the
  `_event_outcome_welford` init docstring. The dict value is mutated
  in-place under `self._lock`; a future refactor that replaces
  `state["n"] += 1.0` with `_event_outcome_welford[key] = state.copy()`
  would silently lose updates if the lock isn't held across the full
  read-modify-write — the docstring now names the trap.
- **Exec G6** (covered by S3 helper).
- **Exec N1**: Enriched `WIRE_1_ANNOTATION` sim_log payload with
  `agent_id` (multi-agent attribution), `felt_phrases` (exact strings
  the LLM saw), and `middle_band_variances` (counterfactual surface
  for Roy-3 ablation analysis — tools that have substrate variance
  but fell in the no-annotation band).
- **Exec N3**: Added exclusive-boundary test for `min_observations`
  (n=4 returns empty profile under default min=5).
- **Exec N4**: Added agent_id silently-skip note to `record_outcome`
  docstring.
- **Exec N5**: Documented lifetime-cumulative behaviour and the
  per-tick decay hook location in `_event_outcome_welford` init.
- **Exec N6**: Added `test_multiple_tool_use_compound_signatures_skipped`
  test pinning the skip loop's behaviour with two tool:use:X entries.
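The single-helper shape from Exec S3 can be sketched as a free function over a hypothetical (agent_id, event_signature) table. The four sentinel checks follow the PR. The ±1 standard deviation interval width and the `m2` field are assumptions, since the PR only says that high variance widens the interval and low variance narrows it.

```python
import math
from typing import Any, Dict, Mapping, Optional, Tuple

SENTINEL: Tuple[float, float] = (0.0, 0.0)


def uncertainty_for(
    table: Dict[Tuple[str, str], Dict[str, float]],
    event_signature: str,
    predicted_value: float,
    context: Optional[Mapping[str, Any]],
) -> Tuple[float, float]:
    """One helper shared by every predict path, so no sibling can skip it."""
    agent_id = (context or {}).get("agent_id")
    if not agent_id:
        return SENTINEL  # 1. no agent_id in context
    state = table.get((agent_id, event_signature))
    if state is None:
        return SENTINEL  # 2. no Welford state for the pair
    n = state["n"]
    if n < 2.0:
        return SENTINEL  # 3. n < 2
    variance = state["m2"] / (n - 1.0)  # assumed sample-variance denominator
    if variance == 0.0:
        return SENTINEL  # 4. variance == 0
    half_width = math.sqrt(variance)  # hypothetical width rule: ±1 standard deviation
    return (predicted_value - half_width, predicted_value + half_width)
```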

## Bio-fidelity-lens folds

- **Bio S1 (SEVERE — phrasing register)**: Shifted Wire 1 phrasing
  from "(feels unpredictable)" / "(feels predictable)" to
  **"(unpredictable from prior experience)"** /
  **"(reliable from prior experience)"**. The original "feels X"
  phrasing collapsed Wire 1's metacognitive signal into Wire 3's
  somatic register — both surfaces used the same "feels X" stem
  with the same parenthesization, so the LLM could not separate
  "I will fail because the body is broken" (Wire 3, proprioceptive)
  from "I will fail because the outcome is stochastic" (Wire 1,
  experience-acquired). The new experience-voice phrasing aligns
  with Wire-A's "[... from prior experience]" register, keeping the
  two experience-acquired signals coherent across wires while Wire 3
  owns the somatic surface alone. Updated regex, constants,
  annotation block, and all relevant tests.
- **Bio S2 (SEVERE — context-grain caveat)**: Elevated the
  context-averaged variance trade-off from a scope caveat to a
  **THESIS CAVEAT** in the `_event_outcome_welford` init docstring.
  The key is `(agent_id, event_signature)`, NOT
  `(agent_id, event_signature, context_hash)`. The substrate-faithful
  version would condition variance on context so a tool that's
  reliable against straw dummies but erratic against armored knights
  surfaces as two distinct entries the LLM can read separately. Wire 1
  ships the averaged version to keep 0.9.1 scope tight; the
  context-conditioned version is post-1.0 cleanup if Roy-3 finds the
  averaged surface insufficient. Caveat is now load-bearing in the
  docstring so a future refactor cannot silently entrench the
  averaging.
- **Bio S3**: Added scope caveat paragraph to `get_action_risk_profile`
  docstring AND `OutcomePrediction.uncertainty_interval` docstring
  naming the hybrid bio + LLM design. Future readers inspecting
  either surface alone will see the caveat (previously documented
  only in the commit body, where it could erode under refactor
  pressure per `feedback_interim_contamination.md`).
- **Bio S4**: WIRE_1_ANNOTATION payload now carries `felt_phrases`
  (the LLM-visible text per tool) and `middle_band_variances` (the
  counterfactual — tools with substrate variance but no annotation).
  Roy-3 post-hoc analysis can now distinguish (a) substrate produced
  variance, (b) annotation reached prompt, (c) LLM chose differently,
  from each other without behavioural inference.
- **Bio S6**: Reframed the CausalLink plan-deviation docstring from
  apology to architectural decision. The moved accumulator IS the
  root-cause fix per the no-band-aid rule; the new wording leads with
  the architecture, not the deviation, and references the new
  CLAUDE.md lesson.
- **Bio N1**: Added "Key-embedded values produce structurally-
  degenerate statistics" lesson to CLAUDE.md `## Lessons learned`.
  The rule generalises: if your statistic accumulator's key embeds
  the dimension you want to vary over, the per-key statistic is
  structurally 0. Applies beyond variance to bandit per-arm reward
  estimates, goal-conditioned success rates, etc. Cites this Wire 1
  finding alongside the existing `_context_similarity` denominator
  lesson — same family of silent degenerate-statistic shapes.

## Findings explicitly deferred to post-1.0

- **Bio S5**: NAc state taxonomy (first-moment vs second-moment).
  Conceptual refactor; out of 0.9.1 scope.
- **Bio N2**: Single `felt_suffix.py` module for shared Wire-3 /
  Wire-1 regex composition. Cosmetic; flagged in agent_loop.py
  comment for post-1.0 cleanup.
- **Exec G1**: Multi-attribution variance asymmetry. Production
  callers ship one event per outcome; the punt is documented in the
  accumulator init docstring. Pre-emptive fix awaits a real
  multi-attribution caller.
- **Exec G4**: Integration test through PredictionContext (planner
  surface). Added a basic pin via `test_predict_helper_with_prediction_context`;
  the deeper integration awaits the planner-side consumer.
- **Exec N2**: `state["n"]` float-typing rationale. Documented
  accurately ("uniform numeric type avoids isinstance branching in
  load_state") in the init docstring.

## Validation

- 58 Wire 1 tests pass (+ 9 from this fold round).
- Full Wires + persistence regression suite: 308 tests pass.
- mypy public API surface: clean.
- ruff format + lint: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dennys246 merged commit 6610566 into main May 17, 2026
5 checks passed
dennys246 deleted the feat/0-9-1-wire-1-risk-annotation branch May 17, 2026 17:59