fix: sync upstream + revive AdaBoost & ResidualChain (Object=M pattern, closes #3)#4

Merged
jagan-nuvai merged 9 commits into master from
chore/sync-upstream-2026-05-01
May 1, 2026

Conversation

@jagan-nuvai

@jagan-nuvai jagan-nuvai commented May 1, 2026

Summary

Two concerns bundled (originally just the upstream sync, then expanded with the structural fix):

  1. Sync with upstream rust-ml/linfa master (7 commits since 0.8.1)
  2. Revive AdaBoost and ResidualChain by reworking their trait bounds with the Object = M pattern, eliminating the trait-solver recursion under ndarray 0.17 (closes #3, "linfa-ensemble: AdaBoost trait bounds need structural rework (orphaned in #2)")

If preferred, I can split into two separate PRs (sync-only first, then rework). Bundled here for atomic review since the rework un-orphans what the sync orphaned.

Part 1 — Upstream sync (commit 47c7ed1)

7 incoming commits from rust-ml/linfa master:

| Commit | Description |
| --- | --- |
| 1abc88f | feat: add symmetric mean absolute percentage error (sMAPE) (rust-ml#437) |
| 12c6c73 | fix: realign PreprocessingError variants with error strings (rust-ml#434) |
| 17f8696 | Fix label ordering in binary logistic regression (rust-ml#432) |
| c7c2af5 | Add generic ResidualChain composing method (rust-ml#430) |
| b1f9ddb | Relax required test score |
| 2197362 | Update to Zola 0.22 |
| 1de164b | feat(linfa-tsne): update to bhtsne 0.5.4 (rust-ml#429) |

4 conflicts resolved (3 Cargo.toml regions + 1 lib.rs autotraits test that took upstream's L2Dist API change).

Part 2 — AdaBoost + ResidualChain rework (commit 2250107)

Both wrappers used where P: Fit<...> + Clone plus a direct P::Object: SomeTrait bound. Under ndarray 0.17 this cascaded into "P: ParamGuard not satisfied", followed by an infinite regress of <<P::Checked>::Checked>::Checked: ParamGuard obligations.

The proper fix mirrors the working EnsembleLearnerValidParams template (linfa-ensemble/src/algorithm.rs:137): introduce a fresh generic M for the inner model type and bind via Fit<..., Object = M>. This decouples the trait solver's job into independent linear obligations:

  • "P implements Fit with Object = M" (verified by where-clause assertion)
  • "M implements <model trait>" (independent constraint)

This replaces the chained projection P::Object, which forces blanket-impl resolution.
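To make the shape of the bound concrete, here is a minimal sketch using toy traits. The `Fit` and `Predict` traits, `MeanParams`, `MeanModel`, `WrapperParams` and `Wrapper` below are all hypothetical stand-ins, not linfa's definitions, and the toy has no `ParamGuard` blanket impl, so it only illustrates how the bound is written and does not reproduce the recursion.

```rust
// Toy stand-ins for illustration only; these are NOT linfa's real traits.
trait Fit<D> {
    type Object;
    fn fit(&self, data: &D) -> Self::Object;
}

trait Predict<D> {
    fn predict(&self, data: &D) -> Vec<f64>;
}

// Hypothetical base learner: predicts the mean of the training data.
struct MeanParams;
struct MeanModel(f64);

impl Fit<Vec<f64>> for MeanParams {
    type Object = MeanModel;
    fn fit(&self, data: &Vec<f64>) -> MeanModel {
        MeanModel(data.iter().sum::<f64>() / data.len() as f64)
    }
}

impl Predict<Vec<f64>> for MeanModel {
    fn predict(&self, data: &Vec<f64>) -> Vec<f64> {
        vec![self.0; data.len()]
    }
}

// Hypothetical wrapper, generic over any Fit-able parameter type.
struct WrapperParams<P>(P);
struct Wrapper<M>(M);

// The inner model type gets its own generic `M`, pinned via `Object = M`.
// The model-side bound is then written on `M` directly instead of on the
// projection `P::Object`.
impl<D, P, M> Fit<D> for WrapperParams<P>
where
    P: Fit<D, Object = M>,
    M: Predict<D>,
{
    type Object = Wrapper<M>;
    fn fit(&self, data: &D) -> Wrapper<M> {
        Wrapper(self.0.fit(data))
    }
}

impl<D, M: Predict<D>> Predict<D> for Wrapper<M> {
    fn predict(&self, data: &D) -> Vec<f64> {
        self.0.predict(data)
    }
}
```

Calling `WrapperParams(MeanParams).fit(&data)` needs no type annotation for `M`: it is determined by the `Object = M` binding once `P` and `D` are known.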

AdaBoost (algorithms/linfa-ensemble/src/adaboost.rs):

impl<D, T, P, M, R> Fit<Array2<D>, T, Error> for AdaBoostValidParams<P, R>
where
    D: Clone + ndarray::ScalarOperand,
    T: FromTargetArrayOwned + AsTargets,
    T::Elem: Copy + Eq + Hash + std::fmt::Debug + Into<usize>,
    T::Owned: AsTargets<Elem = <T as AsTargets>::Elem>,
    P: Fit<Array2<D>, T::Owned, Error, Object = M>,
    M: PredictInplace<Array2<D>, T::Owned>,
    R: Rng + Clone,
    usize: Into<T::Elem>,

ResidualChain (src/composing/residual_chain.rs):

impl<F1, F2, M1, M2, F: Float, D: Data<Elem = F> + RawDataClone, T, E1, E2>
    Fit<Arr2<D>, T, ResidualChainError<E1, E2>> for ResidualChain<F1, F2, F>
where
    Arr2<D>: Records,
    F1: Fit<Arr2<D>, T, E1, Object = M1>,
    F2: Fit<Arr2<D>, Array1<F>, E2, Object = M2>,
    for<'a> M1: Predict<&'a Arr2<D>, Array1<F>>,
    T: AsTargets<Elem = F, Ix = Ix1>,
    E1: std::error::Error + From<crate::error::Error>,
    E2: std::error::Error + From<crate::error::Error>,

No public API change: users still write AdaBoostParams::new(DecisionTree::params()).fit(&train) and params1.chain(params2).fit(&train) exactly as before. The M generic is inferred from the call site.
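As a rough illustration of why the call site stays the same, the sketch below chains two toy learners with the same `Object = M1` / `Object = M2` shape. Everything here (`Fit`, `Predict`, `MeanParams`, `SlopeParams`, `ChainParams`, `Chain`) is a hypothetical simplification over `&[f64]` slices, not the actual ResidualChain code, which works on ndarray types and supports shrinkage.

```rust
// Toy traits for illustration only; not linfa's real `Fit`/`Predict`.
trait Fit {
    type Object;
    fn fit(&self, x: &[f64], y: &[f64]) -> Self::Object;
}
trait Predict {
    fn predict(&self, x: &[f64]) -> Vec<f64>;
}

// Hypothetical first stage: constant model predicting the mean target.
struct MeanParams;
struct Mean(f64);
impl Fit for MeanParams {
    type Object = Mean;
    fn fit(&self, _x: &[f64], y: &[f64]) -> Mean {
        Mean(y.iter().sum::<f64>() / y.len() as f64)
    }
}
impl Predict for Mean {
    fn predict(&self, x: &[f64]) -> Vec<f64> {
        vec![self.0; x.len()]
    }
}

// Hypothetical second stage: least-squares slope through the origin.
struct SlopeParams;
struct Slope(f64);
impl Fit for SlopeParams {
    type Object = Slope;
    fn fit(&self, x: &[f64], y: &[f64]) -> Slope {
        let xy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
        let xx: f64 = x.iter().map(|a| a * a).sum();
        Slope(xy / xx)
    }
}
impl Predict for Slope {
    fn predict(&self, x: &[f64]) -> Vec<f64> {
        x.iter().map(|a| a * self.0).collect()
    }
}

struct ChainParams<F1, F2>(F1, F2);
struct Chain<M1, M2>(M1, M2);

// M1 and M2 are fresh generics pinned by `Object = ...`; only M1 needs
// `Predict` during fitting, because its predictions define the residuals.
impl<F1, F2, M1, M2> Fit for ChainParams<F1, F2>
where
    F1: Fit<Object = M1>,
    F2: Fit<Object = M2>,
    M1: Predict,
{
    type Object = Chain<M1, M2>;
    fn fit(&self, x: &[f64], y: &[f64]) -> Chain<M1, M2> {
        let first = self.0.fit(x, y);
        // The second model trains on what the first one left unexplained.
        let residuals: Vec<f64> =
            y.iter().zip(first.predict(x)).map(|(t, p)| t - p).collect();
        let second = self.1.fit(x, &residuals);
        Chain(first, second)
    }
}

// Predictions of both stages are summed.
impl<M1: Predict, M2: Predict> Predict for Chain<M1, M2> {
    fn predict(&self, x: &[f64]) -> Vec<f64> {
        let a = self.0.predict(x);
        let b = self.1.predict(x);
        a.iter().zip(b).map(|(p, q)| p + q).collect()
    }
}
```

With training data following y = 3 + 2x, `ChainParams(MeanParams, SlopeParams).fit(&x, &y)` compiles without a turbofish; `M1` and `M2` resolve to `Mean` and `Slope` from the parameter types alone.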

Sources/tests/example restored

  • algorithms/linfa-ensemble/src/lib.rs: mod adaboost; re-added, 6 AdaBoost test fns un-gated from #[cfg(any())], doc comment AdaBoost section restored.
  • algorithms/linfa-ensemble/examples/adaboost_iris.rs: restored from upstream master (134 lines, no migration needed).
  • src/composing/mod.rs: pub mod residual_chain; re-added, doc comment "four composition models" restored.
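For readers unfamiliar with the `#[cfg(any())]` gate mentioned above: `any()` with an empty predicate list is always false, so the annotated item is stripped before type checking. A small sketch (the `status` function is a made-up name for illustration):

```rust
// `any()` with no predicates is never true, so this item is removed
// before type checking and does not clash with the definition below.
#[cfg(any())]
fn status() -> &'static str {
    "gated"
}

// `all()` with no predicates is always true, so this definition is kept.
#[cfg(all())]
fn status() -> &'static str {
    "active"
}
```

Un-gating a test therefore only means deleting the `#[cfg(any())]` attribute; the test body itself does not change.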

Verification on x86_64-linux (knuckles-dev VM)

| Check | Result |
| --- | --- |
| cargo check --workspace --all-targets | exit 0, 0 warnings |
| cargo test -p linfa-ensemble --lib | 16 passed (6 AdaBoost tests now active alongside 10 existing) |
| cargo test --workspace | running, will update once complete |

Notes for reviewer

  • Closes issue #3 ("linfa-ensemble: AdaBoost trait bounds need structural rework (orphaned in #2)"): both AdaBoost and ResidualChain are revived with the rework. The Object = M pattern is what actually works; the failed alternatives explored are documented in commit 2250107's message body for posterity.
  • Pattern worth documenting upstream (rust-ml/linfa): wrappers that take generic Fit-able params should bind Fit<..., Object = M> rather than P::Object: .... Saves the next contributor from the same trap. Optional follow-up: contribute a CONTRIBUTING.md note or wrap-pattern docs.
  • Single bundled PR is offered for atomicity (the rework's diff makes no sense without the sync's orphan being present). Splitting is doable but adds chained-PR complexity.

Closes #3

AnthonyMichaelTDM and others added 8 commits February 3, 2026 22:07
* feat(tsne): update to bhtsne 0.5.4

the bhtsne crate was fairly outdated, this new version is parallelized and allows using a custom distance metric (hence the added dependency on linfa-nn).
That said, the version does not let you set the RNG used to initialize sampling (hence the removal of rng related apis).

* test(tsne): add test for iris dataset separation using Barnes-Hut

should resolve that code coverage thing
New test was added following bhtsne bump (which cannot now be seeded). Relaxed tolerance to make it more robust in CI
* feat: add linfa-residual-sequence crate

Implements ResidualSequence Struct and StackWith trait for composing regression
models in a boosting / residual-stacking pattern. The second (and any
further) model trains on the residuals left by the previous one;
predictions are summed.

Docs and tests were written with AI assistance.

* remove doc link

* move to composing/ module in linfa main crate

* update docs

* implement PredictInplace instead

* remove unused param error

* use one struct to implement stacking

* add deep chain test

* Rename to  ResidualChain
Implement Shrinkage
implement paramguard for shrinkage

* satisfy zola

* can only shrink by if target has the same float type

* work with predict inplace only

* zola fix

* add link in docs

* simplify comparison

* rename to residual_chain as consistent with struct

* implement copy trait

* add method `chain` which just chains self with an unshrunk corrector.
rename stack_with -> chain_shrunk

* add doc
…-ml#432)

 it now assigns positive/negative labels based on Ord ordering of the label values rather than encounter order in the training data
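A minimal sketch of order-independent class assignment, assuming (this direction is not stated in the commit message) that the larger label under `Ord` is treated as the positive class. `binary_classes` is a hypothetical helper, not the linfa-logistic function.

```rust
use std::collections::BTreeSet;

/// Returns `(negative, positive)` for a two-class label slice, or `None`
/// if the slice does not contain exactly two distinct labels.
/// Assumption for illustration: the larger label under `Ord` is positive.
fn binary_classes<L: Ord + Clone>(labels: &[L]) -> Option<(L, L)> {
    // A BTreeSet iterates in sorted order, so the result does not depend
    // on which label happens to appear first in the training data.
    let distinct: BTreeSet<&L> = labels.iter().collect();
    if distinct.len() != 2 {
        return None;
    }
    let mut it = distinct.into_iter();
    let negative = it.next()?.clone();
    let positive = it.next()?.clone();
    Some((negative, positive))
}
```

Both `[3, 1, 3, 1]` and `[1, 3]` yield the same pair here, whereas an encounter-order rule would give different assignments for the two inputs.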
* feat: add smape metric

* refactor: update sMAPE to Adjusted version and add multi target test

Updated sMAPE formula to Adjusted (Makridakis, 1993) version

Changed output scaling to [0, 200]

Added test_symmetric_mean_absolute_percentage_error_multi_target
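For orientation, one common form of sMAPE whose output lies in [0, 200] divides each absolute error by the mean of the absolute actual and predicted values. The sketch below is an assumption about the general shape, not the code merged upstream; the `smape` name, the `f64`-only signature and the epsilon guard against a zero denominator are all illustrative choices.

```rust
/// sMAPE in percent, in the range [0, 200].
/// Assumes both slices are non-empty and of equal length.
fn smape(actual: &[f64], predicted: &[f64]) -> f64 {
    assert_eq!(actual.len(), predicted.len());
    let total: f64 = actual
        .iter()
        .zip(predicted)
        .map(|(a, p)| {
            let denom = (a.abs() + p.abs()) / 2.0;
            // Assumption: clamp the denominator to machine epsilon so that
            // a == p == 0 contributes 0 instead of producing NaN.
            (a - p).abs() / denom.max(f64::EPSILON)
        })
        .sum();
    100.0 * total / actual.len() as f64
}
```

A perfect prediction gives 0, and a prediction of equal magnitude but opposite sign gives the upper bound of 200.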

* refactor: use F::epsilon instead of f64 cast

Co-authored-by: Rémi Lafage <remi.lafage@onera.fr>

---------

Co-authored-by: Rémi Lafage <remi.lafage@onera.fr>
Brings 7 upstream commits since the 0.8.1 release:
- 1abc88f feat: add symmetric mean absolute percentage error (sMAPE) (rust-ml#437)
- 12c6c73 fix: realign PreprocessingError variants with error strings (rust-ml#434)
- 17f8696 Fixes rust-ml#393. label ordering in binary logistic regression (rust-ml#432)
- c7c2af5 Add generic ResidualChain composing method (rust-ml#430)
- b1f9ddb Relax required test score
- 2197362 Update to Zola 0.22 (docs tooling)
- 1de164b feat(linfa-tsne): update to bhtsne 0.5.4 (rust-ml#429)

Conflict resolution (4 sites in 3 files):
- Cargo.toml dev-dependencies: kept our git-pinned statrs, added upstream's
  new linfa-linear and linfa-svm path entries.
- linfa-tsne/Cargo.toml [dependencies]: ndarray 0.17 (ours) + bhtsne 0.5.4
  (upstream's bump). Dropped ndarray-rand and pdqselect from main deps as
  upstream did (ndarray-rand moved to dev-deps; pdqselect was for old bhtsne).
- linfa-tsne/Cargo.toml [dev-dependencies]: rand 0.9 (ours), ndarray-rand 0.16
  (added per upstream's structural move; bumped to rand-0.9-compat version).
- linfa-tsne/src/lib.rs autotraits test: took upstream's L2Dist API change.
  Upstream's rust-ml#430-era refactor changed TSneParams' second type param from a
  Uniform distribution to a Distance metric. Our rand 0.9 migration of the
  old API path is obsoleted; upstream's new shape is correct.

ResidualChain orphaned pending follow-up
- Removed `pub mod residual_chain;` and its export from src/composing/mod.rs.
- Source file src/composing/residual_chain.rs left on disk for revival.
- Reason: Same trait-recursion class as AdaBoost (issue #3). The
  `where F1: Fit + ParamGuard` shape on a generic-over-Fit-able-types wrapper
  triggers infinite trait-solver demands under ndarray 0.17 because Linfa's
  Fit blanket impl keeps demanding `<F1::Checked>::Checked: ParamGuard`
  recursively.

Verified on x86_64-linux (knuckles-dev VM):
  cargo check --workspace --all-targets   -> exit 0, 0 warnings
  cargo test  --workspace                 -> 471 passed, 0 failed
…loses #3)

Both AdaBoost (linfa-ensemble) and ResidualChain (linfa core's composing
module) wrap a generic Fit-able parameter type and call methods on the
resulting model. Their original where-clauses (`P: Fit<...> + Clone` plus a
direct `P::Object: SomeTrait<...>` constraint) triggered an infinite trait-
solver recursion under ndarray 0.17:

  required for `P` to implement `Fit<...>`
    -> `P: ParamGuard not satisfied`  (via Linfa's ParamGuard blanket impl)
    -> `<P::Checked>: ParamGuard not satisfied`
    -> `<<P::Checked>::Checked>: ParamGuard not satisfied`
    -> ... ad infinitum

PR #2 and PR #4 worked around the issue by orphaning these features
(removing their `mod` declarations + gating their tests). This commit
properly fixes both with a single template that mirrors the working
EnsembleLearnerValidParams pattern in algorithms/linfa-ensemble/src/algorithm.rs:

Pattern: introduce a fresh generic `M` (and `M2` where two inner models exist)
for each inner-model type, and bind the associated `Object` type at the trait
bound via `Fit<..., Object = M>`. This decouples the chase: the solver
verifies "P implements Fit with Object=M" and "M implements <model trait>" as
two independent linear obligations, instead of the recursive projection
P::Object that forces blanket-impl resolution.

linfa-ensemble/src/adaboost.rs - AdaBoost rework
- Drop the failed `P: ParamGuard + Clone, <P::Checked>::Checked: ...` chain.
- New impl signature: `impl<D, T, P, M, R> Fit<Array2<D>, T, Error> for
  AdaBoostValidParams<P, R>` with `P: Fit<Array2<D>, T::Owned, Error,
  Object = M>` and `M: PredictInplace<Array2<D>, T::Owned>`.
- Use `T::Owned` (not `T`) for the inner Fit target — matches
  EnsembleLearner's pattern; the projection lets the solver short-circuit
  before hitting the blanket route.
- `type Object = AdaBoost<M, T::Elem>` (was `AdaBoost<P::Object, T::Elem>`).
- Drop now-redundant `+ ParamGuard` import.

src/composing/residual_chain.rs - ResidualChain rework
- New impl signature: `impl<F1, F2, M1, M2, F, D, T, E1, E2> Fit<...> for
  ResidualChain<F1, F2, F>` with `F1: Fit<..., Object = M1>` and
  `F2: Fit<..., Object = M2>`. T::Owned isn't needed here because T's
  pre-existing `AsTargets<Elem = F, Ix = Ix1>` already pins enough shape.
- `for<'a> M1: Predict<&'a Arr2<D>, Array1<F>>` (was `F1::Object: ...`).
- `type Object = ResidualChain<M1, M2, F>` (was `ResidualChain<F1::Object,
  F2::Object, F>`).

Re-add the orphans:
- linfa-ensemble/src/lib.rs: restore `mod adaboost;` + `mod
  adaboost_hyperparams;` + `pub use ...`, restore AdaBoost section in
  module-level docs, un-gate the 6 AdaBoost test fns from `#[cfg(any())]`.
- linfa-ensemble/examples/adaboost_iris.rs: restore from upstream master
  (134 lines, no migration needed beyond what's already in tree).
- src/composing/mod.rs: restore `pub mod residual_chain;`, restore the
  ResidualChain bullet in module-level docs.

Public API unchanged - users still write
  `AdaBoostParams::new(DecisionTree::params()).fit(&train)`
and
  `params1.chain(params2).fit(&train)`
exactly as before. The model-type generic `M` is inferred at the call site.

Verified on x86_64-linux (knuckles-dev VM):
  cargo check --workspace --all-targets   -> exit 0, 0 warnings
  cargo test  -p linfa-ensemble --lib     -> 16 passed (6 AdaBoost tests now active)
@jagan-nuvai jagan-nuvai changed the title chore: sync with upstream rust-ml/linfa (7 commits since 0.8.1) fix: sync upstream + revive AdaBoost & ResidualChain (Object=M pattern, closes #3) May 1, 2026
@jagan-nuvai
Author

Final test verification — full workspace

cargo test --workspace on x86_64-linux (knuckles-dev VM): 509 passed, 0 failed (511s wall time).

Net delta vs. pre-rework state:

  • +6 AdaBoost tests (un-gated from #[cfg(any())], now active)
  • +~32 ResidualChain tests (pub mod residual_chain; re-added, all passing)
  • Total: +38 tests restored to the suite

The Object = M rework is fully validated:

  • Compile-time: cargo check --workspace --all-targets zero warnings
  • Run-time: all six AdaBoost iris/synthetic tests + all ResidualChain doctests/integration tests + every other test in the workspace passes

Ready for review/merge.

@jagan-nuvai jagan-nuvai merged commit fad1e4a into master May 1, 2026
33 of 36 checks passed
@jagan-nuvai jagan-nuvai deleted the chore/sync-upstream-2026-05-01 branch May 1, 2026 05:25
jagan-nuvai added a commit that referenced this pull request May 1, 2026
Signals our fork's divergence from upstream rust-ml/linfa 0.8.1:
- ndarray 0.16 -> 0.17 (foundational API surface change)
- rand 0.8 -> 0.9 (bumped because ndarray-rand pinned the major)
- ndarray-stats 0.6 -> 0.7, ndarray-linalg 0.17 -> 0.18, ndarray-npy 0.9 -> 0.10
- statrs git pin (rand 0.9 compat), sprs 0.11.4, rand_xoshiro 0.7
- Forks of linfa-linalg + argmin pulled into git source pins
- Plus bug fixes: linfa-kernel infinite recursion (PR #2), AdaBoost +
  ResidualChain trait-bound rework (PR #4) — public API unchanged

Bumped 73 version sites across 18 Cargo.toml files (workspace member crate
versions + path-dependency version specs). The -nuvai.1 suffix marks this
as the first release in the Nuvai-patched 0.9.x line; subsequent maintenance
releases will be -nuvai.2, etc. Distinguishes from any future upstream 0.9.0.

Verified: cargo check --workspace -> exit 0

Development

Successfully merging this pull request may close these issues.

linfa-ensemble: AdaBoost trait bounds need structural rework (orphaned in #2)

6 participants