Make the method registry the single source of truth for foundation-mo… by andreasgoethals · Pull Request #108 · LAMDA-Tabular/TALENT

andreasgoethals · 2026-06-10T13:11:31Z

Summary

This PR eliminates drift between the method registry and the config JSONs for foundation-model row limits, fixes the torch.cuda.amp / use_reentrant deprecation warnings, and removes duplicated code found during a package-wide consistency audit. No new features — only correctness, consistency, and a single source of truth.

1. Registry as single source of truth for training-row caps

Previously each foundation model's row cap lived in two places — train_row_limit in model/method_registry.py and sample_size in the default/opt_space config JSONs — and several had silently drifted (e.g. tabicl_v2.json said 60k while the registry said 1M; tabdpt.json capped at 100k while TabDPT is unlimited).

New Method.resolve_sample_size() / Method.subsample_train_rows() helpers in model/methods/base.py: an explicit config['general']['sample_size'] acts as a per-run override; otherwise the registry's train_row_limit applies.
All foundation methods (tabpfn, tabpfn_v2, tabpfn_real, tabpfn_v2_5, tabpfn_v3, tabicl, tabicl_v2, tabdpt, mitra) now share this one seeded, stratified-for-classification subsampling path.
sample_size removed from all default and opt_space config JSONs, so configs can no longer disagree with the registry.
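
The precedence described in this list can be sketched as follows. This is a minimal illustration rather than the PR's code: `TRAIN_ROW_LIMITS` is a hypothetical stand-in for the registry in model/method_registry.py, the free-function signature is assumed, and the values are copied from the registry caps stated in this description.

```python
from typing import Optional

# Hypothetical stand-in for the registry's train_row_limit values.
# None means "unlimited".
TRAIN_ROW_LIMITS = {
    "tabpfn_v2": 10_000,
    "tabicl_v2": 1_000_000,
    "tabdpt": None,
}


def resolve_sample_size(model_type: str, config: dict) -> Optional[int]:
    """Return the effective training-row cap, or None for no cap."""
    # `or {}` guards against a config that stores "general": null.
    general = config.get("general", {}) or {}
    override = general.get("sample_size")
    if override is not None:
        return int(override)  # an explicit per-run override wins
    # Unknown model types fall back to "no cap" instead of raising.
    return TRAIN_ROW_LIMITS.get(model_type)


print(resolve_sample_size("tabicl_v2", {"general": {}}))
print(resolve_sample_size("tabicl_v2", {"general": {"sample_size": 60000}}))
print(resolve_sample_size("unknown_model", {}))
```

Because the configs no longer carry `sample_size`, the override branch only fires when a user sets it deliberately for a run.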

Effective caps after this PR (registry-governed):

| Model | Before (config) | After (registry) |
| --- | --- | --- |
| TabPFN (v1) | 3,000 | 1,000 |
| TabPFN v2 / Real-TabPFN | 10,000 | 10,000 |
| TabPFN v2.5 | 50,000 | 50,000 |
| TabPFN v3 | 1,000,000 | 1,000,000 |
| TabICL | uncapped | 500,000 |
| TabICL v2 | 60,000 | 1,000,000 |
| Mitra | uncapped at fit | 10,000 |
| TabDPT | 100,000 | unlimited |

2. Torch deprecation fixes

torch.cuda.amp.autocast(enabled=...) → torch.amp.autocast("cuda", enabled=...) in model/lib/tabpfn/utils.py and model/lib/limix/model/transformer.py.
use_reentrant=False added to all torch.utils.checkpoint.checkpoint(...) call sites in model/lib/tabpfn/utils.py and model/lib/tabpfn/layer.py (including the partial(checkpoint, ...) variant). This silences the current warnings and gets ahead of the change PyTorch has warned about, where a future release will raise an error if use_reentrant is not passed explicitly.
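
A minimal sketch of the two call patterns, assuming a toy `torch.nn.Linear` layer in place of the real TabPFN and LimiX modules. The device selection is an addition for illustration so the snippet also runs without a GPU; the PR's call sites pass "cuda" directly.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative only: fall back to CPU when no GPU is present.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(8, 8).to(device_type)
x = torch.randn(4, 8, device=device_type, requires_grad=True)

# New-style autocast: the device type is the first positional argument.
with torch.amp.autocast(device_type, enabled=(device_type == "cuda")):
    # Passing use_reentrant explicitly avoids the deprecation warning.
    out = checkpoint(layer, x, use_reentrant=False)

# Gradients still flow through the checkpointed call.
out.sum().backward()
print(out.shape, x.grad.shape)
```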

3. Consistency / dead-code cleanup

check_softmax was defined 7 times across the package; it now lives once in model/utils.py and is imported everywhere else (names remain importable from their old locations).
Removed duplicate import torch statements from 11 method files.
Mutable default arguments (cat_indices=[]) replaced with the None pattern in tabpfn_v2.py, tabpfn_real.py, tabicl.py (matching tabpfn_v3.py).
Mitra predict() no longer raises KeyError when max_samples_support / max_samples_query are absent from the config (falls back to 8192 / 1024).
opt_space/tabpfn_v3.json had drifted from the default config (n_estimators 4 vs 32, sample_size 50k vs 1M) — aligned.
TabPFN v1's training subsample was unseeded (non-reproducible across runs); it now uses args.seed like every other method.
README: documented that train_row_limit in the registry governs row caps and sample_size is a per-run override.
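
For readers unfamiliar with the mutable-default issue fixed in this list, here is a small self-contained illustration. `fit_buggy` and `fit_fixed` are hypothetical names, not functions from the package; only the `cat_indices` parameter name comes from the PR.

```python
def fit_buggy(name, cat_indices=[]):
    # The default list is created once, at function definition time,
    # so every call that omits cat_indices shares the same object.
    cat_indices.append(name)
    return cat_indices


def fit_fixed(name, cat_indices=None):
    # The None pattern: build a fresh list per call when none is passed.
    if cat_indices is None:
        cat_indices = []
    cat_indices.append(name)
    return cat_indices


print(fit_buggy("a") is fit_buggy("b"))  # same list object both times
print(fit_fixed("a") is fit_fixed("b"))  # two independent lists
```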

Verification

python -m compileall runs clean over model/methods, model/classical_methods, model/utils.py, model/method_registry.py, api.py.
All config JSONs parse; no sample_size keys remain.
Unit-tested the new helpers: registry default resolution, config override precedence, stratified classification subsampling, uniform regression subsampling, uncapped models pass through unchanged, unknown model types fall back to "no cap" without raising.
Audited suspected bug padding_obs_query__ in mitra.py — confirmed it matches the upstream Mitra Tab2D.forward signature (not a typo; unchanged).
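
The config check above ("all config JSONs parse; no sample_size keys remain") could be automated along these lines. This is a sketch under assumptions: `find_key` and `audit_configs` are hypothetical helpers, not part of the PR, and the directory to scan is left to the caller.

```python
import json
from pathlib import Path


def find_key(node, key, path=""):
    """Yield the dotted path of every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else k
            if k == key:
                yield child
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key(v, key, f"{path}[{i}]")


def audit_configs(root, key="sample_size"):
    """Parse every *.json under root and report where `key` still appears."""
    hits = {}
    for file in sorted(Path(root).rglob("*.json")):
        data = json.loads(file.read_text())  # raises on malformed JSON
        found = list(find_key(data, key))
        if found:
            hits[str(file)] = found
    return hits
```

Pointing `audit_configs` at the configs directory should return an empty dict once the keys are gone; a malformed file surfaces as a `json.JSONDecodeError`.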

Fixes

TabPFN v2 / Real-TabPFN: chunk test-set inference in 8,192-row blocks to bound the feature-attention CUDA kernel batch, fixing CUDA error: invalid configuration argument on wide datasets with large test sets (predictions are identical; rows are scored independently).
Silence UndefinedMetricWarnings by passing zero_division=0 to precision_score in both metric() implementations (reported values unchanged, since sklearn already returned 0 for ill-defined precision).
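
The chunking idea can be sketched generically. `predict_in_chunks` and `toy_predict` are hypothetical names; the real fix lives inside the TabPFN v2 / Real-TabPFN inference path, which is not shown here. The sketch relies on the same property the fix does: each test row is scored independently, so stitching blocks together reproduces the unchunked result.

```python
import numpy as np


def predict_in_chunks(predict_fn, X_test, chunk_size=8192):
    """Score X_test in row blocks and stitch the results back together."""
    if X_test.shape[0] == 0:
        return predict_fn(X_test)  # nothing to split
    outputs = [
        predict_fn(X_test[start:start + chunk_size])
        for start in range(0, X_test.shape[0], chunk_size)
    ]
    return np.concatenate(outputs, axis=0)


# Hypothetical row-wise scorer standing in for the model's forward pass.
def toy_predict(X):
    return X.sum(axis=1, keepdims=True)


X_test = np.random.default_rng(0).normal(size=(20_000, 5))
chunked = predict_in_chunks(toy_predict, X_test)
print(chunked.shape, np.allclose(chunked, toy_predict(X_test)))
```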

…del row limits; fix torch deprecations and consolidate duplicated code

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR centralizes common utilities and training-row capping behavior across methods, reducing duplicated code and making per-method row limits come from the method registry.

Changes:

Added shared check_softmax() utility and removed duplicated inline implementations across multiple methods.
Introduced resolve_sample_size() / subsample_train_rows() on Method to standardize training-row caps based on train_row_limit (with general.sample_size override).
Updated configs to remove hard-coded sample_size defaults and adjusted some inference-related Torch AMP/checkpoint usage.

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| readme.md | Documents registry as source of truth for training-row caps and overrides. |
| TALENT/model/utils.py | Adds shared `check_softmax()` helper. |
| TALENT/model/methods/trompt.py | Switches to shared `check_softmax()` import. |
| TALENT/model/methods/tabptm.py | Removes duplicate `torch` import. |
| TALENT/model/methods/tabpfn_v3.py | Replaces local sample-size logic with `subsample_train_rows()`. |
| TALENT/model/methods/tabpfn_v2_5.py | Replaces local sample-size logic with `subsample_train_rows()`. |
| TALENT/model/methods/tabpfn_v2.py | Avoids mutable default arg; uses centralized `resolve_sample_size()`. |
| TALENT/model/methods/tabpfn_real.py | Avoids mutable default arg for `cat_indices`. |
| TALENT/model/methods/tabpfn.py | Replaces local sample-size logic with `subsample_train_rows()`. |
| TALENT/model/methods/tabnet.py | Removes duplicate `torch` import. |
| TALENT/model/methods/tabm.py | Switches to shared `check_softmax()` import. |
| TALENT/model/methods/tabicl_v2.py | Switches to shared `check_softmax()` and centralized row capping. |
| TALENT/model/methods/tabicl.py | Switches to shared `check_softmax()`, avoids mutable default arg, centralized row capping. |
| TALENT/model/methods/tabdpt.py | Replaces local sample-size logic with `subsample_train_rows()`. |
| TALENT/model/methods/tabcaps.py | Removes duplicate `torch` import. |
| TALENT/model/methods/ptarl.py | Removes duplicate `torch` import. |
| TALENT/model/methods/mitra.py | Adds centralized row capping; makes `max_samples_*` more robust with defaults. |
| TALENT/model/methods/limix.py | Removes duplicate `torch` import. |
| TALENT/model/methods/hyperfast.py | Removes duplicate `torch` import. |
| TALENT/model/methods/grownet.py | Removes duplicate `torch` import. |
| TALENT/model/methods/excelformer.py | Switches to shared `check_softmax()` import. |
| TALENT/model/methods/base.py | Adds `resolve_sample_size()` + `subsample_train_rows()` utilities for consistent capping. |
| TALENT/model/lib/tabpfn/utils.py | Updates checkpoint + autocast usage (incl. `use_reentrant=False`). |
| TALENT/model/lib/tabpfn/layer.py | Updates checkpoint usage to pass `use_reentrant=False`. |
| TALENT/model/lib/limix/model/transformer.py | Updates AMP autocast usage. |
| TALENT/model/classical_methods/base.py | Removes duplicated `check_softmax()` in favor of shared utility. |
| TALENT/configs/opt_space/tabpfn_v3.json | Removes `sample_size`; changes `n_estimators`. |
| TALENT/configs/opt_space/tabpfn_v2_5.json | Removes `sample_size`. |
| TALENT/configs/opt_space/tabpfn_v2.json | Removes `sample_size` and keeps empty `general`. |
| TALENT/configs/opt_space/tabpfn_real.json | Removes `sample_size` and keeps empty `general`. |
| TALENT/configs/opt_space/tabicl_v2.json | Removes `sample_size`. |
| TALENT/configs/opt_space/tabdpt.json | Removes `sample_size`. |
| TALENT/configs/default/tabpfn_v3.json | Removes `sample_size`. |
| TALENT/configs/default/tabpfn_v2_5.json | Removes `sample_size`. |
| TALENT/configs/default/tabpfn_v2.json | Removes `sample_size` and keeps empty `general`. |
| TALENT/configs/default/tabpfn_real.json | Removes `sample_size` and keeps empty `general`. |
| TALENT/configs/default/tabpfn.json | Removes `sample_size` and keeps empty `general`. |
| TALENT/configs/default/tabicl_v2.json | Removes `sample_size`. |
| TALENT/configs/default/tabdpt.json | Removes `sample_size`. |

Comments suppressed due to low confidence (1)

TALENT/model/utils.py:1

check_softmax() checks normalization using sum(axis=-1) but computes the softmax using axis=1, which is inconsistent and will produce incorrect results (or errors) if logits isn’t strictly 2D with classes on axis 1. Use a consistent axis (typically axis=-1) for np.max/np.sum and the normalization check so the function works correctly for any (…, C) shaped input.
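
One way to address this comment is sketched below. The actual body of `check_softmax()` is not shown on this page, so this is an assumed shape built only from what the comment describes: every reduction uses `axis=-1`. The non-negativity test is an extra assumption added here for illustration.

```python
import numpy as np


def check_softmax(logits):
    """Return probabilities; apply softmax only if rows are not already normalized."""
    logits = np.asarray(logits, dtype=float)
    sums = logits.sum(axis=-1)
    # Assumed criterion: non-negative entries whose last axis sums to 1.
    already_probs = bool(np.all(logits >= 0) and np.allclose(sums, 1.0))
    if already_probs:
        return logits
    # Subtract the max along the same axis for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)
```

With a single axis convention the function accepts any `(..., C)` input, including batched 3D arrays, instead of only 2D arrays with classes on axis 1.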


+    def subsample_train_rows(self, X, y):
+        """
+        Cap the training rows at ``resolve_sample_size()`` rows.
+
+        Classification subsamples stratified by label to keep class
+        proportions; regression takes a uniform random subset. Both are
+        seeded with ``args.seed`` for reproducibility. Returns (X, y)
+        unchanged when no cap applies.
+        """
+        sample_size = self.resolve_sample_size()
+        if sample_size is None or X.shape[0] <= sample_size:
+            return X, y
+        if not self.is_regression:
+            from sklearn.model_selection import train_test_split
+            X, _, y, _ = train_test_split(
+                X, y,
+                train_size=sample_size,
+                stratify=y,
+                random_state=self.args.seed,
+            )
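
The hunk above is cut off before the regression branch. A standalone sketch of the behavior the docstring describes follows; the free-function signature is hypothetical, and the regression branch is an assumption (a seeded uniform draw without replacement), not the PR's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def subsample_train_rows(X, y, sample_size, is_regression, seed):
    """Cap (X, y) at sample_size rows; return them unchanged if no cap applies."""
    if sample_size is None or X.shape[0] <= sample_size:
        return X, y
    if not is_regression:
        # Stratify by label so class proportions are preserved.
        X_sub, _, y_sub, _ = train_test_split(
            X, y, train_size=sample_size, stratify=y, random_state=seed
        )
        return X_sub, y_sub
    # Assumed regression branch: seeded uniform subset without replacement.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=sample_size, replace=False)
    return X[idx], y[idx]


X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.array([0] * 900 + [1] * 100)
X_sub, y_sub = subsample_train_rows(X, y, sample_size=100, is_regression=False, seed=0)
print(X_sub.shape, np.bincount(y_sub))
```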


+    def resolve_sample_size(self):
+        """
+        Resolve the effective training-row cap for this method.
+
+        An explicit ``config['general']['sample_size']`` takes precedence as a
+        per-run override; otherwise the method's ``train_row_limit`` from the
+        method registry applies (the single source of truth for row limits).
+        Returns None when neither is set (no cap).
+        """
+        general = self.args.config.get('general', {}) or {}
+        sample_size = general.get('sample_size')


Make the method registry the single source of truth for foundation-mo…

1265c3a

…del row limits; fix torch deprecations and consolidate duplicated code

Copilot AI review requested due to automatic review settings June 10, 2026 13:11

Merge branch 'LAMDA-Tabular:main' into main

203c19e

Copilot AI reviewed Jun 10, 2026


CUDA "invalid configuration argument" fix

da5be76

6sy666 merged commit 9ebd3e4 into LAMDA-Tabular:main Jun 12, 2026

6sy666 merged 3 commits into LAMDA-Tabular:main from andreasgoethals:main
