preprocessing: handle constant columns in minmax_scaling (#1167) #1169

Open: jbbqqf wants to merge 1 commit into rasbt:master from jbbqqf:fix/1167-minmax-constant-cols

jbbqqf commented May 21, 2026

Summary

mlxtend.preprocessing.minmax_scaling silently returns NaN for any column whose values are all identical. For such a column the per-column denominator, max - min, is 0, so numerator / denominator evaluates to 0 / 0, and the only signal is a low-level numpy RuntimeWarning: invalid value encountered in divide. The sibling function standardize in the same module already maps constant columns to 0.0 (and documents this). This PR brings minmax_scaling in line with that contract.
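A minimal sketch of the failure mode described above. `naive_minmax` is a hypothetical helper that mirrors the unguarded (x - min) / (max - min) computation on a plain numpy array; it is an illustration, not mlxtend's actual code.

```python
import warnings

import numpy as np


def naive_minmax(ary):
    """Hypothetical helper: per-column min-max scaling to [0, 1], no zero-range guard."""
    ary = np.asarray(ary, dtype=float)
    numerator = ary - ary.min(axis=0)
    denominator = ary.max(axis=0) - ary.min(axis=0)  # 0 for a constant column
    return numerator / denominator  # 0 / 0 -> nan, plus a RuntimeWarning


ary = np.array([[5, 1], [5, 2], [5, 3]])

# Record warnings instead of letting numpy print them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = naive_minmax(ary)

print(out)                                    # column 0 is all nan
print([w.category.__name__ for w in caught])  # ['RuntimeWarning']
```

The NaN lands in the output array while the warning goes to a separate channel, which is why it is easy to lose in a larger pipeline.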

Fixes #1167: minmax_scaling returns NaN silently for constant columns

Context

The bug was reported on 2026-05-21 and points out the asymmetry with standardize: constant columns silently become NaN, and the only signal is the numpy warning, which is easy to miss in a larger pipeline. The reporter's expected output (constant column at 0.0, other columns scaled normally) matches what standardize already does, so the fix makes the two functions agree.

Changes

  • mlxtend/preprocessing/scaling.py — in minmax_scaling, replace every zero entry of denominator with 1 before the division. The numerator of such a column is identically zero, so the column collapses to min_val (0.0 with the default (0, 1) range) instead of NaN. Behaviour for non-constant columns is unchanged. The Notes section of the docstring now states this contract explicitly. The substitution is np.where(denominator == 0, 1, denominator), a one-line guard accompanied by a five-line comment that explains the invariant, so a reviewer reading the diff cold does not have to re-derive it.
  • mlxtend/preprocessing/tests/test__scaling__minmax_scaling.py — three new regression tests: numpy path with default (0, 1) range, pandas DataFrame path, and custom (50, 100) range. All three fail on origin/master and pass on this branch.
  • docs/sources/CHANGELOG.md — one entry under "Version 0.25.0 (TBD)".
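The guard from the first bullet can be sketched as follows. `minmax_scale_guarded` is a hypothetical standalone helper written for illustration: it scales every column of a plain 2-D numpy array and skips mlxtend's column selection and pandas handling, so it is not the code in scaling.py.

```python
import numpy as np


def minmax_scale_guarded(ary, min_val=0.0, max_val=1.0):
    """Hypothetical sketch: scale each column of a 2-D array to [min_val, max_val].

    Constant columns are mapped to min_val instead of NaN.
    """
    ary = np.asarray(ary, dtype=float)
    col_min = ary.min(axis=0)
    numerator = ary - col_min
    denominator = ary.max(axis=0) - col_min
    # A constant column has zero range and an all-zero numerator, so
    # dividing by 1 instead of 0 yields 0.0 rather than 0 / 0 = NaN.
    denominator = np.where(denominator == 0, 1, denominator)
    scaled = numerator / denominator  # every column now lies in [0, 1]
    # Affine map from [0, 1] onto [min_val, max_val]; zeros land on min_val.
    return scaled * (max_val - min_val) + min_val


# Default range: column 0 -> 0.0, column 1 -> 0.0, 0.5, 1.0
print(minmax_scale_guarded([[5, 1], [5, 2], [5, 3]]))
# Custom range: column 0 -> 50.0, column 1 -> 50.0, 75.0, 100.0
print(minmax_scale_guarded([[7, 1], [7, 2], [7, 3]], min_val=50, max_val=100))
```

Substituting 1 rather than skipping the division keeps the whole computation vectorized; it is safe only because the numerator of a zero-range column is always zero.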

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/rasbt/mlxtend.git /tmp/repro-1167 && cd /tmp/repro-1167
python -m venv .venv && source .venv/bin/activate
pip install -e . pytest pytest-timeout

cat > /tmp/repro_1167.py <<'PY'
import numpy as np
from mlxtend.preprocessing import minmax_scaling

ary = np.array([[5, 1], [5, 2], [5, 3]])
out = minmax_scaling(ary, columns=[0, 1])
print(out)
print("has_nan =", bool(np.isnan(out).any()))
PY

# --- BEFORE (origin/master) ---
git checkout origin/master
pip install -e . --quiet
python /tmp/repro_1167.py
# Expected: column 0 is [nan, nan, nan], "has_nan = True", and a RuntimeWarning

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/mlxtend.git fix/1167-minmax-constant-cols
git checkout FETCH_HEAD
pip install -e . --quiet
python /tmp/repro_1167.py
# Expected: column 0 is [0.0, 0.0, 0.0], "has_nan = False", no warning
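Because the BEFORE behaviour only warns, a reproducer can also be made to fail loudly by asking numpy to raise on invalid floating-point operations. This sketch writes the division inline on a plain array (it does not call mlxtend) and wraps it in np.errstate(invalid="raise"), which turns the 0 / 0 into a FloatingPointError; the zero-range guard then passes under the same strict setting.

```python
import numpy as np

ary = np.array([[5, 1], [5, 2], [5, 3]], dtype=float)
numerator = ary - ary.min(axis=0)
denominator = ary.max(axis=0) - ary.min(axis=0)

# Promote numpy's "invalid value" warning to an exception for this block only.
raised = False
try:
    with np.errstate(invalid="raise"):
        _ = numerator / denominator
except FloatingPointError as exc:
    raised = True
    print("caught:", exc)  # numpy's message reports the invalid divide

# With a zero-range guard, the same division succeeds under the strict setting.
with np.errstate(invalid="raise"):
    guarded = numerator / np.where(denominator == 0, 1, denominator)
print(guarded[:, 0])  # constant column -> [0. 0. 0.]
```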

What I ran locally

$ pytest mlxtend/preprocessing/tests/test__scaling__minmax_scaling.py -v
============================== 6 passed in 0.60s ==============================

$ pytest mlxtend/preprocessing/ -v --timeout=60
============================== 41 passed in 0.80s ==============================

The three new regression tests fail on origin/master:

FAILED test__scaling__minmax_scaling.py::test_minmax_scaling_constant_column_numpy
  AssertionError: minmax_scaling produced NaN for a constant column; expected the column to be flattened to min_val (0.0).
FAILED test__scaling__minmax_scaling.py::test_minmax_scaling_constant_column_pandas
FAILED test__scaling__minmax_scaling.py::test_minmax_scaling_constant_column_custom_range
RuntimeWarning: invalid value encountered in divide

Edge cases tested

| # | Scenario | Input | Expected | Verified by |
|---|---|---|---|---|
| 1 | Constant column (numpy, default range) | [[5,1],[5,2],[5,3]] | column 0 → 0.0, column 1 → 0 / 0.5 / 1 | test_minmax_scaling_constant_column_numpy |
| 2 | Constant column (pandas DataFrame) | {"const":[5,5,5],"var":[1,2,3]} | same as scenario 1, via the .loc indexing path | test_minmax_scaling_constant_column_pandas |
| 3 | Constant column + custom (min_val, max_val) | [[7,1],[7,2],[7,3]], (50, 100) | column 0 → 50.0 (the lower bound) | test_minmax_scaling_constant_column_custom_range |
| 4 | Non-constant columns (regression for existing behaviour) | original test_pandas_minmax_scaling / test_numpy_minmax_scaling inputs | unchanged | both pre-existing tests still pass |
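For the DataFrame scenario, the same guard has a compact pandas form. This is a standalone sketch on the scenario-2 input, assuming pandas is installed; it does not reproduce mlxtend's .loc-based column selection.

```python
import pandas as pd

df = pd.DataFrame({"const": [5, 5, 5], "var": [1, 2, 3]})

col_min = df.min()              # per-column minimum, as a Series indexed by column
col_range = df.max() - col_min  # 0 for the constant column
# Replace zero ranges with 1 so the constant column becomes 0.0 instead of NaN.
# DataFrame-Series arithmetic aligns the Series index with the column labels.
scaled = (df - col_min) / col_range.replace(0, 1)

print(scaled)  # "const" -> 0.0, 0.0, 0.0; "var" -> 0.0, 0.5, 1.0
```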

Risk / blast radius

Additive guard inside a single function. The new code path is only exercised when at least one selected column has zero range — previously that path produced NaN, now it produces min_val. Callers that relied on the NaN as a downstream sentinel will see a behavioural change, but that path was undocumented and inconsistent with standardize; the new behaviour matches the function's already-documented "rescaled column" contract.
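Callers that previously used the NaN to spot constant columns can test for zero range directly before scaling. A minimal numpy sketch; `constant_columns` is a hypothetical helper name, not part of mlxtend.

```python
import numpy as np


def constant_columns(ary):
    """Hypothetical helper: return the indices of columns whose values are all identical."""
    ary = np.asarray(ary, dtype=float)
    col_range = ary.max(axis=0) - ary.min(axis=0)
    return np.flatnonzero(col_range == 0)


ary = np.array([[5, 1, 9], [5, 2, 9], [5, 3, 9]])
print(constant_columns(ary))  # [0 2]
```

Checking the range up front makes the "this column carried no information" signal explicit, instead of relying on a NaN that the scaling step no longer produces.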

Release note

minmax_scaling no longer returns silent NaNs for constant columns; constant columns are now collapsed to min_val, mirroring the existing contract of standardize (#1167).

PR drafted with assistance from Claude Code. The change was reviewed manually against the existing standardize implementation in the same file (which has handled constant columns since at least v0.18) and against the reporter's expected output in #1167. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.

Constant columns have zero range, so the naive
(x - min) / (max - min) inside minmax_scaling computes 0 / 0 and
silently writes NaN into the output (with only a low-level numpy
`RuntimeWarning: invalid value encountered in divide`). The sibling
function `standardize` already collapses constant columns to 0.0;
this commit aligns `minmax_scaling` with the same contract.

Force the per-column denominator to 1 for constant columns: the
numerator is identically zero in that case, so the column collapses
to `min_val` (0.0 with the default range) without raising a warning.
Behaviour for non-constant columns is unchanged.

Adds three regression tests covering the numpy path, the pandas
DataFrame path, and the custom `(min_val, max_val)` range.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>