frequent_patterns: reject min_support outside (0, 1] (#864)#1164

Merged
rasbt merged 4 commits into rasbt:master from jbbqqf:feat/864-apriori-support-bounds
Jun 6, 2026

jbbqqf commented May 9, 2026

Contributor

Code of Conduct

I have read the project's Code of Conduct.

Description

min_support is documented as a fraction in the half-open interval (0, 1]. The current validators in apriori, fpgrowth, fpmax, and hmine only reject min_support <= 0, so values like min_support=2 silently pass through. The algorithm then returns an empty DataFrame because no fractional support can ever exceed 1, which is confusing for users who picked the wrong scale (counts vs fractions, percent vs fraction).

This PR tightens each of the four validators to also reject > 1.0 with the same message that already advertises the (0, 1] interval. The change is localised to the four entry points; nothing else in the algorithms moves.
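
For reviewers who want the tightened check in isolation, here is a minimal sketch of the rule described above. The helper name `check_min_support` is hypothetical and is not taken from mlxtend's source; only the error message mirrors the one quoted in the reproducer below.

```python
def check_min_support(min_support):
    """Hypothetical helper: accept only values in the half-open interval (0, 1]."""
    # The `<= 0.0` half is the check the description says already existed;
    # the `> 1.0` half is the tightening this PR describes.
    if min_support <= 0.0 or min_support > 1.0:
        raise ValueError(
            "`min_support` must be a positive number within the "
            "interval `(0, 1]`. Got %s." % min_support
        )
    return min_support


# Boundary value 1.0 is valid; 0.0 and 2 are rejected.
for value in (0.5, 1.0, 0.0, 2):
    try:
        check_min_support(value)
        print(f"{value!r}: accepted")
    except ValueError as exc:
        print(f"{value!r}: ValueError -> {exc}")
```

The sketch raises before any mining work starts, which matches the stated goal of failing at the entry point rather than returning an empty result.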

Related issues or pull requests

Fixes #864

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
  • Added appropriate unit test functions in the ./mlxtend/frequent_patterns/tests/ directory (one shared test in FPTestEx3All runs against all four algorithms via the existing harness)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (not applicable — the existing docstrings already say min_support is a fraction in (0, 1])
  • Ran PYTHONPATH='.' pytest ./mlxtend/frequent_patterns -q — 103/103 passed
  • Checked for style issues by running flake8 ./mlxtend (clean) and black --check (clean)

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/rasbt/mlxtend.git /tmp/repro-864 && cd /tmp/repro-864
python -m venv .venv && source .venv/bin/activate
pip install -e . pytest

# --- BEFORE (origin/master) ---
git checkout origin/master
python - <<'PY'
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth, fpmax
te = TransactionEncoder()
data = [['a','b'],['b','c'],['a','b','c']]
df = pd.DataFrame(te.fit(data).transform(data), columns=te.columns_)
for fn in (apriori, fpgrowth, fpmax):
    try:
        out = fn(df, min_support=2)  # nonsense: support > 1
        print(f"{fn.__name__:8s}: silently returned {len(out)} rows (BUG)")
    except ValueError as e:
        print(f"{fn.__name__:8s}: ValueError -> {e}")
PY
# Expected (BEFORE, BUGGY): each algorithm prints "silently returned 0 rows (BUG)"

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/mlxtend.git feat/864-apriori-support-bounds
git checkout FETCH_HEAD
python - <<'PY'
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth, fpmax
te = TransactionEncoder()
data = [['a','b'],['b','c'],['a','b','c']]
df = pd.DataFrame(te.fit(data).transform(data), columns=te.columns_)
for fn in (apriori, fpgrowth, fpmax):
    try:
        out = fn(df, min_support=2)
        print(f"{fn.__name__:8s}: silently returned {len(out)} rows (BUG)")
    except ValueError as e:
        print(f"{fn.__name__:8s}: ValueError -> {e}")
PY
# Expected (AFTER, FIXED): each algorithm raises
#   ValueError: `min_support` must be a positive number within the interval `(0, 1]`. Got 2.

# --- Regression tests (added by this PR; they fail against origin/master's sources, pass on this branch) ---
PYTHONPATH=. pytest mlxtend/frequent_patterns/tests -q -k "test_output4"
# Expected (BEFORE): 4 failed (one per algorithm)
# Expected (AFTER):  4 passed

What I ran locally

  • PYTHONPATH=. pytest mlxtend/frequent_patterns -q → 103/103 passed
  • PYTHONPATH=. pytest mlxtend/frequent_patterns/tests -k "test_output4" → 4/4 passed (one per algorithm)
  • Same 4 tests run against origin/master's sources: 4/4 fail with "ValueError not raised."
  • flake8 + black --check on touched files → clean

Edge cases tested

| # | Scenario | Input | Expected | Verified by |
|---|----------|-------|----------|-------------|
| 1 | min_support = 2 (above interval) | apriori, fpgrowth, fpmax, hmine | ValueError: ... within the interval (0, 1]. Got 2. | new test_output4_min_support_above_one_issue_864 (FPTestEx3All; runs against all four algorithms) |
| 2 | min_support = 0.0 (existing edge) | same four algorithms | ValueError: ... within the interval (0, 1]. Got 0.0. | existing test_output3 (preserved) |
| 3 | min_support = 1.0 (boundary, valid) | same four algorithms | succeeds; returns frequent itemsets with support exactly 1.0 if any | existing happy-path tests in FPTestEx1 / FPTestEx2 |
| 4 | Typical min_support = 0.5 | same four algorithms | unchanged | the rest of mlxtend/frequent_patterns/tests (103 passing) |

Risk / blast radius

Minimal. Affects only the entry-point parameter validation of four functions; nothing in the algorithm bodies changes. Users who were passing absolute counts (e.g. min_support=5) and getting silent empty results will now get an explicit ValueError pointing them at the documented fractional interval — strictly more informative.
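
For callers who think in absolute counts, the migration is a single division by the number of transactions. The sketch below is illustrative only: `count_to_min_support` is a hypothetical helper and not part of mlxtend.

```python
def count_to_min_support(min_count, n_transactions):
    """Hypothetical helper: turn an absolute count into a fraction in (0, 1]."""
    if n_transactions <= 0:
        raise ValueError("n_transactions must be positive.")
    if not 0 < min_count <= n_transactions:
        raise ValueError(
            "min_count must be in (0, n_transactions]. Got %s." % min_count
        )
    return min_count / n_transactions


# With 3 transactions, "appears in at least 2 of them" becomes 2/3.
min_support = count_to_min_support(2, 3)
```

With a one-hot encoded DataFrame `df` (one row per transaction), `n_transactions` would be `len(df)`, and the returned fraction can be passed as `min_support=`.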

Release note

Validate `min_support` against the documented `(0, 1]` interval in `apriori`, `fpgrowth`, `fpmax`, and `hmine`. Values like `min_support=2` now raise `ValueError` instead of silently returning empty results.

PR drafted with assistance from Claude Code. The change was reviewed manually against rasbt/mlxtend's source and the existing docstrings, which already specified the (0, 1] interval. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.

jbbqqf and others added 4 commits May 9, 2026 20:31
`min_support` is documented as a fraction in the half-open interval
`(0, 1]`. The validation in apriori / fpgrowth / fpmax / hmine only
checked `<= 0`, so values like `min_support=2` would pass through and
return an empty DataFrame silently — confusing for users who picked the
wrong scale (counts vs fractions, percent vs fraction).

Tightens each validator to also reject `> 1.0` with the same message
that already advertises the `(0, 1]` interval. Adds a single
`test_output4_min_support_above_one_issue_864` to the shared
`FPTestEx3All` base class so it runs against all four algorithms.

Co-Authored-By: Claude Code <noreply@anthropic.com>
rasbt merged commit 21de60d into rasbt:master on Jun 6, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Raise error if support value is too large in apriori