Open
40 commits
a4b880c
first iteration of scalar implementation
SvenKlaassen Jan 31, 2026
4f4c255
refactor DoubleMLScalar to split fit() into separate parts
SvenKlaassen Feb 1, 2026
ae2e5be
add plr_scalar implementation
SvenKlaassen Feb 1, 2026
dad5e4c
fix external predictions for doublemlscalar
SvenKlaassen Feb 1, 2026
5f0a137
Enhance PLR and DoubleMLScalar with learner management and validation
SvenKlaassen Feb 1, 2026
384beba
Add architecture documentation for DoubleMLScalar and class hierarchy
SvenKlaassen Feb 1, 2026
838d0ca
Add code simplifier and technical debt finder documentation
SvenKlaassen Feb 3, 2026
54e9eb4
Enhance DoubleMLScalar and PLR with learner management, validation, a…
SvenKlaassen Feb 6, 2026
aa2cffa
Add Interactive Regression Model (IRM) implementation and tests
SvenKlaassen Feb 6, 2026
0947c9d
Refactor documentation and guidelines for DoubleML, including coding …
SvenKlaassen Feb 7, 2026
48ae3a9
Refactor IRM class type hints to use built-in types and improve code …
SvenKlaassen Feb 7, 2026
1886a7d
Refactor DoubleMLScalar to enhance sample splitting functionality; ad…
SvenKlaassen Feb 9, 2026
0ca053d
Refactor IRM and PLR classes to reset fit state after updating learne…
SvenKlaassen Feb 9, 2026
33c8b01
Add copilot documentation for code style, error handling, performance…
SvenKlaassen Feb 9, 2026
45c5f48
Enhance DoubleMLScalar and IRM classes for stratified sample splittin…
SvenKlaassen Feb 9, 2026
0051c77
add post_nuisance checks
SvenKlaassen Feb 13, 2026
3486f5a
Merge branch 'main' into sk-refactoring
SvenKlaassen Feb 28, 2026
35434bb
add guideline for using absolute imports from project root
SvenKlaassen Feb 28, 2026
d195fff
add guidelines for tuning tests and required fixtures for scalar models
SvenKlaassen Feb 28, 2026
15216f0
Enhance DoubleMLScalar with improved tuning functionality and tests
SvenKlaassen Feb 28, 2026
b0026da
add nuisance evalutaion
SvenKlaassen Mar 1, 2026
050fa27
Implement sensitivity analysis for scalar models in DoubleML
SvenKlaassen Mar 1, 2026
e980cca
add first dml vector class
SvenKlaassen Mar 1, 2026
17cf8f3
Add branch status and TODOs documentation for sk-refactoring
SvenKlaassen Mar 25, 2026
3818c2b
Refactor weight handling in IRM and add comprehensive exception tests…
SvenKlaassen Mar 25, 2026
fdcf936
Merge branch 'main' into sk-refactoring
SvenKlaassen May 9, 2026
82d95a5
refactor: enhance validation for weights_bar in IRM and update fit ha…
SvenKlaassen May 9, 2026
10305aa
feat: Add CATE and GATE methods to IRM and PLR scalar models
SvenKlaassen May 9, 2026
71ef483
feat: Implement PLRVector for multi-treatment partially linear regres…
SvenKlaassen May 9, 2026
1ae721c
refactor: move Self type hint import to typing_extensions for 3.10
SvenKlaassen May 9, 2026
f1c0bcd
Fix high priority codacy issues: update set_learners method signature…
SvenKlaassen May 9, 2026
d74e9f9
fix medium codacy issues: streamline learner validation by extracting…
SvenKlaassen May 9, 2026
4c0fe2a
docs: fix docstring lint on new scalar/vector implementations
SvenKlaassen May 9, 2026
b996235
refactor: simplify set_learners method signature by removing kwargs
SvenKlaassen May 9, 2026
39a0101
refactor: remove redundant pass statements in abstract methods and st…
SvenKlaassen May 9, 2026
9913dbf
refactor: remove redundant pass statement in DoubleMLScalar class
SvenKlaassen May 9, 2026
bd75efb
refactor: remove redundant pass statements in abstract methods of Dou…
SvenKlaassen May 9, 2026
93247fd
refactor: simplify docstring for set_learners method in PLRVector class
SvenKlaassen May 9, 2026
d56d105
refactor: add doctest skip directive to evaluate_learners examples in…
SvenKlaassen May 9, 2026
3ffa823
refactor: enhance basis validation in DoubleMLPLR and PLR classes
SvenKlaassen May 9, 2026
85 changes: 85 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,85 @@
# DoubleML for Python

DoubleML is a Python package implementing Double/Debiased Machine Learning (DML) methods for causal inference:
- Partially Linear Models (PLR, PLIV, PLPR, LPLR)
- Interactive Regression Models (IRM, IIVM, APO, QTE, CVAR, SSM)
- Difference-in-Differences estimators (DID, DIDCSBinary, DIDMulti)
- Regression Discontinuity Design (RDD)

**Docs**: https://docs.doubleml.org | **Source**: https://github.com/DoubleML/doubleml-for-py

**Branch status & TODOs**: `.claude/STATUS.md`

## Architecture

### Class Hierarchy
```
DoubleMLBase (ABC)
  └─> DoubleMLScalar (ABC) - single-parameter models
        ├─> LinearScoreMixin - closed-form solver (θ = -E[ψ_b]/E[ψ_a])
        │     ├─> DoubleMLPLR
        │     ├─> DoubleMLIRM
        │     ├─> DoubleMLPLIV
        │     ├─> DoubleMLIIVM
        │     └─> DoubleML DID variants
        └─> NonLinearScoreMixin - numerical solver (planned)

DoubleML - multi-parameter estimation (extends DoubleMLScalar)
```
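
The closed-form solver mentioned in the tree follows from the score being linear in θ. A short derivation, assuming the usual decomposition of the score into `psi_a` and `psi_b` (the sample-average estimator in the last line is the standard plug-in and is an assumption here, not something this file spells out):

```latex
\begin{aligned}
\psi(W;\theta,\eta) &= \psi_a(W;\eta)\,\theta + \psi_b(W;\eta),\\
0 = \mathbb{E}\big[\psi(W;\theta_0,\eta_0)\big]
  &= \mathbb{E}[\psi_a]\,\theta_0 + \mathbb{E}[\psi_b]
  \;\Longrightarrow\;
  \theta_0 = -\frac{\mathbb{E}[\psi_b]}{\mathbb{E}[\psi_a]},\\
\hat\theta &= -\frac{\sum_{i=1}^{n}\hat\psi_{b,i}}{\sum_{i=1}^{n}\hat\psi_{a,i}}.
\end{aligned}
```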

### Design Patterns
- **Template Method**: `fit()` orchestrates; subclasses implement `_nuisance_est()`, `_get_score_elements()`
- **Mixin Pattern**: `LinearScoreMixin` provides closed-form coefficient estimation
- **Delegation**: `DoubleMLBase` delegates inference to `DoubleMLFramework`
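
The Template Method and mixin patterns above can be sketched in a few lines. This is a toy illustration, not the package's implementation: `ScalarSketch`, `LinearScoreSketch`, and `ToyPLR` are hypothetical names, the hook signatures are assumed, and the nuisance step uses constant mean predictions with no cross-fitting.

```python
from abc import ABC, abstractmethod

import numpy as np


class LinearScoreSketch:
    """Hypothetical mixin: solves E[psi_a] * theta + E[psi_b] = 0 in closed form."""

    def _solve(self, psi_a: np.ndarray, psi_b: np.ndarray) -> float:
        return float(-np.mean(psi_b) / np.mean(psi_a))


class ScalarSketch(ABC):
    """Hypothetical base: fit() fixes the order of steps, subclasses fill the hooks."""

    def fit(self, y: np.ndarray, d: np.ndarray) -> float:
        preds = self._nuisance_est(y, d)
        elements = self._get_score_elements(y, d, preds)
        return self._solve(elements["psi_a"], elements["psi_b"])

    @abstractmethod
    def _nuisance_est(self, y, d) -> dict: ...

    @abstractmethod
    def _get_score_elements(self, y, d, preds) -> dict: ...


class ToyPLR(LinearScoreSketch, ScalarSketch):
    """Toy partialling-out model with constant (mean) nuisance predictions."""

    def _nuisance_est(self, y, d):
        return {"ml_l": np.full_like(y, y.mean()), "ml_m": np.full_like(d, d.mean())}

    def _get_score_elements(self, y, d, preds):
        d_res = d - preds["ml_m"]
        y_res = y - preds["ml_l"]
        return {"psi_a": -d_res * d_res, "psi_b": d_res * y_res}


# Simulated data with a true coefficient of 2.0 (illustration only).
rng = np.random.default_rng(0)
d = rng.normal(size=500)
y = 2.0 * d + rng.normal(size=500)
theta_hat = ToyPLR().fit(y, d)
```

With mean predictions the estimate reduces to the OLS slope of `y` on `d`, so `theta_hat` lands close to the simulated coefficient.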

### Core Files
| File | Purpose |
|------|---------|
| `doubleml/double_ml_base.py` | Abstract base with properties (coef, se, summary) and inference |
| `doubleml/double_ml_scalar.py` | Single-parameter estimation orchestrator |
| `doubleml/double_ml.py` | Multi-parameter estimation with sample splitting |
| `doubleml/double_ml_framework.py` | Statistical inference (confint, bootstrap, sensitivity) |
| `doubleml/double_ml_linear_score.py` | Linear score mixin |

### Package Structure
```
doubleml/
├── data/ # Data containers (DoubleMLData, DoubleMLDIDData, etc.)
├── plm/ # Partially Linear Models (PLR, PLIV, PLPR, LPLR)
├── irm/ # Interactive Regression Models (IRM, IIVM, APO, QTE, etc.)
├── did/ # Difference-in-Differences estimators
├── rdd/ # Regression Discontinuity Design
├── utils/ # Helpers (_checks, _estimation, resampling, tuning)
└── tests/ # Main test directory
```

## Key Dependencies

**Core**: numpy>=2.0.0, pandas>=2.0.0, scipy>=1.7.0, scikit-learn>=1.6.0, statsmodels>=0.14.0
**ML/Tuning**: optuna>=4.6.0, joblib>=1.2.0
**Visualization**: matplotlib>=3.9.0, seaborn>=0.13, plotly>=5.0.0
**Dev**: pytest>=8.3.0, black>=25.1.0, ruff>=0.11.1, mypy>=1.18.0, xgboost>=2.1.0, lightgbm>=4.6.0

## Git Workflow

- **Main branch**: `main`
- **Commits**: Conventional Commits — `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`
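
A commit subject can be checked against these prefixes with a small regular expression. The sketch below is a hypothetical helper, not part of the repository; it covers only the six prefixes listed above and allows an optional `(scope)` as in the Conventional Commits format.

```python
import re

# Hypothetical checker: "<type>: <description>" or "<type>(<scope>): <description>".
_COMMIT_RE = re.compile(r"^(feat|fix|docs|refactor|test|chore)(\([\w\-]+\))?: \S.*")


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject starts with one of the listed prefixes."""
    return _COMMIT_RE.match(subject) is not None


ok = is_conventional("feat: Add CATE and GATE methods to IRM and PLR scalar models")
not_ok = is_conventional("add plr_scalar implementation")
```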

## Verification

Before completing any task:
```bash
black . # Format
ruff check --fix . # Lint
mypy doubleml # Type check
pytest -m ci # Tests
```

## Coding Standards

Detailed conventions are in `.claude/rules/`:
- **py-code-conventions.md** — Formatting, type hints, docstrings, naming, DML-specific patterns
- **error-handling.md** — Exception types, validation patterns, warnings vs. errors
- **performance-guidelines.md** — Vectorization, pre-allocation, DML computation patterns
- **testing-conventions.md** — Markers, fixtures, assertion patterns
- **dml-scalar-test-structure.md** — Mandatory 5-file test structure for scalar models
75 changes: 75 additions & 0 deletions .claude/STATUS.md
@@ -0,0 +1,75 @@
# Branch Status & TODOs

> Tracked in git so it syncs across machines. Update this file as work progresses.
> Reference: `CLAUDE.md` points to this file in its **Branch status & TODOs** line.

---

## Branch: `sk-refactoring`

**Goal**: Introduce a new `DoubleMLScalar` / `DoubleMLVector` hierarchy alongside
the existing `DoubleML` API — cleaner design, better testability, explicit tuning,
nuisance evaluation, and sensitivity analysis.

### Completed

- [x] **Claude tooling** — `.claude/` dir, `CLAUDE.md`, `rules/`, `agents/`, `skills/`
- [x] **Architecture docs** — `doc/diagrams/architecture.md`, `doc/diagrams/testing_structure.md`
- [x] **`DoubleMLBase`** — abstract base with shared properties (`coef`, `se`, `summary`) and inference delegation (`doubleml/double_ml_base.py`)
- [x] **`LinearScoreMixin`** — closed-form θ = −E[ψ_b]/E[ψ_a] solver (`doubleml/double_ml_linear_score.py`)
- [x] **`DoubleMLScalar`** — single-parameter orchestrator (`doubleml/double_ml_scalar.py`) with:
  - `fit()` → `draw_sample_splitting()` + `fit_nuisance_models()` + `estimate_causal_parameters()`
  - `tune_ml_models()` via Optuna (`_LEARNER_PARAM_ALIASES`, `_get_tuning_data()` hook)
  - `nuisance_targets`, `nuisance_loss`, `evaluate_learners()`
  - `_sensitivity_element_est()` hook + full sensitivity analysis pipeline
- [x] **`DoubleMLPLRScalar`** — PLR scalar (`doubleml/plm/plr_scalar.py`) with all 7 test files:
  - `test_plr_scalar.py`, `_return_types`, `_exceptions`, `_vs_plr`, `_external_predictions`, `_tune_ml_models`, `_evaluate_learners`, `_sensitivity`
- [x] **`DoubleMLIRMScalar`** — IRM scalar (`doubleml/irm/irm_scalar.py`) with all 7 test files (same structure)
- [x] **`cate()` + `gate()` for IRM scalar** — `doubleml/irm/irm_scalar.py` + `test_irm_scalar_cate_gate.py`
- [x] **`cate()` + `gate()` + `_partial_out()` for PLR scalar** — `doubleml/plm/plr_scalar.py` + `test_plr_scalar_cate_gate.py`. Multi-rep × multi-column basis fully supported.
- [x] **`DoubleMLBLP` per-rep basis API** — `basis` may be a single `pd.DataFrame` (shared) or a `list[pd.DataFrame]` of length `n_rep`. Also fixes the legacy `DoubleMLPLR.cate()` multi-rep bug (`basis * D_tilde` mis-broadcast for `n_rep>1` and `d_basis>1`).
- [x] **`DoubleMLVector`** — multi-treatment base class first iteration (`doubleml/double_ml_vector.py`)
- [x] **BLP multi-rep support** — `doubleml/utils/blp.py`
- [x] **`PLRVector`** — first concrete `DoubleMLVector` subclass (`doubleml/plm/plr_vector.py`) with 5 test files: `test_plr_vector.py`, `_return_types`, `_exceptions`, `_vs_plr`, `_external_predictions`. Validates exact equivalence with legacy `DoubleMLPLR` for multi-treatment.
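
The per-rep basis API described for `DoubleMLBLP` accepts either one shared `pd.DataFrame` or a list with one frame per repetition. A minimal sketch of how such an input could be normalized, assuming a hypothetical helper `normalize_basis` (the actual handling in `doubleml/utils/blp.py` may differ):

```python
import pandas as pd


def normalize_basis(basis, n_rep: int) -> list[pd.DataFrame]:
    """Hypothetical helper: return one basis DataFrame per repetition."""
    if isinstance(basis, pd.DataFrame):
        # Shared basis: reuse the same frame for every repetition.
        return [basis] * n_rep
    if isinstance(basis, list) and all(isinstance(b, pd.DataFrame) for b in basis):
        if len(basis) != n_rep:
            raise ValueError(
                f"basis must contain one DataFrame per repetition: "
                f"expected {n_rep}, got {len(basis)}."
            )
        return basis
    raise TypeError(
        f"basis must be a pd.DataFrame or a list of pd.DataFrame, "
        f"got {type(basis).__name__}."
    )


shared = pd.DataFrame({"b0": [1.0, 1.0, 1.0], "b1": [0.2, 0.5, 0.9]})
per_rep = normalize_basis(shared, n_rep=2)
```

Resolving the input to a list up front means downstream code can index the basis by repetition without branching on the input type.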

### In Progress

_(none)_

### Feature Gaps vs Legacy Classes

Missing from `PLR` / `IRM` scalar compared to `DoubleMLPLR` / `DoubleMLIRM`:

| Feature | Legacy location | Applies to | Notes |
|---------|----------------|-----------|-------|
| `cate()` | `plr.py:447`, `irm.py:564` | — | ✅ ported for both IRM and PLR |
| `gate()` | `plr.py:485`, `irm.py:598` | — | ✅ ported for both IRM and PLR |
| `_partial_out()` | `plr.py:522` | — | ✅ ported for PLR scalar |
| `policy_tree()` | `irm.py:635` | IRM only | Not planned yet |

Weighted effects in IRM (`weights` dict form):
- Array weights: ✅ supported
- Dict weights with `weights_bar`: ✅ supported — init defers the `n_rep` column check; `DoubleMLScalar._check_smpls_dependent_inputs()` hook validates `weights_bar.shape == (n_obs, n_rep)` from inside both `draw_sample_splitting()` and `set_sample_splitting()`. `fit(n_folds=..., n_rep=...)` re-draws splits with a `UserWarning` when args conflict with existing splits.
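
The deferred `weights_bar` check can be pictured as a shape comparison that only runs once the sample splits, and therefore `n_rep`, are known. The function below is a hypothetical stand-in for what the `_check_smpls_dependent_inputs()` hook might do; its name, signature, and message are assumptions.

```python
import numpy as np


def check_weights_bar(weights_bar: np.ndarray, n_obs: int, n_rep: int) -> None:
    """Hypothetical check: weights_bar needs one column per repetition."""
    expected = (n_obs, n_rep)
    if weights_bar.shape != expected:
        raise ValueError(
            f"weights_bar must have shape {expected}, got {weights_bar.shape}."
        )


# Matching shape: passes silently.
check_weights_bar(np.ones((100, 3)), n_obs=100, n_rep=3)
```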

Intentionally **not ported**:
- Callable score — design decision
- `trimming_rule` / `trimming_threshold` deprecated props — use `ps_processor_config`

### Planned

| Item | Files | Notes |
|------|-------|-------|
| `DoubleMLIRMVector` | `doubleml/irm/irm_vector.py` + tests | Next concrete Vector subclass |
| `DoubleMLPLIVScalar` | `doubleml/plm/pliv_scalar.py` + 7 test files | Next scalar model |
| `DoubleMLPLPRScalar` | `doubleml/plm/plpr_scalar.py` + 7 test files | |
| DID scalar variants | `doubleml/did/*_scalar.py` | DID, DIDCSBinary, DIDMulti |
| `DoubleMLVector` tests | `doubleml/tests/test_vector_*.py` | Base class tests |

---

## How to Update This File

- Mark items `[x]` when complete
- Move items between sections as work progresses
- Add new planned items as they are identified
- Commit this file with the relevant code changes so the status stays in sync
59 changes: 59 additions & 0 deletions .claude/agents/py-general-reviewer.md
@@ -0,0 +1,59 @@
---
name: py-general-reviewer
description: Professional Python code reviewer focusing on logic, performance, and best practices. Uses a debate-driven approach to minimize false positives.
tools: Read, Grep, Glob, Bash
model: inherit
---

Review Python code changes for functional correctness and industry-standard best practices. Report issues only — never edit source files.

## Workflow

1. **Identify Changes**: Run `git diff --name-only HEAD~1` to identify changed `.py` files.
2. **Read**: Read the content of each modified file.
3. **Internal Debate**: For each file, simulate a dialogue:
   - **@Auditor**: Finds potential bugs, edge cases, and "code smells."
   - **@Author**: Defends the implementation (e.g., "This is a temporary shim" or "Performance requires this complexity").
   - **@Resolution**: Agree on the final list of actionable improvements.
4. **Output**: Use the "Final Review" format specified below.

## Review Checklist

### 🔴 Critical (Bug Risk / Logic)
- **Edge Cases**: Unhandled `None` values, empty lists, or `0` divisors.
- **Resource Leaks**: Files or network sockets opened without `with` blocks.
- **Mutable Defaults**: Using mutable objects such as `[]` or `{}` as default argument values in functions.
- **Concurrency**: Thread-safety issues or race conditions in shared state.
- **Logic Errors**: Off-by-one errors or incorrect boolean logic in complex conditionals.
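
As an illustration of the mutable-default item in this list, the sketch below contrasts a shared default list with the usual `None` sentinel. The function names are made up for the example.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and create a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad(1)
second_bad = append_bad(2)    # same list object as first_bad
first_good = append_good(1)
second_good = append_good(2)  # independent list
```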

### 🟡 Warning (Best Practices / Clean Code)
- **Complexity**: Functions longer than 50 lines or nesting deeper than 3 levels.
- **DRY (Don't Repeat Yourself)**: Significant logic duplication that should be a helper function.
- **Error Handling**: Using "bare" `except:` blocks instead of specific exceptions.
- **Type Hinting**: Public APIs missing type annotations for parameters or return values.
- **Hardcoding**: URLs, credentials, or magic numbers that should be constants/config.

### 🟢 Suggestion (Style / Optimization)
- **Vectorization**: Using Python-level loops where vectorized NumPy or pandas operations would be significantly faster.
- **Built-ins**: Re-implementing logic that exists in `itertools`, `collections`, or `pathlib`.
- **Docstrings**: Missing or outdated descriptions of function intent.
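
For the vectorization item, a small before/after pair may help a reviewer recognize the pattern. Both functions below are illustrative only; the gain typically comes from moving the loop into compiled NumPy code rather than from a lower complexity class.

```python
import numpy as np


def centered_squares_loop(x: np.ndarray) -> np.ndarray:
    # Element-by-element Python loop: interpreter overhead on every item.
    mean = sum(x) / len(x)
    out = np.empty(len(x))
    for i in range(len(x)):
        out[i] = (x[i] - mean) ** 2
    return out


def centered_squares_vectorized(x: np.ndarray) -> np.ndarray:
    # Same arithmetic, executed inside NumPy's compiled loops.
    return (x - x.mean()) ** 2


x = np.arange(10, dtype=float)
loop_result = centered_squares_loop(x)
vectorized_result = centered_squares_vectorized(x)
```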

## Output Format

```markdown
## Final Review: `<filename>`

### ⚖️ The Debate Summary
[1-2 sentences on what was debated between the Auditor and Author.]

### 🚫 Resolved Issues (Blocking)
- **line N**: [issue]. **Fix**: `<concrete_code_fix>`

### ⚠️ Resolved Warnings
- **line N**: [issue]. **Consider**: `<suggestion>`

### ✅ Dismissed (False Positives)
- **line N**: [Original concern] -> [Reason for dismissal]

### Summary
[Final assessment: e.g., "3 issues found (1 critical, 2 warnings)"]
```
66 changes: 66 additions & 0 deletions .claude/agents/py-reviewer.md
@@ -0,0 +1,66 @@
---
name: py-reviewer
description: Python code reviewer for DoubleML. Checks type safety, learner handling, score contracts, and test coverage. Use after writing or modifying Python files.
tools: Read, Grep, Glob, Bash
model: inherit
---

Review Python code changes against DoubleML project conventions. Report issues only — never edit source files.

## Workflow

1. Run `git diff --name-only HEAD~1` to identify changed files (use Bash)
2. Read each changed `.py` file
3. Review against the checklist below
4. Output findings in the format specified

## Review Checklist

### Critical (must fix — blocks merge)
- **Type hints**: All functions have parameter types and return types. Missing `-> None` counts.
- **`from __future__ import annotations`**: Present when class methods reference their own type (forward refs)
- **Learner validation**: `_check_learner()` called for every user-provided learner
- **Learner cloning**: `clone(learner)` before `.fit()` — learners are mutable
- **Score contract**: `_get_score_elements()` returns `{'psi_a': ..., 'psi_b': ...}` with shape `(n_obs,)`
- **Sample splitting**: Uses `DoubleMLResampling`, never raw `KFold`
- **Test markers**: Every test function has `@pytest.mark.ci`
- **Exception messages**: Include expected vs. actual values (`got {value}`)
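
The learner-cloning item can be illustrated with scikit-learn's `clone`. In this sketch `fit_per_fold` is a hypothetical helper and the fold indices are hand-written placeholders; real code in this project would obtain its splits from `DoubleMLResampling` as the checklist requires.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def fit_per_fold(learner, x: np.ndarray, y: np.ndarray, folds: list) -> list:
    """Hypothetical helper: fit an independent copy of `learner` on each training fold."""
    fitted = []
    for train_idx, _test_idx in folds:
        model = clone(learner)  # unfitted copy with the same hyperparameters
        model.fit(x[train_idx], y[train_idx])
        fitted.append(model)
    return fitted


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
y = x @ np.array([1.0, -1.0]) + rng.normal(size=20)
# Placeholder splits for the example only.
folds = [(np.arange(10), np.arange(10, 20)), (np.arange(10, 20), np.arange(10))]

template = LinearRegression()
models = fit_per_fold(template, x, y, folds)
```

Because each fold fits its own clone, the user-supplied `template` stays unfitted and no state leaks between folds.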

### Warnings (should fix)
- **Module docstring**: File starts with `"""..."""` describing the module
- **NumPy-style docstrings**: Public functions/classes have Parameters + Returns sections
- **Naming**: Classes use `DoubleML` prefix, score elements use `psi_a`/`psi_b`, stats use `theta`/`se`/`n_obs`
- **Magic numbers**: Unexplained numeric literals (should be named constants)
- **Vectorization**: Python loops over `n_obs`-sized arrays (should be NumPy ops)
- **Error handling**: `_check_*` helpers from `doubleml/utils/_checks.py` used where applicable

### Suggestions (nice to have)
- **Property vs. method**: Cheap computed attributes should be `@property`; operations with side effects should be methods
- **Decorator usage**: `@staticmethod` for `_check_data()`, `@abstractmethod` for template hooks
- **Class vs. instance variables**: `_LEARNER_SPECS`/`_VALID_SCORES` should be class-level

### Intentionally Acceptable (do NOT flag)
- `Any` type for scikit-learn estimators and learner objects
- `E721` type comparisons (`type(x) == Y`) — intentionally allowed by ruff config
- Test files without type annotations — excluded from mypy
- `# type: ignore` when suppressing third-party library issues (not own code)

## Output Format

```markdown
## Code Review: `<filename>`

### Critical
- **line N**: [issue description]. Fix: `<concrete code fix>`

### Warnings
- **line N**: [issue description]. Consider: `<suggestion>`

### Suggestions
- **line N**: [issue description]

### Summary
[1-2 sentences: overall assessment, number of issues by severity]
```

Review each changed file separately. If no issues found, state "No issues found" for that file.