
Conversation

@adamamer20 adamamer20 (Member) commented Dec 8, 2025

This PR introduces:

  1. A single, unified Typer CLI for running Mesa-Frames models with consistent options for agents, steps, seeds, plotting, and result persistence:
     • a `run` command with standardised options
       (`--agents`, `--steps`, `--seed`, `--plot`, `--save-results`, `--results-dir`);
     • automatic generation of plots (light + dark themes) for agent-level metrics and backend performance.
  2. A first version of the benchmarks CLI, allowing quick comparison of Mesa vs Frames backends on reference models (boltzmann, sugarscape) with automated CSV output and scaling plots (see the programmatic sketch below).
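As a rough illustration of the intended workflow, here is a minimal programmatic sketch of the benchmarks command. The parameter names follow the `run()` signature quoted later in this thread; the import path, and the assumption that `benchmarks` is importable from the repository root, are mine.

```python
# Hypothetical direct call to the benchmarks CLI command (sketch only; assumes the
# project root is on sys.path so that benchmarks.cli can be imported).
from benchmarks.cli import run

run(
    models="boltzmann",        # or "sugarscape" / "all"
    agents="1000:5000:1000",   # start:stop:step range of agent counts
    steps=100,
    repeats=3,
    seed=42,
    save=True,                 # write per-model CSVs under a timestamped results dir
    plot=True,                 # render light/dark scaling plots
)
```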

Summary by CodeRabbit

  • New Features

    • Added CLI-driven performance benchmarking between backends, runnable example models (Boltzmann Wealth, Sugarscape IG) for both backends, and reusable plotting utilities for model/agent/performance metrics.
  • Documentation

    • New comprehensive READMEs describing benchmarks, examples, CLI options, outputs, and extension tips.
  • Chores

    • Updated ignore patterns and documentation tooling config (docs dependency added).


- Implemented a new backend using Mesa with sequential updates in `examples/sugarscape_ig/backend_mesa`.
- Created agent and model classes for the Sugarscape simulation, including movement and sugar management.
- Added a CLI interface using Typer for running simulations and saving results.
- Introduced utility classes for handling simulation results from both Mesa and Mesa-Frames backends.
- Added a new backend using Mesa-Frames with parallel updates in `examples/sugarscape_ig/backend_frames`.
- Implemented model-level reporters for Gini coefficient and correlations between agent traits.
- Included CSV output and plotting capabilities for simulation metrics.
codecov bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.23%. Comparing base (172cf28) to head (991eb8a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #188   +/-   ##
=======================================
  Coverage   89.23%   89.23%           
=======================================
  Files          14       14           
  Lines        2007     2007           
=======================================
  Hits         1791     1791           
  Misses        216      216           


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@adamamer20 adamamer20 (Member, Author) commented:

@coderabbitai review

coderabbitai bot commented Dec 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Dec 8, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a benchmarks CLI, example models (Boltzmann and Sugarscape) with both Mesa and mesa-frames backends, shared plotting/utilities, documentation, and minor config updates (pyproject, .gitignore). New Typer CLIs and data/result scaffolding produce timestamped CSVs and plots.

Changes

  • Configuration & Dependencies: `.gitignore`, `pyproject.toml`
    Updated gitignore glob patterns for benchmarks/examples artifacts; added typer>=0.9.0 to docs dependency group.
  • Benchmarks CLI & docs: `benchmarks/cli.py`, `benchmarks/README.md`
    New Typer-based benchmarking CLI (multi-model, multi-backend, repeats, seeding, CSV output, plotting) and README describing usage, output layout, CLI options, CSV schema, and extension notes.
  • Examples root & utilities: `examples/__init__.py`, `examples/utils.py`, `examples/plotting.py`, `examples/README.md`
    Package initializer exposing example symbols; new simulation result dataclasses (FramesSimulationResult, MesaSimulationResult); unified plotting helpers (plot_model_metrics, plot_agent_metrics, plot_performance) with multi-theme saving; examples README.
  • Boltzmann Wealth (Frames & Mesa): `examples/boltzmann_wealth/backend_frames.py`, `examples/boltzmann_wealth/backend_mesa.py`, `examples/boltzmann_wealth/README.md`
    Added Frames and Mesa implementations of Boltzmann Wealth model, each with simulate() and Typer CLI, gini computation, optional CSV/plot outputs, and run instrumentation.
  • Sugarscape IG — Frames backend: `examples/sugarscape_ig/backend_frames/__init__.py`, `examples/sugarscape_ig/backend_frames/agents.py`, `examples/sugarscape_ig/backend_frames/model.py`, `examples/sugarscape_ig/README.md`
    New mesa-frames Sugarscape IG package: AntsBase/AntsParallel with vectorised neighborhood/ranking and iterative conflict-resolution rounds; Sugarscape model with instant-growback sugar logic, reporters (gini, correlations), simulate() and Typer CLI; README.
  • Sugarscape IG — Mesa backend: `examples/sugarscape_ig/backend_mesa/__init__.py`, `examples/sugarscape_ig/backend_mesa/agents.py`, `examples/sugarscape_ig/backend_mesa/model.py`
    New Mesa-based Sugarscape IG: AntAgent movement/tie-break logic, Sugarscape model with DataCollector reporters, simulate(), Typer CLI, and plotting integration.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as benchmarks/cli.py
    participant ModelCfg as MODELS
    participant Backend as Backend Runner
    participant Sim as Simulation (Mesa/Frames)
    participant DC as DataCollector
    participant IO as File I/O (CSV/plots)
    participant Plot as examples/plotting.py

    User->>CLI: run(models, agents, steps, repeats, seed, save, plot)
    CLI->>CLI: parse inputs (_parse_agents/_parse_models)
    CLI->>ModelCfg: select model configs
    loop models
        CLI->>Backend: choose backend runner
        loop repeats
            CLI->>Backend: runner(agents, steps, seed)
            activate Backend
            Backend->>Sim: instantiate & run simulation
            Sim->>DC: collect metrics during run
            Sim-->>Backend: return runtime & datacollector
            Backend-->>CLI: return runtime, datacollector
            deactivate Backend
            CLI->>CLI: record runtime row
        end
        CLI->>CLI: aggregate runtimes into DataFrame
        alt save
            CLI->>IO: write CSVs to timestamped results_dir
        end
        alt plot
            CLI->>Plot: plot_performance(df, model_name, output_dir, timestamp)
            Plot->>IO: save PNG/SVG (light/dark themes)
        end
    end
    CLI->>User: report completion and paths
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas to focus review on:

  • benchmarks/cli.py — input parsing, seeding, repeat logic, aggregation and file I/O.
  • examples/sugarscape_ig/backend_frames/agents.py — parallel movement ranking and iterative conflict-resolution correctness and edge cases.
  • examples/plotting.py — DataFrame melting, multi-theme saving, and filename/metadata conventions.
  • Cross-backend consistency between Frames and Mesa implementations (metric definitions, CSV schema, timestamps).


Suggested labels

docs

Suggested reviewers

  • EwoutH
  • Ben-geo

Poem

🐰
I hopped through models, plots in paw,
Tweaked backends, CSVs, and awe,
Frames and Mesa side by side,
Benchmarks logged — a joyful stride,
Carrots for code, let's run and draw! 🎨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 49.21%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The PR title clearly describes the main objective: introducing a unified CLI interface for examples, adding plotting functionality, and creating a benchmarking system. It accurately summarizes the primary changes across the changeset.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

🧹 Nitpick comments (12)
examples/__init__.py (1)

1-6: Clean and simple package initializer.

The module docstring and __all__ list are properly structured and follow Python conventions. The subpackage names use the correct snake_case naming style.

Note: The __all__ list declares these as public API surface, but without explicit imports, users will need to access them via the full path (e.g., import examples.boltzmann_wealth) rather than from examples import boltzmann_wealth. If direct importing from the examples package is desired, consider adding explicit imports:

from . import boltzmann_wealth, sugarscape_ig

This is optional based on your intended API—just ensure the usage pattern is intentional and documented.

examples/sugarscape_ig/backend_frames/agents.py (2)

298-298: Redundant .with_columns(pl.col("radius")) does nothing.

This line selects radius without any transformation, effectively a no-op.

-            .with_columns(pl.col("radius"))

311-311: Minor: Non-breaking hyphen in comment.

The comment contains a non-breaking hyphen (‑, U+2011) instead of a regular hyphen-minus (-). This is flagged by Ruff (RUF003) but is purely cosmetic.

-        # Precompute per‑agent candidate rank once so conflict resolution can
+        # Precompute per-agent candidate rank once so conflict resolution can
examples/boltzmann_wealth/backend_frames.py (1)

5-13: Non-standard import order.

Standard library imports (os, time) are interleaved with third-party imports. PEP 8 recommends grouping imports: stdlib first, then third-party.

 from datetime import datetime, timezone
+import os
 from pathlib import Path
+from time import perf_counter
 from typing import Annotated
 
 import numpy as np
-import os
 import polars as pl
 import typer
-from time import perf_counter
examples/plotting.py (3)

71-75: Consider logging the exception instead of silently ignoring it.

The try-except-pass pattern suppresses all errors during SVG export. While SVG is optional, logging the exception would aid debugging when exports fail unexpectedly.

-    try:
-        fig.savefig(output_dir / f"{stem}_{theme}.svg", bbox_inches="tight")
-    except Exception:
-        pass  # SVG is a nice-to-have
+    try:
+        fig.savefig(output_dir / f"{stem}_{theme}.svg", bbox_inches="tight")
+    except Exception as exc:  # noqa: BLE001
+        import logging
+        logging.debug("SVG export failed for %s: %s", stem, exc)

109-114: Bare Exception catch may mask unexpected errors.

The intent is to handle missing/malformed step data, but catching Exception could hide unrelated bugs. Consider catching more specific exceptions or at minimum adding a # noqa: BLE001 comment to acknowledge this is intentional.

     if steps is None:
         try:
             steps = int(metrics.select(pl.col("step").max()).item()) + 1
-        except Exception:
+        except (TypeError, ValueError, pl.exceptions.ComputeError):
             steps = None

286-290: Sort __all__ for consistency with isort conventions.

Static analysis flagged this as unsorted. Sorting maintains consistency with standard Python tooling.

 __all__ = [
+    "plot_agent_metrics",
     "plot_model_metrics",
-    "plot_agent_metrics",
     "plot_performance",
 ]
examples/sugarscape_ig/backend_mesa/model.py (2)

57-72: Docstring mismatch: function signature differs from similar functions.

The docstring for gini doesn't follow NumPy style as required by the coding guidelines. Consider adding Parameters/Returns sections for consistency with public APIs.

 def gini(values: Iterable[float]) -> float:
+    """Compute the Gini coefficient from an iterable of wealth values.
+
+    Parameters
+    ----------
+    values : Iterable[float]
+        Iterable of wealth/sugar values.
+
+    Returns
+    -------
+    float
+        Gini coefficient in [0, 1], or 0.0 for zero-total/constant values,
+        or NaN for empty input.
+    """
     array = np.fromiter(values, dtype=float)

132-134: Unused lambda parameter m – consider using _ convention.

The seed reporter lambda captures seed from the enclosing scope but declares an unused parameter m. Using _ makes the intent explicit.

-                "seed": lambda m: seed,
+                "seed": lambda _: seed,
examples/boltzmann_wealth/backend_mesa.py (2)

24-40: Consider extracting gini to a shared module to reduce duplication.

This gini function is identical to the one in examples/sugarscape_ig/backend_mesa/model.py (lines 57-72). Extracting it to examples/utils.py would improve maintainability.

You could add to examples/utils.py:

def gini(values: Iterable[float]) -> float:
    """Compute the Gini coefficient from an iterable of wealth values."""
    array = np.fromiter(values, dtype=float)
    if array.size == 0:
        return float("nan")
    if np.allclose(array, 0.0):
        return 0.0
    if np.allclose(array, array[0]):
        return 0.0
    sorted_vals = np.sort(array)
    n = sorted_vals.size
    cumulative = np.cumsum(sorted_vals)
    total = cumulative[-1]
    if total == 0:
        return 0.0
    index = np.arange(1, n + 1, dtype=float)
    return float((2.0 * np.dot(index, sorted_vals) / (n * total)) - (n + 1) / n)

Then import from both backends.
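For reference, the final line of that helper is the standard closed form of the Gini coefficient over the sorted values $x_{(1)} \le \dots \le x_{(n)}$:

$$
G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n}
$$

which is 0 for a perfectly equal distribution and approaches 1 as all wealth concentrates in a single agent.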


74-77: Unused lambda parameter m – use _ convention.

Same issue as in the Sugarscape model.

             model_reporters={
                 "gini": lambda m: gini(a.wealth for a in m.agent_list),
-                "seed": lambda m: seed,
+                "seed": lambda _: seed,
             }
benchmarks/cli.py (1)

97-100: Range generation logic could be simplified.

The current approach generates range(start, stop + step, step) then pops if the last value exceeds stop. This is correct but less idiomatic than using range(start, stop + 1, step) directly for inclusive ranges.

-        counts = list(range(start, stop + step, step))
-        if counts[-1] > stop:
-            counts.pop()
-        return counts
+        return list(range(start, stop + 1, step))

This produces the same result for an inclusive [start, stop] range with the given step.
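A quick check of that equivalence, assuming a positive step:

```python
# Both constructions yield the same inclusive [start, stop] range for a positive step.
start, stop, step = 1000, 5000, 1000

counts = list(range(start, stop + step, step))
if counts[-1] > stop:
    counts.pop()

assert counts == list(range(start, stop + 1, step)) == [1000, 2000, 3000, 4000, 5000]
```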

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3037456 and aadee32.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • .gitignore (1 hunks)
  • benchmarks/README.md (1 hunks)
  • benchmarks/cli.py (1 hunks)
  • examples/README.md (1 hunks)
  • examples/__init__.py (1 hunks)
  • examples/boltzmann_wealth/README.md (1 hunks)
  • examples/boltzmann_wealth/backend_frames.py (1 hunks)
  • examples/boltzmann_wealth/backend_mesa.py (1 hunks)
  • examples/plotting.py (1 hunks)
  • examples/sugarscape_ig/README.md (1 hunks)
  • examples/sugarscape_ig/backend_frames/__init__.py (1 hunks)
  • examples/sugarscape_ig/backend_frames/agents.py (1 hunks)
  • examples/sugarscape_ig/backend_frames/model.py (1 hunks)
  • examples/sugarscape_ig/backend_mesa/__init__.py (1 hunks)
  • examples/sugarscape_ig/backend_mesa/agents.py (1 hunks)
  • examples/sugarscape_ig/backend_mesa/model.py (1 hunks)
  • examples/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use Python 3.11 or later with 4-space indentation
Include type hints for all public APIs
Use NumPy-style docstrings (validated by Ruff/pydoclint)
Use CamelCase for class names
Use snake_case for function and attribute names
Avoid using TYPE_CHECKING guards for type annotations because the project uses beartype for runtime type checking, which requires the actual type objects to be available at runtime
Treat underscored attributes as internal/private and not part of the public API

Files:

  • examples/utils.py
  • examples/__init__.py
  • examples/sugarscape_ig/backend_mesa/agents.py
  • examples/plotting.py
  • examples/sugarscape_ig/backend_mesa/__init__.py
  • examples/boltzmann_wealth/backend_frames.py
  • examples/sugarscape_ig/backend_mesa/model.py
  • examples/sugarscape_ig/backend_frames/agents.py
  • examples/sugarscape_ig/backend_frames/model.py
  • examples/boltzmann_wealth/backend_mesa.py
  • examples/sugarscape_ig/backend_frames/__init__.py
  • benchmarks/cli.py
🧠 Learnings (3)
📚 Learning: 2025-12-08T18:41:11.772Z
Learnt from: CR
Repo: projectmesa/mesa-frames PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-08T18:41:11.772Z
Learning: Use `examples/` directory for reproducible demo models and performance scripts

Applied to files:

  • examples/README.md
  • examples/__init__.py
📚 Learning: 2025-12-08T18:41:11.772Z
Learnt from: CR
Repo: projectmesa/mesa-frames PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-08T18:41:11.772Z
Learning: Use `mesa_frames/` as the source package directory, with `abstract/` and `concrete/` subdirectories for core APIs and implementations, and key modules: `agents.py`, `agentset.py`, `space.py`, `datacollector.py`, `types_.py`

Applied to files:

  • examples/README.md
  • benchmarks/README.md
  • examples/utils.py
  • examples/sugarscape_ig/README.md
  • examples/sugarscape_ig/backend_mesa/agents.py
  • examples/sugarscape_ig/backend_mesa/__init__.py
  • examples/boltzmann_wealth/backend_frames.py
  • examples/sugarscape_ig/backend_mesa/model.py
  • examples/sugarscape_ig/backend_frames/agents.py
  • examples/sugarscape_ig/backend_frames/model.py
  • examples/sugarscape_ig/backend_frames/__init__.py
📚 Learning: 2025-04-29T09:25:34.183Z
Learnt from: adamamer20
Repo: projectmesa/mesa-frames PR: 143
File: mesa_frames/abstract/space.py:50-63
Timestamp: 2025-04-29T09:25:34.183Z
Learning: The project mesa-frames has been upgraded to Python 3.11, which provides native support for `Self` type in the standard typing module, eliminating the need for imports from typing_extensions.

Applied to files:

  • examples/utils.py
  • examples/sugarscape_ig/backend_frames/__init__.py
🧬 Code graph analysis (7)
examples/utils.py (1)
mesa_frames/concrete/datacollector.py (1)
  • DataCollector (69-640)
examples/sugarscape_ig/backend_mesa/agents.py (3)
examples/sugarscape_ig/backend_frames/model.py (2)
  • Sugarscape (174-386)
  • step (333-350)
examples/sugarscape_ig/backend_mesa/model.py (2)
  • Sugarscape (75-187)
  • step (172-181)
examples/sugarscape_ig/backend_frames/agents.py (3)
  • step (55-71)
  • move (73-80)
  • move (131-185)
examples/boltzmann_wealth/backend_frames.py (3)
mesa_frames/concrete/datacollector.py (1)
  • DataCollector (69-640)
examples/utils.py (1)
  • FramesSimulationResult (7-14)
examples/boltzmann_wealth/backend_mesa.py (1)
  • gini (24-40)
examples/sugarscape_ig/backend_frames/agents.py (3)
mesa_frames/concrete/agentset.py (1)
  • AgentSet (76-686)
mesa_frames/concrete/model.py (1)
  • Model (54-223)
examples/sugarscape_ig/backend_mesa/agents.py (1)
  • move (75-78)
examples/sugarscape_ig/backend_frames/model.py (3)
examples/utils.py (1)
  • FramesSimulationResult (7-14)
examples/plotting.py (1)
  • plot_model_metrics (81-171)
mesa_frames/abstract/datacollector.py (2)
  • collect (118-128)
  • flush (176-196)
examples/boltzmann_wealth/backend_mesa.py (4)
examples/utils.py (1)
  • MesaSimulationResult (18-25)
examples/plotting.py (1)
  • plot_model_metrics (81-171)
examples/boltzmann_wealth/backend_frames.py (3)
  • gini (24-42)
  • step (52-76)
  • step (106-108)
examples/sugarscape_ig/backend_frames/model.py (2)
  • gini (30-71)
  • step (333-350)
benchmarks/cli.py (3)
examples/plotting.py (1)
  • plot_performance (234-283)
examples/boltzmann_wealth/backend_frames.py (5)
  • simulate (115-124)
  • step (52-76)
  • step (106-108)
  • run (110-112)
  • run (131-186)
examples/boltzmann_wealth/backend_mesa.py (5)
  • simulate (92-97)
  • step (50-57)
  • step (81-85)
  • run (87-89)
  • run (104-177)
🪛 LanguageTool
examples/boltzmann_wealth/README.md

[grammar] ~57-~57: Use a hyphen to join words.
Context: ...rics - --results-dir Override the auto timestamped directory under results/ ...

(QB_NEW_EN_HYPHEN)

examples/sugarscape_ig/README.md

[grammar] ~50-~50: Use a hyphen to join words.
Context: ...alive` typically decreases until a quasi steady state (metabolism vs regrowth) or...

(QB_NEW_EN_HYPHEN)

🪛 Ruff (0.14.7)
examples/sugarscape_ig/backend_mesa/agents.py

21-21: Undefined name Sugarscape

(F821)

examples/plotting.py

73-74: try-except-pass detected, consider logging the exception

(S110)


73-73: Do not catch blind exception: Exception

(BLE001)


113-113: Do not catch blind exception: Exception

(BLE001)


273-273: Comment contains ambiguous `–` (EN DASH). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)


286-290: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

examples/sugarscape_ig/backend_mesa/model.py

134-134: Unused lambda argument: m

(ARG005)


260-260: Local variable seed_val is assigned to but never used

Remove assignment to unused variable seed_val

(F841)

examples/sugarscape_ig/backend_frames/agents.py

50-52: Avoid specifying long messages outside the exception class

(TRY003)


311-311: Comment contains ambiguous `‑` (NON-BREAKING HYPHEN). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)

examples/sugarscape_ig/backend_frames/model.py

227-229: Avoid specifying long messages outside the exception class

(TRY003)

examples/boltzmann_wealth/backend_mesa.py

76-76: Unused lambda argument: m

(ARG005)


138-138: Comment contains ambiguous `‑` (NON-BREAKING HYPHEN). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)


143-143: Comment contains ambiguous `‑` (NON-BREAKING HYPHEN). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)

benchmarks/cli.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)


90-90: Avoid specifying long messages outside the exception class

(TRY003)


92-92: Avoid specifying long messages outside the exception class

(TRY003)


94-94: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


104-104: Avoid specifying long messages outside the exception class

(TRY003)


106-106: Avoid specifying long messages outside the exception class

(TRY003)


126-126: Avoid specifying long messages outside the exception class

(TRY003)


129-129: Avoid specifying long messages outside the exception class

(TRY003)


197-197: Do not perform function call in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (43)
.gitignore (1)

158-162: Ignore patterns updated appropriately for nested results/plots output.

The glob patterns now correctly accommodate timestamped subdirectories (benchmarks/results/TIMESTAMP/... and examples/*/results/TIMESTAMP/...) and plot outputs, aligning with the new CLI-driven output structure described in the READMEs.

examples/README.md (1)

1-106: Well-structured and comprehensive examples documentation.

The README clearly explains the dual-backend approach, CLI interface, data formats, and extension patterns. The quick start, tips, and programmatic use examples are practical and well-documented.

pyproject.toml (1)

66-85: Typer dependency added appropriately to docs group.

The placement in the docs dependency group aligns with using Typer for CLI tooling in examples and benchmarks. The version constraint >=0.9.0 is reasonable and not overly restrictive.

Please confirm that the examples and benchmarks code actually imports and uses Typer as documented in the READMEs (e.g., benchmarks/cli.py, examples/boltzmann_wealth/backend_frames.py), since those files are not visible in this review context.

examples/sugarscape_ig/backend_mesa/__init__.py (1)

1-1: Appropriate minimal package initializer.

The module docstring correctly identifies the package purpose. No additional code or exports are needed at this stage.

benchmarks/README.md (1)

1-88: Comprehensive and well-structured benchmarking documentation.

The README clearly explains purpose, CLI usage, output formats, and extension patterns. The table of CLI options and CSV schema documentation are particularly helpful for reproducibility and post-processing.

examples/sugarscape_ig/backend_frames/__init__.py (1)

1-1: Appropriate minimal package initializer.

The module docstring correctly identifies the Frames backend package. No additional code or exports are needed at this stage.

examples/sugarscape_ig/backend_mesa/agents.py (3)

34-47: LGTM!

The _visible_cells method correctly implements cardinal direction visibility with proper boundary checks.


49-73: LGTM!

The _choose_best_cell method correctly implements the Sugarscape movement rule with proper tie-breaking by sugar amount, Manhattan distance, and lexicographic coordinates.
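A minimal sketch of that tie-breaking order, using hypothetical names and plain Python data structures rather than the PR's actual classes:

```python
def choose_best_cell(
    pos: tuple[int, int],
    visible: list[tuple[int, int]],
    sugar_at: dict[tuple[int, int], int],
) -> tuple[int, int]:
    """Most sugar wins; ties go to the closer cell, then to lexicographic coordinates."""

    def rank(cell: tuple[int, int]) -> tuple[int, int, tuple[int, int]]:
        manhattan = abs(cell[0] - pos[0]) + abs(cell[1] - pos[1])
        # Higher sugar first, then smaller distance, then smaller coordinates.
        return (-sugar_at[cell], manhattan, cell)

    return min(visible, key=rank)


# Two equally sweet cells: the nearer one is picked.
assert choose_best_cell((0, 0), [(0, 3), (0, 1)], {(0, 3): 4, (0, 1): 4}) == (0, 1)
```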


75-78: LGTM!

The move method is clean and delegates appropriately to the helper methods.

examples/utils.py (1)

6-25: LGTM!

The dataclasses provide clean containers for simulation results with appropriate type hints. The docstrings could be updated to reference datacollector.data instead of metrics/agent_metrics for clarity, but this is a minor documentation nit.

examples/sugarscape_ig/backend_frames/agents.py (7)

30-53: LGTM!

Good defensive validation of required trait columns with a clear error message. Using clone() on the input DataFrame prevents unintended side effects.


55-71: LGTM!

The step sequence (shuffle → move → eat → remove starved) correctly implements the Sugarscape update order as documented.


82-116: LGTM!

The eat method correctly implements vectorised sugar harvesting with proper updates to both agent sugar stocks and cell sugar values.


118-127: LGTM!

Clean vectorised removal of starved agents.


131-185: LGTM!

The move method is well-structured with clear early exits and proper delegation to helper methods. The inline schema comments enhance readability.


187-248: LGTM!

The _build_neighborhood_frame method correctly assembles the neighborhood data with proper joins and null handling.


348-620: LGTM!

The conflict resolution algorithm is well-implemented with proper fallback handling. The iterative lottery approach with rank promotion ensures all agents are eventually assigned, either to a preferred cell or their origin. The inline schema comments significantly aid comprehension.
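For readers new to this pattern, here is a much-simplified, non-vectorised sketch of an iterative lottery with rank promotion; the names and data layout are invented for illustration and this is not the PR's Polars implementation:

```python
import random


def resolve_moves(
    preferences: dict[int, list[tuple[int, int]]],  # agent -> candidate cells, best first
    origins: dict[int, tuple[int, int]],            # agent -> current cell (fallback)
    rng: random.Random,
) -> dict[int, tuple[int, int]]:
    assignment: dict[int, tuple[int, int]] = {}
    taken: set[tuple[int, int]] = set()
    pending = {agent: list(cands) for agent, cands in preferences.items()}

    while pending:
        bids: dict[tuple[int, int], list[int]] = {}
        for agent, cands in list(pending.items()):
            cands = [c for c in cands if c not in taken]  # promote past cells already claimed
            pending[agent] = cands
            if cands:
                bids.setdefault(cands[0], []).append(agent)
            else:
                assignment[agent] = origins[agent]        # nothing left: stay at origin
                del pending[agent]
        for cell, contenders in bids.items():
            winner = rng.choice(contenders)               # lottery among agents wanting this cell
            assignment[winner] = cell
            taken.add(cell)
            del pending[winner]
            # Losers keep their remaining ranked candidates and retry next round.
    return assignment
```

Each round at least one pending agent is resolved (it either wins a cell or falls back to its origin), so the loop terminates with every agent assigned.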

examples/sugarscape_ig/backend_frames/model.py (4)

30-71: LGTM with minor redundancy.

The function is robust with multiple safety checks. Lines 60-65 contain redundant size checks (already verified at line 58), but this defensive approach is acceptable for example code.


74-140: LGTM!

Both correlation reporters are correctly implemented with appropriate null checks. The similar structure between them is acceptable for readability in example code.


143-171: LGTM!

The _safe_corr helper correctly handles edge cases (insufficient data, constant values) that would make Pearson correlation undefined.


174-386: LGTM!

The Sugarscape class is well-structured with:

  • Proper validation of agent count vs grid capacity
  • Clean separation of grid generation, agent creation, and data collection
  • Correct instant-growback sugar regrowth logic
  • Appropriate termination handling when agents die out
examples/boltzmann_wealth/backend_frames.py (4)

24-42: LGTM!

The gini function correctly handles edge cases and matches the pattern from the Mesa backend implementation.


45-76: LGTM!

The MoneyAgents class correctly implements the Boltzmann wealth exchange using vectorised Polars operations. The sampling uses the model RNG for reproducibility.
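As a rough illustration of what such a vectorised exchange can look like (the `wealth` column name and DataFrame layout are assumptions, and recipients may include the donor itself in this simplified version; this is not the PR's code):

```python
import numpy as np
import polars as pl


def exchange_step(agents: pl.DataFrame, rng: np.random.Generator) -> pl.DataFrame:
    wealth = agents["wealth"].to_numpy()
    donors = wealth > 0                                      # every positive-wealth agent gives 1 unit
    if not donors.any():
        return agents
    recipients = rng.integers(0, wealth.size, size=int(donors.sum()))
    gains = np.bincount(recipients, minlength=wealth.size)  # units received per agent
    new_wealth = wealth - donors.astype(wealth.dtype) + gains
    return agents.with_columns(pl.Series("wealth", new_wealth))


rng = np.random.default_rng(42)
frame = pl.DataFrame({"wealth": [1] * 5})
frame = exchange_step(frame, rng)
assert frame["wealth"].sum() == 5  # total wealth is conserved
```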


79-112: LGTM!

The MoneyModel class correctly integrates the agent set with data collection, with appropriate storage backend selection based on results_dir.


115-124: LGTM!

Clean simulation function that correctly wires the model and returns the result container.

examples/plotting.py (5)

17-44: LGTM! Theme configuration is well-structured.

The light/dark theme definitions with consistent styling (disabled top/right spines, appropriate colors) provide a good foundation for the plotting utilities.


47-56: LGTM! Seed shortening helper is correctly implemented.

The regex-based approach handles edge cases (None, no match) and truncates long seeds appropriately.


81-171: LGTM! plot_model_metrics implementation is comprehensive.

The function correctly handles:

  • Empty dataframes (early return)
  • Missing step column (auto-generation)
  • Single vs multiple metrics (legend removal, y-axis labeling)
  • Title composition with optional N/T metadata
  • Multi-theme output

177-228: LGTM! plot_agent_metrics handles id variable detection well.

The fallback to the first column when preferred id vars aren't present is a sensible default.


234-283: LGTM! plot_performance implementation is solid.

The function correctly uses seaborn's estimator="mean" with errorbar="sd" for aggregating repeated measurements, and applies appropriate legend styling per theme.
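For context, the aggregation described here could be expressed with a seaborn call along these lines (sketch with a tiny synthetic table; column names follow the benchmark row schema quoted elsewhere in this thread, and `errorbar=` requires seaborn >= 0.12):

```python
import pandas as pd
import seaborn as sns

# Synthetic benchmark table: two backends, two agent counts, two repeats each.
df = pd.DataFrame(
    {
        "agents": [1000, 1000, 2000, 2000] * 2,
        "backend": ["mesa"] * 4 + ["mesa-frames"] * 4,
        "runtime_seconds": [1.2, 1.3, 2.5, 2.6, 0.4, 0.5, 0.9, 1.0],
    }
)

ax = sns.lineplot(
    data=df,
    x="agents",
    y="runtime_seconds",
    hue="backend",
    estimator="mean",   # mean runtime across repeats at each agent count
    errorbar="sd",      # shaded band shows the standard deviation across repeats
    marker="o",
)
```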

examples/sugarscape_ig/backend_mesa/model.py (5)

31-42: LGTM! _safe_corr handles edge cases correctly.

The function properly handles degenerate inputs (too few elements, constant arrays) by returning NaN, matching the documented Frames helper behavior.
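A minimal sketch of that behavior (illustrative, not the PR's exact helper):

```python
import numpy as np


def safe_corr(x: np.ndarray, y: np.ndarray) -> float:
    if x.size < 2 or y.size < 2:
        return float("nan")                       # too few points for a correlation
    if np.allclose(x, x[0]) or np.allclose(y, y[0]):
        return float("nan")                       # constant series: Pearson r is undefined
    return float(np.corrcoef(x, y)[0, 1])


assert np.isnan(safe_corr(np.array([1.0]), np.array([2.0])))
assert np.isnan(safe_corr(np.array([3.0, 3.0, 3.0]), np.array([1.0, 2.0, 3.0])))
assert abs(safe_corr(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])) - 1.0) < 1e-9
```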


45-54: Forward reference Sugarscape requires string annotation or import reordering.

Sugarscape is used as a type hint before the class is defined. While from __future__ import annotations (line 9) enables postponed evaluation making this work, ensure this import remains at the top.


75-144: LGTM! Sugarscape model initialization is well-structured.

The model correctly:

  • Generates a seed if not provided
  • Initializes sugar fields with NumPy RNG
  • Places agents on empty cells
  • Sets up comprehensive DataCollector reporters

148-181: LGTM! Step scheduling logic matches documented order.

The move → eat/starve → regrow → collect sequence matches the docstring and tutorial schedule. The survivor filtering in _harvest_and_survive is correctly implemented.


209-311: LGTM! CLI implementation is comprehensive and consistent.

The Typer CLI follows the same patterns as the Boltzmann example:

  • Timestamped output directories
  • Optional CSV persistence and plotting
  • Per-metric plot generation
  • Clear console output
examples/boltzmann_wealth/backend_mesa.py (3)

43-57: LGTM! MoneyAgent implementation is correct.

The agent correctly:

  • Initializes with 1 unit of wealth
  • Guards against transferring with zero/negative wealth
  • Randomly selects a recipient from the agent list

60-90: LGTM! MoneyModel implementation is clean.

The model correctly initializes agents, sets up the DataCollector, and implements the shuffled step execution.


103-177: LGTM! CLI implementation follows consistent patterns.

The run command properly:

  • Creates timestamped output directories
  • Measures and reports runtime
  • Extracts and displays metrics
  • Saves CSV and generates plots conditionally
benchmarks/cli.py (5)

27-34: LGTM! Runner protocol is well-defined.

The RunnerP protocol cleanly abstracts the runner interface for both Mesa and Frames backends.


43-78: LGTM! Model configuration with lambda wrappers is appropriate.

The lambda wrappers for Sugarscape runners correctly adapt the width/height requirements to match the simpler RunnerP signature. The grid sizing formula (max(20, ceil(sqrt(agents)) * 2)) ensures adequate space.
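A quick worked example of that sizing formula:

```python
import math

agents = 1000
side = max(20, math.ceil(math.sqrt(agents)) * 2)  # ceil(sqrt(1000)) = 32, so side = 64
assert side * side >= agents                      # 64 x 64 = 4096 cells for 1000 agents
```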


200-205: LGTM! Runtime type-checking warning is helpful.

Good practice to warn users that beartype runtime checking can significantly impact benchmark results.


213-244: LGTM! Benchmark loop implementation is correct.

The nested loop correctly iterates over models, agent counts, repeats, and backends, recording all relevant metadata for each run.


246-262: LGTM! Output handling is well-organized.

The unified timestamp directory structure with per-model CSVs and a plots subfolder is clean and maintainable.

@adamamer20 adamamer20 added the `examples` label (additions or modifications to sample models) on Dec 8, 2025
@adamamer20 adamamer20 added this to the 0.2.0 milestone Dec 8, 2025
@adamamer20 adamamer20 requested review from Ben-geo and EwoutH December 8, 2025 23:29
@EwoutH EwoutH (Member) commented Dec 9, 2025

@Ben-geo can you take the lead in reviewing this?

@Ben-geo Ben-geo (Collaborator) commented Dec 9, 2025

Yep, I’ve got it! - I’ll review all the new PRs this week.

I’ll start going through some tonight, and I’ll wrap up by the weekend.

@Ben-geo Ben-geo left a comment

LGTM!
ran it locally as well

@adamamer20 adamamer20 (Member, Author) commented:

@coderabbitai review

coderabbitai bot commented Dec 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (7)
examples/sugarscape_ig/backend_frames/model.py (1)

58-71: Redundant size checks in gini() function.

The function checks sugar.size == 0 at line 60-61, then immediately calls np.sort() and checks n == 0 again at line 64-65. Since n = sorted_vals.size equals sugar.size, the second check is unreachable.

     sugar = primary_set.df["sugar"].to_numpy().astype(np.float64)
-
-    if sugar.size == 0:
-        return float("nan")
-    sorted_vals = np.sort(sugar.astype(np.float64))
+    sorted_vals = np.sort(sugar)
     n = sorted_vals.size
     if n == 0:
         return float("nan")
examples/plotting.py (2)

73-77: Consider logging SVG export failures instead of silently passing.

Silently swallowing exceptions can make debugging difficult. Even for "nice-to-have" features, logging the failure helps identify environment issues.

+import logging
+
+_logger = logging.getLogger(__name__)
+
 def _finalize_and_save(fig: Figure, output_dir: Path, stem: str, theme: str) -> None:
     """Tight layout with space for suptitle, export PNG + (optional) SVG."""
     output_dir.mkdir(parents=True, exist_ok=True)
     fig.tight_layout(rect=[0, 0, 1, 0.94])
     png = output_dir / f"{stem}_{theme}.png"
     fig.savefig(png, dpi=300)
     try:
         fig.savefig(output_dir / f"{stem}_{theme}.svg", bbox_inches="tight")
-    except Exception:
-        pass  # SVG is a nice-to-have
+    except Exception as exc:
+        _logger.debug("SVG export failed for %s: %s", stem, exc)
     plt.close(fig)

302-306: Optional: Sort __all__ alphabetically.

Static analysis suggests sorting __all__ for consistency (RUF022). This is a minor style preference.

 __all__ = [
+    "plot_agent_metrics",
     "plot_model_metrics",
-    "plot_agent_metrics",
     "plot_performance",
 ]
examples/boltzmann_wealth/backend_mesa.py (2)

104-146: Variable seed shadows the function parameter.

Line 146 reassigns seed = model_pd["seed"].iloc[0], shadowing the seed parameter from line 104. While this works correctly (the extracted seed equals the passed seed or auto-generated one), the shadowing can be confusing. Consider renaming to extracted_seed or similar.

-    seed = model_pd["seed"].iloc[0]
+    extracted_seed = model_pd["seed"].iloc[0]
     model_pd = model_pd[["step", "gini"]]
 
     # Show a short tail in console for quick inspection
     ...
             subtitle=f"mesa backend; seed={seed}",
+            # or use extracted_seed if you want the value from the DataFrame

135-140: Replace non-breaking hyphens with regular hyphens in comments.

Static analysis detected ambiguous non-breaking hyphens (U+2011) in comments at lines 135 and 140. These could cause issues if copy-pasted.

-    # Run simulation (Mesa‑idiomatic): we only use DataCollector's public API
+    # Run simulation (Mesa-idiomatic): we only use DataCollector's public API
     result = simulate(agents=agents, steps=steps, seed=seed)
     typer.echo(f"Simulation completed in {perf_counter() - start_time:.3f} seconds")
     dc = result.datacollector
 
-    # ---- Extract metrics (no helper, no monkey‑patch):
+    # ---- Extract metrics (no helper, no monkey-patch):
benchmarks/cli.py (2)

27-41: Widen RunnerP return type to match actual backend simulate functions

The concrete runners (e.g. boltzmann_* .simulate, sugarscape_* .simulate) return result objects, but RunnerP is declared as returning None. Static type-checkers will flag Backend.runner=...simulate assignments against this protocol.

Since the CLI ignores the return value, you can safely widen the protocol to avoid spurious type errors:

class RunnerP(Protocol):
-    def __call__(self, agents: int, steps: int, seed: int | None = None) -> None: ...
+    def __call__(self, agents: int, steps: int, seed: int | None = None) -> object: ...

This keeps the runtime behavior identical while staying consistent with the existing backends.


81-101: Agent range parsing looks solid; optional whitespace handling improvement

The range parsing logic (validation, inclusive stop, and start = 0 support for init‑time benchmarks) looks correct and robust.

If you want to make the CLI a bit more forgiving, you could accept whitespace around colons (e.g. --agents "0: 500: 100"):

-        parts = value.split(":")
+        parts = [part.strip() for part in value.split(":")]
         if len(parts) != 3:
             raise typer.BadParameter("Ranges must use start:stop:step format")
         try:
             start, stop, step = (int(part) for part in parts)

Pure ergonomics; current behavior is already fine.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aadee32 and 6e5bfbb.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • benchmarks/README.md (1 hunks)
  • benchmarks/cli.py (1 hunks)
  • examples/boltzmann_wealth/README.md (1 hunks)
  • examples/boltzmann_wealth/backend_frames.py (1 hunks)
  • examples/boltzmann_wealth/backend_mesa.py (1 hunks)
  • examples/plotting.py (1 hunks)
  • examples/sugarscape_ig/README.md (1 hunks)
  • examples/sugarscape_ig/backend_frames/model.py (1 hunks)
  • examples/sugarscape_ig/backend_mesa/agents.py (1 hunks)
  • examples/sugarscape_ig/backend_mesa/model.py (1 hunks)
  • pyproject.toml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • benchmarks/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • examples/sugarscape_ig/backend_mesa/agents.py
  • pyproject.toml
  • examples/sugarscape_ig/README.md
  • examples/boltzmann_wealth/backend_frames.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use Python 3.11 or later with 4-space indentation
Include type hints for all public APIs
Use NumPy-style docstrings (validated by Ruff/pydoclint)
Use CamelCase for class names
Use snake_case for function and attribute names
Avoid using TYPE_CHECKING guards for type annotations because the project uses beartype for runtime type checking, which requires the actual type objects to be available at runtime
Treat underscored attributes as internal/private and not part of the public API

Files:

  • examples/sugarscape_ig/backend_frames/model.py
  • examples/plotting.py
  • examples/sugarscape_ig/backend_mesa/model.py
  • examples/boltzmann_wealth/backend_mesa.py
  • benchmarks/cli.py
🧬 Code graph analysis (3)
examples/sugarscape_ig/backend_frames/model.py (3)
examples/utils.py (1)
  • FramesSimulationResult (7-14)
examples/plotting.py (1)
  • plot_model_metrics (83-182)
mesa_frames/abstract/datacollector.py (2)
  • collect (118-128)
  • flush (176-196)
examples/sugarscape_ig/backend_mesa/model.py (3)
examples/utils.py (1)
  • MesaSimulationResult (18-25)
examples/plotting.py (1)
  • plot_model_metrics (83-182)
examples/sugarscape_ig/backend_mesa/agents.py (2)
  • AntAgent (16-78)
  • move (75-78)
benchmarks/cli.py (5)
examples/plotting.py (1)
  • plot_performance (250-299)
examples/boltzmann_wealth/backend_mesa.py (5)
  • simulate (89-94)
  • step (47-54)
  • step (78-82)
  • run (84-86)
  • run (101-174)
examples/sugarscape_ig/backend_mesa/model.py (4)
  • simulate (189-202)
  • step (171-180)
  • run (182-186)
  • run (209-308)
examples/boltzmann_wealth/backend_frames.py (5)
  • simulate (115-124)
  • step (52-76)
  • step (106-108)
  • run (110-112)
  • run (131-184)
examples/sugarscape_ig/backend_frames/model.py (4)
  • simulate (389-409)
  • step (333-350)
  • run (352-365)
  • run (416-496)
🪛 Ruff (0.14.8)
examples/sugarscape_ig/backend_frames/model.py

227-229: Avoid specifying long messages outside the exception class

(TRY003)

examples/plotting.py

75-76: try-except-pass detected, consider logging the exception

(S110)


75-75: Do not catch blind exception: Exception

(BLE001)


115-115: Do not catch blind exception: Exception

(BLE001)


289-289: Comment contains ambiguous `–` (EN DASH). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)


302-306: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

examples/sugarscape_ig/backend_mesa/model.py

133-133: Unused lambda argument: m

(ARG005)

examples/boltzmann_wealth/backend_mesa.py

73-73: Unused lambda argument: m

(ARG005)


135-135: Comment contains ambiguous `‑` (NON-BREAKING HYPHEN). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)


140-140: Comment contains ambiguous `‑` (NON-BREAKING HYPHEN). Did you mean `-` (HYPHEN-MINUS)?

(RUF003)

benchmarks/cli.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)


90-90: Avoid specifying long messages outside the exception class

(TRY003)


92-92: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


107-107: Avoid specifying long messages outside the exception class

(TRY003)


127-127: Avoid specifying long messages outside the exception class

(TRY003)


130-130: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)
examples/sugarscape_ig/backend_frames/model.py (2)

463-465: Past review issue addressed: CSV confirmation message now correctly scoped.

The typer.echo(f"Saved CSV results under {results_dir}") is now inside the if save_results: block, fixing the misleading message issue flagged in the previous review.


174-279: Sugarscape model implementation looks correct.

The model properly wires the grid, agent sets, and data collector. The conditional storage backend selection (memory vs CSV based on results_dir) is a good pattern for supporting both benchmarking and CLI use cases.

examples/plotting.py (1)

82-183: Well-designed model metrics plotting with theme support.

The function handles edge cases (empty data, missing step column), automatically infers step count, and generates consistent light/dark themed outputs. The legend handling for single vs. multiple metrics is a nice touch.

examples/sugarscape_ig/backend_mesa/model.py (2)

130-134: Unused lambda argument m is intentional for DataCollector API.

The static analysis flags lambda m: seed as having an unused argument, but this is required by Mesa's DataCollector API which passes the model to all reporter lambdas. No change needed.


74-143: Sugarscape Mesa model implementation is correct.

The model properly initializes the sugar field, grid, and agents with reproducible RNG seeding. The DataCollector setup mirrors the Frames backend for CSV comparability.

examples/boltzmann_wealth/backend_mesa.py (2)

21-37: Gini coefficient implementation is robust.

The function handles edge cases well: empty arrays, all-zero values, and uniform distributions all return appropriate values (nan or 0.0). The formula implementation is correct.


40-55: MoneyAgent step logic is correct.

The agent correctly skips if wealth is 0, selects a random peer, and transfers one unit. The if other is None check at line 51-52 is defensive but random.choice on a non-empty list won't return None, so this is safe.
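A minimal sketch of that step logic, with attribute names taken from this review thread but otherwise assumed (not the PR's exact class):

```python
import mesa


class MoneyAgent(mesa.Agent):
    """Classic object-per-agent exchange: give one unit of wealth to a random peer."""

    def step(self) -> None:
        if self.wealth <= 0:
            return
        other = self.model.random.choice(self.model.agent_list)
        if other is None:  # defensive; random.choice on a non-empty list never returns None
            return
        other.wealth += 1
        self.wealth -= 1
```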

benchmarks/cli.py (2)

111-138: Model selection parsing is clear and robust

_parse_models nicely handles "all", comma-separated lists, validation against MODELS, and order-preserving de-duplication. I don’t see any changes needed here.


141-160: Consistent reuse of shared plotting utilities

The _plot_performance wrapper is minimal and does the right thing: early‑exits on empty data, selects the expected columns, and delegates to the shared examples.plotting.plot_performance with a clear stem and title. Looks good.

Comment on lines 163 to 272
```python
@app.command()
def run(
    models: Annotated[
        str | list[str],
        typer.Option(
            help="Models to benchmark: boltzmann, sugarscape, or all",
            callback=_parse_models,
        ),
    ] = "all",
    agents: Annotated[
        str | list[int],
        typer.Option(
            help="Agent count or range (start:stop:step)", callback=_parse_agents
        ),
    ] = "1000:5000:1000",
    steps: Annotated[
        int,
        typer.Option(
            min=0,
            help="Number of steps per run.",
        ),
    ] = 100,
    repeats: Annotated[int, typer.Option(help="Repeats per configuration.", min=1)] = 1,
    seed: Annotated[int, typer.Option(help="Optional RNG seed.")] = 42,
    save: Annotated[bool, typer.Option(help="Persist benchmark CSV results.")] = True,
    plot: Annotated[bool, typer.Option(help="Render performance plots.")] = True,
    results_dir: Annotated[
        Path | None,
        typer.Option(
            help=(
                "Base directory for benchmark outputs. A timestamped subdirectory "
                "(e.g. results/20250101_120000) is created with CSV files at the root "
                "and a 'plots/' subfolder for images. Defaults to the module's results directory."
            ),
        ),
    ] = None,
) -> None:
    """Run performance benchmarks for the selected models."""
    # Support both CLI (via callbacks) and direct function calls
    if isinstance(models, str):
        models = _parse_models(models)
    if isinstance(agents, str):
        agents = _parse_agents(agents)
    # Ensure module-relative default is computed at call time (avoids import-time side effects)
    if results_dir is None:
        results_dir = Path(__file__).resolve().parent / "results"

    runtime_typechecking = os.environ.get("MESA_FRAMES_RUNTIME_TYPECHECKING", "")
    if runtime_typechecking and runtime_typechecking.lower() not in {"0", "false"}:
        typer.secho(
            "Warning: MESA_FRAMES_RUNTIME_TYPECHECKING is enabled; benchmarks may run significantly slower.",
            fg=typer.colors.YELLOW,
        )
    rows: list[dict[str, object]] = []
    # Single timestamp per CLI invocation so all model results are co-located.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    # Create unified output layout: <results_dir>/<timestamp>/{CSV files, plots/}
    base_results_dir = results_dir
    timestamp_dir = (base_results_dir / timestamp).resolve()
    plots_subdir: Path = timestamp_dir / "plots"
    for model in models:
        config = MODELS[model]
        typer.echo(f"Benchmarking {model} with agents {agents}")
        for agents_count in agents:
            for repeat_idx in range(repeats):
                run_seed = seed + repeat_idx
                for backend in config.backends:
                    start = perf_counter()
                    backend.runner(agents_count, steps, run_seed)
                    runtime = perf_counter() - start
                    rows.append(
                        {
                            "model": model,
                            "backend": backend.name,
                            "agents": agents_count,
                            "steps": steps,
                            "seed": run_seed,
                            "repeat_idx": repeat_idx,
                            "runtime_seconds": runtime,
                            "timestamp": timestamp,
                        }
                    )
                    # Report completion of this run to the CLI
                    typer.echo(
                        f"Completed {backend.name} for model={model} agents={agents_count} steps={steps} seed={run_seed} repeat={repeat_idx} in {runtime:.3f}s"
                    )
        # Finished all runs for this model
        typer.echo(f"Finished benchmarking model {model}")

    if not rows:
        typer.echo("No benchmark data collected.")
        return
    df = pl.DataFrame(rows)
    if save:
        timestamp_dir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            csv_path = timestamp_dir / f"{model}_perf_{timestamp}.csv"
            model_df.write_csv(csv_path)
            typer.echo(f"Saved {model} results to {csv_path}")
    if plot:
        plots_subdir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            _plot_performance(model_df, model, plots_subdir, timestamp)
            typer.echo(f"Saved {model} plots under {plots_subdir}")

    typer.echo(
        f"Unified benchmark outputs written under {timestamp_dir} (CSV files) and {plots_subdir} (plots)"
    )
```

⚠️ Potential issue | 🟡 Minor

Clarify final summary message based on save / plot flags

The core benchmarking loop and IO layout look good. One minor UX issue: the final message always claims that CSVs and plots were written under {timestamp_dir} / {plots_subdir}, even when save=False and/or plot=False.

To avoid misleading output, you could tailor the summary to what was actually produced:

-    typer.echo(
-        f"Unified benchmark outputs written under {timestamp_dir} (CSV files) and {plots_subdir} (plots)"
-    )
+    destinations: list[str] = []
+    if save:
+        destinations.append(f"CSVs under {timestamp_dir}")
+    if plot:
+        destinations.append(f"plots under {plots_subdir}")
+
+    if destinations:
+        typer.echo("Unified benchmark outputs written: " + "; ".join(destinations))
+    else:
+        typer.echo("Benchmark run completed (save=False, plot=False; no files written).")

This keeps the behavior unchanged while making the CLI output more accurate in all flag combinations.

🤖 Prompt for AI Agents
In benchmarks/cli.py around lines 163 to 272, the final summary always claims
CSVs and plots were written even when save=False and/or plot=False; change the
final message to reflect only what was actually produced by building a
conditional summary: if save True include the CSV timestamp_dir path, if plot
True include the plots_subdir path, otherwise note that CSVs or plots were not
generated; then print a single concise summary string that lists produced
artifacts (or states none were produced) instead of always mentioning both.

Comment on lines 1 to 96
# Boltzmann Wealth Exchange Model

## Overview

This example implements a simple wealth exchange ("Boltzmann money") model in two
backends:

- `backend_frames.py` (Mesa Frames / vectorised `AgentSet`)
- `backend_mesa.py` (classic Mesa / object-per-agent)

Both expose a Typer CLI with symmetric options so you can compare correctness
and performance directly.

## Concept

Each agent starts with 1 unit of wealth. At every step:

1. Frames backend: all agents with strictly positive wealth become potential donors.
Each donor gives 1 unit of wealth, and a recipient is drawn (with replacement)
for every donating agent. A single vectorised update applies donor losses and
recipient gains.
2. Mesa backend: agents are shuffled and iterate sequentially; each agent with
positive wealth transfers 1 unit to a randomly selected peer.

The stochastic exchange process leads to an emergent, increasingly unequal
wealth distribution and rising Gini coefficient, typically approaching a stable
level below 1 (due to conservation and continued mixing).

## Reported Metrics

The model records per-step population Gini (`gini`). You can extend reporters by
adding lambdas to `model_reporters` in either backend's constructor.

Notes on interpretation:

- Early steps: Gini ~ 0 (uniform initial wealth).
- Mid phase: Increasing Gini as random exchanges concentrate wealth.
- Late phase: Fluctuating plateau (a stochastic steady state) — exact level
varies with agent count and RNG seed.

## Running

Always run examples from the project root using `uv`:

```bash
uv run examples/boltzmann_wealth/backend_frames.py --agents 5000 --steps 200 --seed 123 --plot --save-results
uv run examples/boltzmann_wealth/backend_mesa.py --agents 5000 --steps 200 --seed 123 --plot --save-results
```

## CLI options

- `--agents` Number of agents (default 5000)
- `--steps` Simulation steps (default 100)
- `--seed` Optional RNG seed for reproducibility
- `--plot / --no-plot` Generate line plot(s) of Gini
- `--save-results / --no-save-results` Persist CSV metrics
- `--results-dir` Override the auto-timestamped directory under `results/`

Frames backend additionally warns if runtime type checking is enabled because it
slows vectorised operations: set `MESA_FRAMES_RUNTIME_TYPECHECKING=0` for fair
performance comparisons.

## Outputs

Each run creates (or uses) a results directory like:

```text
examples/boltzmann_wealth/results/20251016_173702/
    model.csv                   # step,gini
    gini_<timestamp>_dark.png   (and possibly other theme variants)
```

Tail metrics are printed to console for quick inspection:

```text
Metrics in the final 5 steps: shape: (5, 2)
┌──────┬───────┐
│ step ┆ gini │
│ --- ┆ --- │
│ i64 ┆ f64 │
├──────┼───────┤
│ ... ┆ ... │
└──────┴───────┘
```

## Performance & Benchmarking

Use the shared benchmarking CLI to compare scaling, checkout `benchmarks/README.md`.

## Programmatic Use

```python
from examples.boltzmann_wealth import backend_frames as bw_frames
result = bw_frames.simulate(agents=10000, steps=250, seed=42)
metrics = result.datacollector.data["model"] # Polars DataFrame
```

⚠️ Potential issue | 🟡 Minor

Documentation is well-structured and comprehensive.

The README provides clear explanations of the model concept, CLI usage, output structure, and programmatic API. One minor grammar fix:

Line 88: "checkout" should be "check out" (verb phrase).

-Use the shared benchmarking CLI to compare scaling, checkout `benchmarks/README.md`.
+Use the shared benchmarking CLI to compare scaling, check out `benchmarks/README.md`.
🤖 Prompt for AI Agents
In examples/boltzmann_wealth/README.md around lines 1–96 (specifically line 88),
the phrase "checkout `benchmarks/README.md`" is incorrect; change it to the verb
phrase "check out `benchmarks/README.md`". Update the sentence in the
Performance & Benchmarking section to use "check out" and run a quick
spell/grammar check across the README to ensure no other similar
verb/compound-word mistakes remain.

Comment on lines 256 to 258
# Keep the full model metrics (step + any model reporters)
# Keep the full model metrics (step + any model reporters)


⚠️ Potential issue | 🟡 Minor

Remove duplicate comment line.

Line 257 is a duplicate of line 256.

     # Extract metrics using DataCollector API
     model_pd = (
         dc.get_model_vars_dataframe().reset_index().rename(columns={"index": "step"})
     )
     # Keep the full model metrics (step + any model reporters)
-    # Keep the full model metrics (step + any model reporters)
🤖 Prompt for AI Agents
In examples/sugarscape_ig/backend_mesa/model.py around lines 256 to 258, there
is a duplicated comment line ("Keep the full model metrics (step + any model
reporters)") on line 257; remove the duplicate line 257 so only a single
instance of that comment remains.

Comment on lines 306 to 308
typer.echo(f"Saved plots under {plots_dir}")

typer.echo(f"Saved CSV results under {results_dir}")

⚠️ Potential issue | 🟡 Minor

Misleading "Saved CSV results" message when save_results=False.

Line 308 unconditionally prints "Saved CSV results under {results_dir}" regardless of whether save_results is True. This should be moved inside the if save_results: block (lines 268-270) to match the backend_frames fix.

     if save_results:
         csv_path = results_dir / "model.csv"
         model_pd.to_csv(csv_path, index=False)
+        typer.echo(f"Saved CSV results under {results_dir}")
 
     # Plot per-metric similar to the backend_frames example...
     ...
         typer.echo(f"Saved plots under {plots_dir}")
-
-    typer.echo(f"Saved CSV results under {results_dir}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/sugarscape_ig/backend_mesa/model.py around lines 306-308, the
message "Saved CSV results under {results_dir}" is printed unconditionally even
when save_results is False; move this typer.echo call inside the existing if
save_results: block (the block around lines 268-270 that handles saving CSV
results) so the CSV-saved log is only emitted when results are actually written,
mirroring the backend_frames fix.

@adamamer20 adamamer20 merged commit 29fa011 into main Dec 15, 2025
14 checks passed
@adamamer20 adamamer20 deleted the split/examples-benchmarks branch December 15, 2025 20:25
