[Cross] Broadcast Refactor to 1-to-1 NumPy match #538
NumPy's broadcast_to is unilateral: it only stretches source dimensions of size 1 to match the target shape. If the source has a dimension larger than the target, or more dimensions than the target, it raises ValueError. NumSharp's broadcast_to was delegating directly to the bilateral Broadcast(Shape, Shape), which allows both sides to stretch.
Added a ValidateBroadcastTo() helper, called from all 9 broadcast_to overloads before the bilateral Broadcast call. The check enforces:
- source ndim <= target ndim
- each source dimension (right-aligned) must be 1 or equal to the target dimension
This cannot live inside Broadcast() itself because arithmetic operations (a + b) require bilateral stretching of both operands.
Verified with dotnet_run scripts against NumPy:
- broadcast_to(ones(3), (1,)) → now throws (was accepted)
- broadcast_to(ones(1,2), (2,1)) → now throws (was accepted)
- broadcast_to(ones(1,3), (2,3)) → still works (valid unilateral)
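The two rules of the unilateral check can be sketched in a few lines. This is a minimal pure-Python illustration, not the NumSharp helper: `validate_broadcast_to` and `rejects` are hypothetical names, and shapes are plain tuples.

```python
def validate_broadcast_to(source, target):
    """Raise ValueError unless `source` can be stretched one-way to `target`."""
    # Rule 1: the source may not have more dimensions than the target.
    if len(source) > len(target):
        raise ValueError(
            f"source ndim {len(source)} exceeds target ndim {len(target)}")
    # Rule 2: right-align the shapes and compare trailing dimensions first.
    for src_dim, tgt_dim in zip(reversed(source), reversed(target)):
        if src_dim != 1 and src_dim != tgt_dim:
            raise ValueError(f"cannot broadcast {source} to {target}")


def rejects(source, target):
    """Return True when the unilateral check refuses the broadcast."""
    try:
        validate_broadcast_to(source, target)
    except ValueError:
        return True
    return False
```

With this sketch, the three cases listed in the commit behave as described: `(3,)` to `(1,)` and `(1, 2)` to `(2, 1)` are rejected, while `(1, 3)` to `(2, 3)` is accepted.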
…ug 4)
2-arg Broadcast(Shape, Shape) threw NotSupportedException when either input was already broadcast. This blocked legitimate operations like np.clip on broadcast arrays (which internally re-broadcasts) and explicit re-broadcasting via broadcast_to.
Changes to the 2-arg path:
- Removed the IsBroadcasted guard at line 299
- When an input IsBroadcasted, resolve BroadcastInfo.OriginalShape as the root original for chain tracking; stride=0 dims from prior broadcasts naturally propagate through the stride computation loop
- ViewInfo for sliced inputs now uses the resolved original shape
Changes to the N-arg Broadcast(Shape[]) path:
- Added ViewInfo handling for sliced inputs, matching the 2-arg path. Without this, N-arg broadcast_arrays with sliced inputs produced wrong values (GetOffset couldn't resolve slice strides)
- Added re-broadcast support via BroadcastInfo.OriginalShape
- Removed dead code: `it.size = tmp` (was immediately overwritten by ComputeHashcode, which recalculates size from dimensions)
Verified both paths produce identical results for sliced inputs: arange(12).reshape(3,4)[:,1:2] broadcast to (3,3) now correctly returns [[1,1,1],[5,5,5],[9,9,9]] through both 2-arg and N-arg paths.
Re-broadcast chains tested up to triple depth: broadcast_to(broadcast_to(broadcast_to(x, s1), s2), s3) works.
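To see why stride=0 dimensions survive a re-broadcast, here is a small pure-Python model of the stride computation. It is a sketch under assumptions: strides are counted in elements, a sliced input is represented by an explicit start offset, and `broadcast_strides` and `read` are hypothetical helpers rather than NumSharp APIs.

```python
from itertools import product


def broadcast_strides(shape, strides, target):
    """Return strides for viewing (shape, strides) as `target`.

    Stretched or newly added dimensions get stride 0, so every index along
    them reads the same element. Existing stride-0 dims are kept as-is.
    """
    pad = len(target) - len(shape)
    if pad < 0:
        raise ValueError("source has more dimensions than target")
    out = []
    for axis, tgt_dim in enumerate(target):
        if axis < pad:
            out.append(0)                      # new leading dimension
        elif shape[axis - pad] == tgt_dim:
            out.append(strides[axis - pad])    # unchanged, may already be 0
        elif shape[axis - pad] == 1:
            out.append(0)                      # stretched dimension
        else:
            raise ValueError(f"cannot broadcast {shape} to {target}")
    return out


def read(buffer, offset, shape, strides):
    """Read a strided view in logical row-major order."""
    return [buffer[offset + sum(i * s for i, s in zip(idx, strides))]
            for idx in product(*(range(d) for d in shape))]


buffer = list(range(12))                  # arange(12)
# arange(12).reshape(3, 4)[:, 1:2] -> offset 1, shape (3, 1), strides (4, 1)
once = broadcast_strides((3, 1), (4, 1), (3, 3))
twice = broadcast_strides((3, 3), once, (2, 3, 3))
```

Reading the view with `read(buffer, 1, (3, 3), once)` yields the rows `[1,1,1]`, `[5,5,5]`, `[9,9,9]` from the commit message, and the second broadcast simply repeats that block along the new leading axis.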
Updated and added tests for the broadcast system audit: - BroadcastTo_UnilateralSemantics_RejectsInvalidCases: replaces the old BroadcastTo_BilateralBroadcast_KnownDiscrepancy test. Now verifies that (3,)→(1,), (1,2)→(2,1), and (1,1)→(1,) all throw, matching NumPy. - ReBroadcast_2Arg_SameShape: broadcast → re-broadcast same shape - ReBroadcast_2Arg_HigherDim: (3,1)→(3,3)→(2,3,3) chain - ReBroadcast_2Arg_ClipOnBroadcast: np.clip on broadcast (Bug 4 variant) - BroadcastArrays_NArg_SlicedInput_CorrectValues: N-arg path with sliced column input (was returning [0,0,0],[1,1,1],[2,2,2] instead of [1,1,1],[5,5,5],[9,9,9] due to missing ViewInfo) - BroadcastPaths_2Arg_vs_NArg_SlicedInput_Identical: verifies 2-arg and N-arg paths produce identical results for the same sliced input
New bugs found by running 65 stress tests against the broadcast system after Phase 3/4 fixes. All are pre-existing, not regressions.
Bug 23a (reshape col-broadcast wrong element order): reshape(broadcast_to([[10],[20],[30]], (3,3)), (9,)) returns [10,20,30,10,20,30,...] instead of [10,10,10,20,20,20,...]. _reshapeBroadcast uses offset % OriginalShape.size modular arithmetic, which walks the original storage linearly instead of in logical row-major order. Workaround: np.copy(a).reshape(...).
Bug 23b (np.abs on broadcast throws IncorrectShapeException): Cast creates UnmanagedStorage with mismatched shape size (broadcast size=6 vs storage size=0). The abs implementation doesn't handle broadcast arrays whose storage is smaller than the broadcast shape.
Bug 24 (transpose col-broadcast returns wrong values): broadcast_to([[10],[20],[30]], (3,3)).T returns [[10,10,10],...×3] instead of [[10,20,30],...×3]. Transpose materializes via Clone and creates plain strides [3,1], losing the stride=0 broadcast semantics. It should swap strides to [0,1] (zero-copy, like NumPy). Row-broadcast .T works by coincidence.
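The difference between the modular walk of Bug 23a and a logical row-major walk, and the stride swap suggested for Bug 24, can be reproduced on a three-element buffer. This is an illustrative sketch only; `read` is a hypothetical helper and strides are in elements.

```python
from itertools import product

buffer = [10, 20, 30]               # backing storage of [[10], [20], [30]]
shape, strides = (3, 3), (1, 0)     # broadcast_to(..., (3, 3)): column stride 0


def read(shape, strides):
    """Walk the view in logical row-major order using its strides."""
    return [buffer[sum(i * s for i, s in zip(idx, strides))]
            for idx in product(*(range(d) for d in shape))]


# Expected flattening: follow the logical coordinates.
row_major = read(shape, strides)
# Bug 23a pattern: linear offset modulo the original storage size.
modular = [buffer[k % len(buffer)] for k in range(9)]
# Bug 24 fix direction: a transpose only reverses shape and strides.
transposed = read(shape[::-1], strides[::-1])
```

`row_major` gives `[10,10,10,20,20,20,30,30,30]`, while `modular` gives the wrong `[10,20,30,10,20,30,...]` ordering. `transposed` uses strides `(0, 1)` and yields rows of `[10,20,30]` without copying any data.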
Entry point for planning the rewrite of NumSharp's view/offset resolution to match NumPy's base_offset + strides architecture.
Current model: ViewInfo + BroadcastInfo chains with 6+ GetOffset code paths, recursive ParentShape resolution, and complex lazy-loaded UnreducedBroadcastedShape computation.
Target model: base_offset (int) + strides[] + dimensions[]. Offset computation becomes a single loop: sum(stride[i] * coord[i]). All slice/broadcast/transpose operations just adjust the base_offset and strides: no chains, no special cases.
The plan covers:
- Investigation checklist (12 items): catalog all consumers, understand IArraySlice bounds checking, NDIterator relationship, reshape-after-slice interactions, generated template code, IsSliced/IsBroadcasted derivation from strides, memory management
- Risk assessment with mitigations
- Suggested incremental approach (prototype Shape2, verify parity, migrate)
- NumPy reference files in src/numpy/ for each subsystem
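A rough picture of the target model, assuming strides measured in elements: every view is just three values, and slicing, transposing, or broadcasting produce a new triple over the same buffer. The `View` class and its `offset_of` method are hypothetical names for illustration; the plan itself only names a prototype `Shape2`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    base_offset: int
    dimensions: tuple
    strides: tuple

    def offset_of(self, coord):
        """Single loop: base offset plus stride-weighted coordinates."""
        return self.base_offset + sum(
            s * c for s, c in zip(self.strides, coord))


# All four views describe the same 12-element buffer laid out as (3, 4).
base = View(0, (3, 4), (4, 1))
column = View(1, (3,), (4,))            # a[:, 1]: shifted offset, row stride
transposed = View(0, (4, 3), (1, 4))    # a.T: strides swapped
row_broadcast = View(0, (2, 4), (0, 1)) # a[0] broadcast to (2, 4): stride 0
```

None of the derived views needs a parent pointer or a special code path; `offset_of` is the same for all of them.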
Adds a 'GitHub Issues' section to .claude/CLAUDE.md that documents using `gh issue create` for SciSharp/NumSharp and notes GH_TOKEN availability via the env-tokens skill. Provides structured templates for Feature/Enhancement and Bug Report issues (checklists and fields such as overview, problem/proposal, evidence, scope, benchmarks, breaking changes, reproduction, expected/actual behavior, workaround, root cause, and related issues) to standardize reporting.
…ape_unsafe, re-broadcast
Shape.GetCoordinates: use dimension-based decomposition for broadcast shapes instead of stride-based, which breaks on zero-stride dims. Matches NumPy's PyArray_ITER_GOTO1D factor-based approach.
NDArray.flatten (both overloads): guard broadcast arrays by delegating to np.ravel(). flat.copy() produced wrong element order, and the non-clone path caused out-of-bounds reads on the small backing buffer.
Default.Reduction.CumAdd: strip broadcast metadata via shape.Clean() before allocating the result array, preventing slice writes from going to a detached clone (IsBroadcasted clone path in GetViewInternal).
NdArray.ReShape: fix reshape_unsafe to pass ref newshape instead of the instance's shape; it was silently ignoring the requested shape.
Tests: update re-broadcast test to expect success (not throw) after Bug 4 fix; fix GetCoordinates_Broadcasted to validate correct logical coordinates; clean up OpenBugs.cs (remove fixed bugs, keep reference comments).
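Dimension-based decomposition never touches the strides, which is why it tolerates zero-stride axes: dividing a flat index by a stride of 0 is undefined, whereas dividing by the dimension sizes always works. A minimal sketch, with `coordinates_from_flat_index` as a hypothetical stand-in for the C# method:

```python
def coordinates_from_flat_index(index, dimensions):
    """Decompose a row-major flat index using only the dimension sizes."""
    coords = [0] * len(dimensions)
    # Peel off the fastest-varying (last) axis first.
    for axis in range(len(dimensions) - 1, -1, -1):
        index, coords[axis] = divmod(index, dimensions[axis])
    return coords
```

For a (3, 3) broadcast shape with strides (1, 0), flat index 5 maps to coordinates `[1, 2]`; the physical offset is then obtained separately by multiplying those coordinates with the strides.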
Replace broken type-switch + GetCoordinates/GetOffset implementation with NumPy's empty_like + slice-copy approach.
Fixes 5 tracked bugs:
- Bug 27: np.roll returns int instead of NDArray
- Bug 45: no-axis roll returns null
- Bug 50: roll only supports Int32/Single/Double (now all 12 dtypes)
- Bug 14a/b: broadcast roll produces zeros
- Bug 19a/b: broadcast roll Data<T> reads garbage
NDArray.roll.cs: rewritten from 104-line type-switch to 2-line delegation to np.roll(this, shift, axis).
np.roll.cs: new 70-line static method. No axis: ravel→roll→reshape; with axis: empty_like + 2 slice-copy pairs (body shift + tail wrap). Handles negative axis, shift modulo, all dtypes via slicing.
np.array_manipulation.cs: removed broken static np.roll that returned int.
Add 110 roll tests (100 pass, 10 OpenBugs for multi-axis tuple shift API gap and empty 2D with axis=1). Add 52 ravel tests (50 pass, 2 OpenBugs for upstream Shape.IsContiguous too conservative on contiguous slices).
Document C-order-only as architectural constraint in CLAUDE.md Key Design Decisions table.
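The slice-copy idea is easiest to see in one dimension. The following is a sketch of the technique on a Python list, not the NumSharp method: `roll_1d` is a hypothetical name, and a preallocated list stands in for `empty_like`.

```python
def roll_1d(values, shift):
    """Roll a sequence with two slice copies: body shift plus tail wrap."""
    n = len(values)
    if n == 0:
        return []
    shift %= n                         # maps negative shifts into [0, n)
    out = [None] * n                   # stand-in for empty_like
    out[shift:] = values[:n - shift]   # body: moves right by `shift`
    out[:shift] = values[n - shift:]   # tail: wraps around to the front
    return out
```

Because the work is expressed purely as slice assignments, no per-dtype switch is needed; the same two copies apply along any single axis of a multi-dimensional array.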
…verload Fix aliasing bug: prototype.shape (raw int[]) was passed by reference to the new Shape, causing both arrays to share the same dimensions array. Now clones via (int[])prototype.shape.Clone(), matching full_like's existing pattern. Add shape override parameter (Shape shape = default): when provided, overrides the prototype's shape while preserving its dtype. Matches NumPy's empty_like(a, shape=(4,5)) signature. Add NPTypeCode overload: empty_like(NDArray, NPTypeCode, Shape) for callers that already have an NPTypeCode, avoiding Type→NPTypeCode conversion. Delegates to np.empty() for consistency. Add 103 tests verified against NumPy 2.4.2 ground truth covering: shape/dtype preservation (1D–4D, scalar), dtype override (Type and NPTypeCode, all 12 types), shape override (2D→1D/2D/3D, with dtype, same/diff size, scalar, broadcast/slice sources), empty arrays (zero-dim), sliced/broadcast/transposed prototypes, memory independence, writeability, aliasing fix verification, sibling contract comparison (zeros_like/ones_like), chained operations, and integration with np.roll pattern.
… 5 assertion bugs
FluentAssertions went proprietary after v8. AwesomeAssertions is the Apache 2.0
community fork — permanently free, actively maintained.
Package upgrade:
- FluentAssertions 5.10.3 → AwesomeAssertions 9.3.0 in csproj
- Renamed `using FluentAssertions` → `using AwesomeAssertions` across 83 test files
- Adapted to AA 9.x API: ReferenceTypeAssertions now requires (subject, AssertionChain)
constructor, Execute.Assertion replaced with AssertionChain field, Subject read-only
Bugs fixed in FluentExtension.cs:
- Bug 1: AllValuesBe error messages showed literal "0","1","2" instead of actual values
due to unescaped {0}/{1}/{2} inside $"" strings — fixed to {{0}}/{{1}}/{{2}}
- Bug 2: BeOfValuesApproximately all 12 dtype branches said "(dtype: Boolean)" — fixed
each branch to show correct dtype name (Byte, Int16, Double, etc.)
- Bug 3: NDArrayAssertions.Identifier returned "shape" (copy-paste) — changed to "ndarray"
- Bug 4: BeShaped(ITuple)/BeEquivalentTo had no bounds check — added dimension count
assertion before accessing dimensions[i] to prevent IndexOutOfRangeException
- Bug 5: BeShaped used order-insensitive BeEquivalentTo, so BeShaped(3,2) would pass
on a (2,3) shape — changed to order-sensitive Equal(). This correctly exposed 5
pre-existing NumSharp bugs (np.moveaxis, NewAxis indexing, SlicingWithNewAxis)
AA 9.x API compatibility fixes across test files:
- .Array.Should().ContainInOrder() → .Data<int>().Should().ContainInOrder() (typed)
- .Array.Should().BeEquivalentTo(.Array) → .Data<bool>().Should().Equal() (typed)
- .Should().BeInAscendingOrder() → .Data<double>().Should().BeInAscendingOrder()
- BeEquivalentTo(params) → BeEquivalentTo(new[]{}) for type inference
- BeLessOrEqualTo → BeLessThanOrEqualTo (renamed in AA 8.x)
- .And.HaveCount() → .Which.Should().HaveCount() (chain semantics)
- Cast<T>().Should().BeEquivalentTo(NDArray) → .Should().Be(NDArray)
Infrastructure:
- Added FluentExtensionTests.cs with 72 tests covering all custom assertion methods,
error message quality (catches Bug 1/2 regressions), chaining, all 12 dtypes,
edge cases (scalar, sliced, broadcast, 2D), UnmanagedStorage entry point
- Removed OpenBugs.DeprecationAudit.cs (duplicate method names conflicting with OpenBugs.cs)
Test results: 1644 passed, 5 failed (pre-existing), 34 skipped — both net8.0 and net10.0
…new tests Correctness fixes in FluentExtension.cs: - Fix NotBe/NotBeShaped error messages: was "Expected shape to be X" when shapes ARE equal — now correctly says "Did not expect shape to be X" - Fix UInt64 overflow in BeOfValuesApproximately: unsigned subtraction (expected - nextval) wraps on underflow; cast to double before subtraction - Remove dead System.IO import New assertion capabilities added to ShapeAssertions and NDArrayAssertions: - BeContiguous() / NotBeContiguous() — asserts Shape.IsContiguous - HaveStrides(params int[]) — asserts exact stride values - BeEmpty() — asserts size == 0 (NDArrayAssertions only) - NotBe(NDArray) — complement to Be(), uses np.array_equal negation - NotBeOfType(NPTypeCode) / NotBeOfType<T>() — complement to BeOfType New infrastructure tests (16, total now 88): - Contiguous assertions: fresh array, sliced step, shape-level (4) - HaveStrides: shape pass, shape fail, ndarray pass (3) - BeEmpty: empty pass, non-empty fail (2) - NotBeOfType: mismatch pass, match fail, generic form (3) - NotBe: different pass, equal fail (2) - Error message correctness: NotBe/NotBeShaped say "Did not expect" (2) - UInt64 overflow regression: both directions (3UL vs 5UL) (1) All 88 infrastructure tests pass on net8.0 and net10.0. Full suite: 1644 pass, 5 fail (pre-existing NumSharp bugs), 34 skipped.
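The UInt64 underflow can be modelled in Python by masking to 64 bits. This sketch assumes the assertion compares the difference against a tolerance; the exact comparison inside FluentExtension.cs is not shown in the commit, and all function names here are hypothetical.

```python
MASK64 = (1 << 64) - 1


def u64_sub(a, b):
    """Emulate unchecked 64-bit unsigned subtraction (wraps on underflow)."""
    return (a - b) & MASK64


def approx_equal_wrapping(expected, actual, tolerance):
    """Buggy pattern: the unsigned difference wraps when actual > expected."""
    return u64_sub(expected, actual) <= tolerance


def approx_equal_as_double(expected, actual, tolerance):
    """Fixed pattern: convert to floating point before subtracting."""
    return abs(float(expected) - float(actual)) <= tolerance
```

With expected 3 and actual 5, the wrapping version computes a difference of 2^64 - 2 and wrongly reports a mismatch at tolerance 2, while the double-based version accepts both directions. Converting to double trades exactness above 2^53 for correct sign handling, which is usually acceptable for an approximate comparison.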
# Conflicts: # test/NumSharp.UnitTest/Selection/NDArray.Indexing.Test.cs
…rge files These 4 test files were added to broadcast-refactor after the tests branch diverged, so they weren't included in the AwesomeAssertions migration. The merge brought AwesomeAssertions as the package but these files still referenced `using FluentAssertions` — renamed to `using AwesomeAssertions`. Files fixed: - np.empty_like.Test.cs - np.ravel.Test.cs - np.reshape.Test.cs (new file, untracked) - np.roll.Test.cs
… correct offset resolution _reshapeBroadcast previously only set ViewInfo when the broadcast shape was also sliced (guarded by `if (IsSliced)`). Without ViewInfo, the reshaped shape's GetOffset fell through to the `offset % OriginalShape.size` modular arithmetic path, which happened to produce correct results for row broadcasts (where data is already laid out linearly) but produced wrong element ordering for column broadcasts and other non-trivial broadcast patterns. The fix removes the `if (IsSliced)` guard so ViewInfo is always set. This forces offset resolution through the recursive GetOffset path, which walks up to the parent broadcast shape and uses its strides (with zeros for broadcast dimensions) to compute the correct physical offset via GetCoordinates → parent.GetOffset. Validated against NumPy 2.4.2 output across 80+ individual checks: - Column, row, scalar, 3D, 4D, 5D broadcast reshapes - Slice→broadcast→reshape, broadcast→slice→reshape chains - Step slices, reverse slices, non-contiguous sources - Double/triple reshape chains, copy equivalence - All access patterns: flat, ToString, ravel, multi-dim indexing, copy Also adds comprehensive np.reshape test suite (61 tests) covering basic reshapes, -1 dimension inference, view semantics, scalar/empty arrays, sliced+reshape, broadcast+reshape, all 12 dtypes, large arrays, error cases, static vs instance API, transposed arrays.
Bug 66 (3 tests): swapaxes produces C-contiguous strides instead of permuted strides. For arange(24).reshape(2,3,4) with strides [12,4,1], swapaxes(0,2) should give [1,4,12] but gives [6,2,1]. Root cause: Default.Transpose.cs allocates new C-contiguous storage and copies data via MultiIterator.Assign, discarding the permuted strides. Direct consequence of Bug 64 (transpose copies instead of returning a view).
Bug 67 (1 test): swapaxes on 0D scalar succeeds instead of throwing. NumPy scalar has shape=(), ndim=0, so any axis is out of bounds. NumSharp represents scalars as shape=[1], ndim=1, so swapaxes(0,0) is valid.
Bug 68 (2 tests): swapaxes on empty arrays (shape with 0 dimension) crashes with InvalidOperationException from NDIterator. NumPy handles this correctly by just swapping dimensions. Resolves automatically when Bug 64 is fixed (no iteration needed for view).
Bug 69 (2 tests): Out-of-bounds axis throws IndexOutOfRangeException (accidental leak from array access) instead of descriptive AxisError. Root cause: check_and_adjust_axis only adjusts negative indices but never validates bounds.
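The behaviour these four bugs are measured against can be sketched as a pure metadata operation: validate and normalize both axes, then swap the matching entries of shape and strides. Function names are hypothetical, strides are in elements, and `IndexError` is used as a stand-in for NumPy's AxisError.

```python
def normalize_axis(axis, ndim):
    """Validate bounds first, then map a negative axis to its positive form."""
    if not -ndim <= axis < ndim:
        raise IndexError(
            f"axis {axis} is out of bounds for array of dimension {ndim}")
    return axis + ndim if axis < 0 else axis


def swapaxes_view(shape, strides, axis1, axis2):
    """Return (shape, strides) of the swapped view; no data is touched."""
    a = normalize_axis(axis1, len(shape))
    b = normalize_axis(axis2, len(shape))
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return shape, strides


def raises_index_error(shape, strides, axis1, axis2):
    """Helper for checking that an invalid axis is rejected."""
    try:
        swapaxes_view(shape, strides, axis1, axis2)
    except IndexError:
        return True
    return False
```

Since nothing iterates over elements, an empty shape such as (0, 3) is handled the same way as any other, and a true 0-D shape `()` rejects every axis.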
Migrate the test suite (156 files, ~2,076 tests) from MSTest to TUnit,
a modern .NET testing framework using source generators instead of reflection.
**csproj changes:**
- Add TUnit 1.13.11 as test framework (source-generated test discovery)
- Add OutputType=Exe (required by TUnit's Microsoft.Testing.Platform)
- Add TUnitAssertionsImplicitUsings=false (prevent TUnit.Assertions.Assert
from conflicting with MSTest's Assert class)
- Remove MSTest.TestAdapter 2.1.1 (replaced by TUnit engine)
- Remove Microsoft.NET.Test.Sdk 16.7.1 (replaced by Microsoft.Testing.Platform)
- Remove coverlet.collector 1.3.0 (incompatible with TUnit)
- Keep MSTest.TestFramework 2.1.1 for Assert.* compatibility (1,252 calls
across 85+ files — converting these risks argument-reorder bugs)
- Keep AwesomeAssertions 9.3.0 (~3,689 .Should() calls unchanged)
**New files:**
- global.json: Microsoft.Testing.Platform runner config, required for
`dotnet test` on .NET 10 SDK (MTP mode replaces VSTest)
- AssemblyAttributes.cs: [assembly: NotInParallel] disables TUnit's
default parallel execution for safety (MSTest ran sequentially)
**Attribute replacements across 156 test files:**
- [TestClass] → deleted (152 lines, TUnit doesn't need class-level markers)
- [TestMethod] → [Test] (2,056 occurrences)
- [DataTestMethod] → [Test] (11 files)
- [DataRow(] → [Arguments(] (195 parameterized test rows)
- [TestCategory(] → [Category(] (34 occurrences)
- [Ignore] / [Ignore("...")] → [Skip("...")] (12 occurrences)
- [ExpectedException(...)] → deleted (3 in np.any.Test.cs, all OpenBugs)
- [TestMethod, Ignore("...")] → [Test, Skip("...")] (combined attrs)
- [TestMethod, Timeout(10000)] → [Test, TUnit.Core.Timeout(10000)]
with CancellationToken parameter (TUnit requirement)
**Compile-time fixes:**
- TestClass.cs: Fully qualify System.Reflection.Assembly in 3 methods
to resolve ambiguity with TUnit's HookType.Assembly enum member
- Shape.Test.cs: Add CancellationToken parameter + System.Threading using
for TUnit's [Timeout] attribute requirement
**Test results (both net8.0 and net10.0):**
total: 2,076 | passed: 2,040 | failed: 25 (all pre-existing) | skipped: 11
All 25 failures are pre-existing dead-code/known-bug tests (AND/OR operators,
isnan/isfinite/isclose/allclose, memory allocation, broadcast/newaxis).
**Usage changes:**
- `dotnet test --project <path> --treenode-filter "/*/*/*/*[Category!=OpenBugs]"`
replaces the old `--filter "TestCategory!=OpenBugs"` syntax
- `dotnet run --project <path> -- --treenode-filter ...` also works directly
… for TUnit
Enable TUnit's default parallel test execution by removing the
[assembly: NotInParallel] guard. Tests run ~43% faster in parallel
(~8s vs ~14s sequential for 2,076 tests).
**Parallel race condition fixes:**
- np.load.Test.cs: Add [NotInParallel] on NumpyLoad class — tests share
a read-only data file (data/1-dim-int32_4_comma_empty.npy) that np.Load
opens with exclusive access
- np.tofromfile.Test.cs: Fix copy-paste bug in NumpyToFromFileTestUShort1
that used nameof(NumpyToFromFileTestByte1) — both tests wrote to the same
file "test.NumpyToFromFileTestByte1" causing race conditions
**WindowsOnly platform auto-skip:**
- Add WindowsOnlyAttribute (extends TUnit.Core.SkipAttribute) that
auto-skips tests on non-Windows via OperatingSystem.IsWindows()
- Replace [Category("WindowsOnly")] with [WindowsOnly] on 3 bitmap
test classes (BitmapExtensionsTests, BitmapWithAlphaTests, OpenBugsBitmap)
- Eliminates need for separate CI filter logic per OS
**CI workflow update (build-and-release.yml):**
- Switch from `dotnet test --filter "TestCategory!=OpenBugs"` (VSTest) to
`dotnet run -- --treenode-filter "/*/*/*/*[Category!=OpenBugs]"` (MTP)
- Remove per-OS filter matrix (WindowsOnly now handled by runtime skip)
- Simplify matrix to just os: [windows-latest, ubuntu-latest, macos-latest]
- Add --report-trx for TRX artifact upload
**Stability:** 8 consecutive runs (5 net10.0 + 3 net8.0), all identical:
2,076 total | 2,040 passed | 25 failed (pre-existing) | 11 skipped
Closes #539
The previous CI config used `dotnet run` without --framework, which only runs one TFM. Split into two explicit steps (net8.0 and net10.0) to ensure both target frameworks are tested on all 3 OS runners.
Targeted optimizations on the tests dominating wall-clock time:
**Allocate_1GB (1,113ms → 70ms, 16x faster):**
np.ones → np.empty — test verifies large allocation succeeds,
not that 4GB of memory is filled with ones
**GcDoesntCollectArraySliceAlone (361ms → 95ms, 3.8x faster):**
Reduce iterations from 100K+1M to 10K+100K — still 110K allocations
with GC.Collect + sleep, more than sufficient to test GC correctness
**Dot product tests (removed redundant work):**
- Remove Console.WriteLine(np.dot(x,y).ToString(false)) calls that
recomputed the entire dot product AND stringified the result array
- Dot2x2, Dot2222x2222, Dot3412x5621, Dot311x511: each was calling
np.dot twice — once for debug output, once for assertion
- Dot30_300x30_300: remove Stopwatch + Console.WriteLine benchmark
scaffolding — the test just verifies the operation completes
**Net effect on total suite (2,076 tests, Release, parallel):**
Before: ~8.0s wall clock
After: ~6.6s wall clock (18% faster)
When GetView() produces a slice that describes a contiguous memory block, create an offset InternalArray alias instead of a ViewInfo-based alias. This makes IsContiguous=true for the result, enabling:
- Fast-path NDIterator (pointer increment vs GetOffset per element)
- Efficient ravel/flatten (can return view instead of copy)
- Proper copyto semantics
**Contiguity detection algorithm:**
Scan SliceDefs right-to-left. Trailing dimensions must be fully taken (Start=0, Step=1, Count=origDim). The first partially-taken dimension must have Step=1 (or Count<=1). All dimensions left of that must have Count=1.
Examples of contiguous slices now optimized:
- arr[0, :] (first row of 2D; was ViewInfo, now offset alias)
- arr[:5] (prefix slice; was ViewInfo, now offset alias)
- arr[2:4, :, :] (row range of 3D; was ViewInfo, now offset alias)
Non-contiguous slices unchanged (still use ViewInfo):
- arr[::2] (stepped slice)
- arr[:, 0] (column slice, non-trailing partial dim)
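The right-to-left scan can be written out directly from the three rules. The sketch below is an illustration, not the NumSharp code: the `SliceDef` tuple mirrors the Start/Step/Count fields named in the commit, and it assumes an integer index such as `arr[0]` is represented as a slice with Count=1.

```python
from typing import NamedTuple


class SliceDef(NamedTuple):
    start: int
    step: int
    count: int


def is_contiguous_slice(slices, orig_dims):
    """Apply the trailing-full / one-partial / leading-single rules."""
    axis = len(slices) - 1
    # 1. Skip trailing dimensions that are taken in full.
    while axis >= 0 and slices[axis] == SliceDef(0, 1, orig_dims[axis]):
        axis -= 1
    if axis < 0:
        return True                      # the whole array was selected
    # 2. The first partial dimension needs step 1, or at most one element.
    partial = slices[axis]
    if partial.step != 1 and partial.count > 1:
        return False
    # 3. Everything to its left must select exactly one element.
    return all(s.count == 1 for s in slices[:axis])
```

Applied to the examples above, `arr[0, :]`, `arr[:5]` and `arr[2:4, :, :]` pass, while `arr[::2]` fails rule 2 and `arr[:, 0]` fails rule 3.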
Replaces flag-based IsContiguous computation with stride-based analysis matching NumPy's C_CONTIGUOUS algorithm (flagsobject.c:116-160).
## Changes
### Shape.cs
- Add ComputeIsContiguousFromStrides() implementing NumPy algorithm: scan right-to-left, stride[-1]=1, stride[i]=shape[i+1]*stride[i+1], skip size-1 dimensions, empty arrays (dim=0) are contiguous
- IsContiguous property now calls stride-based computation
- GetCoordinates uses dimension-based decomposition for IsSliced shapes (strides may have gaps from step!=1 slices)
- Slice() computes actual memory strides: origin.strides[i] * step, enabling correct contiguity detection for step-2, reversed slices
- TransformOffset checks ModifiedStrides (transposed shapes need GetOffset)
### Default.Transpose.cs
- Returns view instead of copy (NumPy semantics)
- Identity case (axis==start) returns array itself, not clone
- Empty arrays: just permute dimensions, no data copy
- Add axis bounds checking with AxisOutOfRangeException
- Broadcastable arrays can use view (zero strides preserved)
- Sliced/already-transposed arrays still need clone
### NdArray.ReShape.cs
- Non-contiguous arrays (transposed/sliced) copy before reshape, matching NumPy behavior where reshape of non-contiguous returns copy
### NDArray.flatten.cs
- Add ModifiedStrides check for correct element ordering (transposed arrays must use ravel path)
### UnmanagedStorage.Slicing.cs
- Enhanced documentation for contiguous slice optimization
- Contiguous slices use InternalArray.Slice(offset, count) with clean shape, enabling Address to point to correct location
### UnmanagedStorage.Cloning.cs
- CloneData uses IsContiguous instead of checking flags separately; now correctly handles transposed arrays
### Shape.Reshaping.cs
- ViewInfo setup extended for ModifiedStrides (transposed shapes); ensures GetOffset correctly transforms through parent
### Tests
- NdArray.Transpose.Test: expect view semantics (shares memory)
- Add Shape.IsContiguous.Test.cs with comprehensive test cases
## Test Results
- Failures: 217 → 141 (-76)
- All IsContiguous behaviors verified against NumPy
## Architecture Note
Views (IsSliced || IsBroadcasted) return IsContiguous=false because Address doesn't account for view offset. Contiguous slice optimization creates offset InternalArray with clean shape, making Address correct. This bridges NumPy's offset+strides model with NumSharp's ViewInfo model.
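The stride-based check described under Shape.cs fits in a short loop. This is a sketch of the stated rules with strides in elements (NumPy itself works in bytes); `is_c_contiguous` is a hypothetical name for what the commit calls ComputeIsContiguousFromStrides().

```python
def is_c_contiguous(shape, strides):
    """Check row-major contiguity from shape and element strides alone."""
    if 0 in shape:
        return True                # empty arrays count as contiguous
    expected = 1                   # the last axis must have stride 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim == 1:
            continue               # size-1 axes never affect the layout
        if stride != expected:
            return False
        expected *= dim            # stride[i] = shape[i+1] * stride[i+1]
    return True
```

A transposed (4, 3) view with strides (1, 4) and a column broadcast with strides (1, 0) both fail the check, while a size-1 axis with an arbitrary stride is ignored.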
NumPy-aligned offset calculation replaces complex ViewInfo traversal:
- Element access now uses simple formula: offset + sum(indices * strides)
- Offset computed at slice time, strides include step factor
- stride=0 handles broadcast repetition
Removed ~200 lines of legacy code:
- GetOffset_broadcasted, GetOffset_broadcasted_1D
- GetOffset_IgnoreViewInfo
- resolveUnreducedBroadcastedShape
Added:
- IsSimpleSlice property for fast-path documentation
- Offset preservation in DefaultEngine.Broadcast()
- 32 parity tests verifying NumPy behavior
Test results: 123 failures (6 fewer than baseline 129)
The removed recursive slice handling was always fragile; NumPy handles reshape-of-slice differently (copies if non-contiguous).
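"Offset computed at slice time, strides include step factor" can be illustrated by slicing one axis of an (offset, shape, strides) triple. This sketch leans on Python's `range` slicing to normalize the bounds; `slice_axis` and `element_offset` are hypothetical helpers and strides are in elements.

```python
def slice_axis(offset, shape, strides, axis, start, stop, step):
    """Slice one axis; return the new (offset, shape, strides) triple."""
    idx = range(shape[axis])[start:stop:step]   # normalized start/step/len
    new_shape, new_strides = list(shape), list(strides)
    new_shape[axis] = len(idx)
    new_strides[axis] = strides[axis] * idx.step      # step folded into stride
    new_offset = offset + idx.start * strides[axis]   # fixed at slice time
    return new_offset, new_shape, new_strides


def element_offset(offset, strides, indices):
    """offset + sum(indices * strides), the formula from the commit."""
    return offset + sum(i * s for i, s in zip(indices, strides))


# arange(10)[::-2]: starts at element 9 and walks backwards two at a time.
rev_offset, rev_shape, rev_strides = slice_axis(0, (10,), (1,), 0, None, None, -2)
reversed_elements = [element_offset(rev_offset, rev_strides, (i,))
                     for i in range(rev_shape[0])]
```

A reversed, stepped slice needs no special handling afterwards: the negative stride and the start offset carry all the information.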
Add a complete benchmark infrastructure for comparing NumSharp performance against NumPy baselines using BenchmarkDotNet and Python. ## Structure - benchmark/NumSharp.Benchmark.GraphEngine/ - C# BenchmarkDotNet project - benchmark/NumSharp.Benchmark.Python/ - NumPy baseline benchmarks - benchmark/scripts/ - Helper scripts for result merging - benchmark/run-benchmarks.ps1 - Main runner with report generation ## Benchmark Suites (130+ operations) - Arithmetic: +, -, *, /, % with element-wise and scalar variants - Unary: sqrt, abs, exp, log, sin, cos, tan, etc. - Reduction: sum, mean, var, std, min, max, argmin, argmax - Broadcasting: scalar, row, column, 3D patterns - Creation: zeros, ones, empty, full, copy, *_like - Manipulation: reshape, transpose, ravel, flatten, stack - Slicing: contiguous, strided, reversed views - MultiDim: 1D vs 2D vs 3D performance comparison - Dispatch: comparison of dispatch mechanisms (DynamicMethod, static, struct) - Fusion: multi-pass vs fused kernel patterns ## Array Sizes - Scalar (1): pure overhead measurement - Tiny (100): common small collections - Small (1K): L1 cache tier - Medium (100K): L2/L3 cache tier - Large (10M): memory-bound throughput ## Features - Interactive menu for selecting benchmark suites - Automated report generation (markdown, JSON, CSV) - README.md auto-updates with latest results when present - Matching methodology: same operations, sizes, seeds as NumPy - All 12 NumSharp data types supported
Add comprehensive documentation for 24 NumPy Enhancement Proposals (NEPs) relevant to NumSharp's goal of 1-to-1 NumPy 2.x behavioral compatibility. Documentation structure: - README.md: Index with priority tiers, quick reference, implementation roadmap - Individual NEP files: Detailed analysis of each proposal Priority classifications: - CRITICAL (NumPy 2.0 breaking): NEP 50 (type promotion), NEP 52 (API cleanup), NEP 56 (Array API standard) - HIGH (significant impl): NEP 01 (.npy format), NEP 07 (datetime), NEP 19 (RNG), NEP 27 (zero-rank), NEP 38/54 (SIMD) - MEDIUM (behavioral): NEP 05/20 (gufuncs), NEP 10 (iterator), NEP 21 (indexing), NEP 34 (ragged), NEP 42/43 (dtypes), NEP 51 (scalar repr) - LOW (informational): NEP 13/18 (Python dispatch), NEP 32 (remove financial), NEP 49 (allocators), NEP 53 (C-API) Includes .NET SIMD implementation patterns and NumPy 1.x vs 2.x quick reference. Related: #547, #544, #545, #529 (NumPy 2.x Compliance milestone)
Broadcasting tests (row/column vector) were duplicated between AddBenchmarks and BroadcastBenchmarks. The Byte type failed on broadcasting operations, causing benchmark failures. Changes: - Remove _matrix, _rowVector, _colVector fields - Remove Add_RowBroadcast and Add_ColBroadcast benchmark methods - Update docstring to note that broadcasting is in BroadcastBenchmarks BroadcastBenchmarks.cs already covers these scenarios with float64 only, avoiding the type compatibility issues. AddBenchmarks now focuses on element-wise and scalar operations across all ArithmeticTypes.
Adds source files for the NumSharp.Benchmark.Exploration project - a standalone benchmark suite for isolated performance experiments. Structure: - Infrastructure/: BenchFramework (timing), BenchResult (data model), OutputFormatters (CSV/JSON/MD), SimdImplementations (SIMD patterns) - Isolated/: Self-contained micro-benchmarks for specific scenarios - SizeThresholds: Find N where SIMD overhead breaks even - BroadcastScenarios: Isolated broadcast pattern benchmarks - SimdStrategies: Compare Vector<T> vs AVX2 vs loop - DispatchOverhead: Measure call overhead - MemoryPatterns: Sequential vs strided access - CombinedOptimizations: Multi-optimization combinations - Integration/: NumSharpBroadcast tests against real NumSharp - BenchmarkDotNet/: BenchmarkDotNet-formatted broadcast tests - Python/: NumPy baseline script for comparison - Results/: Output directory (.gitkeep, ignore generated files) Purpose: Exploration benchmarks help identify optimization opportunities before implementing them in the main NumSharp codebase. They provide isolated measurements without NumSharp's dispatch overhead.
…ture Update test files to work with the readonly Shape refactoring and write protection for broadcast arrays. Changes: - NpBroadcastFromNumPyTests.cs: Fix test assertions and method imports - NDArray.Indexing.Test.cs: Update tests for write protection behavior - CLAUDE.md: Documentation updates for new architecture
Implements the NumPy-aligned `ndarray.base` property chain for tracking view ownership. All views chain to the ultimate owner (not intermediate views), matching NumPy semantics.
Storage-level:
- Add `_baseStorage` internal field to UnmanagedStorage
- Add `BaseStorage` public property (read-only by design)
- Add `IsView` convenience property (equivalent to BaseStorage != null)
- Update all three `Alias()` overloads to propagate base reference
- Update `CreateBroadcastedUnsafe(storage, shape)` for base tracking
- Update `GetData()` slicing to chain to ultimate owner
NDArray-level:
- Add `@base` property returning NDArray wrapper of BaseStorage
- Document semantic difference from NumPy: property returns new wrapper each call (not cached), but Storage reference equality holds
Affected operations that now track base:
- Slicing via indexer (a["2:5"])
- Selection getter (fancy indexing)
- Reshape (when returning view)
- Alias() for explicit view creation
- Broadcast operations
This enables:
- View detection: `arr.@base != null` or `arr.Storage.IsView`
- Memory debugging: trace which array owns shared data
- NumPy-compatible semantics for view chains
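The "chain to the ultimate owner" rule amounts to one decision at view-creation time. A minimal sketch, assuming a simplified `Storage` class with an `alias()` method; both are hypothetical stand-ins for UnmanagedStorage and its Alias() overloads.

```python
class Storage:
    def __init__(self, data, base=None):
        self.data = data
        # A view of a view records the owner, never the intermediate view.
        if base is not None and base.base is not None:
            base = base.base
        self.base = base

    @property
    def is_view(self):
        return self.base is not None

    def alias(self):
        """Create a view that shares this storage's data."""
        return Storage(self.data, base=self)


owner = Storage([1, 2, 3])
view = owner.alias()
view_of_view = view.alias()
```

However deep the chain of aliases gets, `base` always resolves in one step, so view detection and ownership tracing stay O(1).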
Updates project documentation to reflect readonly struct Shape design: Shape architecture section: - Document internal fields (dimensions, strides, offset, bufferSize, _flags) - Document ArrayFlags enum values matching NumPy's ndarraytypes.h - Document key O(1) properties: IsContiguous, IsBroadcasted, IsWriteable, IsSliced, IsSimpleSlice Key design decisions: - Add Shape readonly struct entry - Add broadcast write protection entry - Update C-order description to reference ArrayFlags.C_CONTIGUOUS Capability reference updates: - Fix np.cumsum location (APIs/np.cumsum.cs, not NDArray.cumsum.cs duplicate) - Add missing Math functions (add, subtract, multiply, divide, mod, etc.) - Fix Sorting paths (Sorting_Searching_Counting/ not Sorting/) - Update np.roll status: fully implemented (was partial) Test filtering: - Update treenode-filter examples for 4-level path pattern
The IsContiguous fix has been validated - these tests now pass and should run unconditionally as part of the normal test suite. Tests promoted to regular execution: - IsContiguous_Step1Slice1D - IsContiguous_RowSlice2D - IsContiguous_SingleRow2D - IsContiguous_SingleRowPartialCol2D - IsContiguous_SingleElement1D - IsContiguous_3D_RowSlice - IsContiguous_3D_SingleRowPartialCol - IsContiguous_SliceOfContiguousSlice - IsContiguous_SliceOfSteppedSlice_SingleElement - ViewSemantics_Step1Slice1D_MutationPropagates - ViewSemantics_RowSlice2D_MutationPropagates - ViewSemantics_SingleRowPartialCol_MutationPropagates - ViewSemantics_SliceOfContiguousSlice_MutationPropagates - Ravel_ContiguousSlice1D_IsView - Ravel_ContiguousRowSlice2D_IsView - Copyto_ContiguousSlice_FastPath - ContiguousSlice_Float64/Float32/Byte/Int64_Values - FullSlice_IsContiguous - ContiguousSlice_ThenReshape_Values These verify NumPy-aligned behavior: step-1 slices are marked contiguous.
Replaces string-based category with typed attribute defined in TestCategory.cs for better IDE support and compile-time validation.
Files updated:
- Issues/448.cs
- Logic/np.any.Test.cs
- Logic/np_all_axis_Test.cs
- Manipulation/np.ravel.Test.cs
- Manipulation/np.reshape.Test.cs
- Manipulation/np.roll.Test.cs
- OpenBugs.Bitmap.cs
- OpenBugs.cs (class-level attribute)
- Selection/NDArray.Indexing.Test.cs
Both forms work with TUnit's --treenode-filter: /*/*/*/*[Category!=OpenBugs]
The typed attribute is preferred for:
- Compile-time typo detection
- IDE autocomplete and navigation
- Consistent usage across the codebase
Adds explicit braces to if statements in reduction axis-handling code
for consistency with project coding conventions.
Files: Default.Reduction.{AMax,AMin,Add,Mean,Product,Std,Var}.cs
…butes
Add test coverage for the NumPy-compatible .base property:
NDArray.Base.Test.cs (35 tests):
- NumPy behavior: owned arrays have null base, views chain to owner
- View chaining: slice-of-slice chains to ultimate owner (not intermediate)
- Copy ownership: copy() creates owned array with null base
- Operations: reshape, transpose, broadcast_to, expand_dims
- Edge cases: scalar, 0-d, empty arrays
- All 12 dtypes verification
NDArray.Base.MemoryLeakTest.cs:
- Memory lifecycle: views keep base alive
- Concurrent access safety
- Finalization ordering
- [Misaligned] test for broadcast-then-slice materialization
TestCategory.cs:
- [OpenBugs] - known failing tests (excluded from CI)
- [Misaligned] - NumSharp differs from NumPy (runs, documents difference)
- [WindowsOnly] - platform-specific tests (GDI+/System.Drawing)
These typed attributes replace string-based [Category("...")] for
better IDE support and compile-time checking.
Documents the design and implementation approach for NumPy-compatible .base property tracking at the UnmanagedStorage level.
Key decisions documented:
- Storage-level _baseStorage field chains to ultimate owner
- Memory safety via shared Disposer (not base reference)
- Read-only BaseStorage property (prevents ownership corruption)
- Known limitation: broadcast slicing materializes data
Code paths analyzed:
- Alias() overloads for view creation
- GetData() for contiguous and broadcast paths
- CreateBroadcastedUnsafe() for broadcast operations
This plan was executed in commit ea8fef5.
The test project currently only targets net10.0 (net8.0 commented out with TODO note about TUnit compatibility). The CI was trying to test both frameworks, causing "No such file or directory" failures because the net8.0 executable doesn't exist. Aligns CI with test project's actual target framework until net8.0 support is re-enabled in NumSharp.UnitTest.csproj.
MSTest's [Ignore] attribute was not migrated to TUnit's [Skip] for StringArraySample1 test. TUnit doesn't recognize [Ignore], causing the test to run instead of being skipped. Fixes CI test execution where this test was unexpectedly running.
Complete the MSTest → TUnit migration by replacing all remaining [Ignore] attributes with [OpenBugs]. TUnit does not recognize MSTest's [Ignore] attribute, causing tests to run instead of being skipped.
Changes across 16 test files:
- AllocationTests.cs: 2GB/4GB/44GB allocation tests (Int32 limit)
- ReduceAddTests.cs: keepdims returns wrong shape
- np.dot.Test.cs: high-dimensional array bugs
- np.matmul.Test.cs: ArgumentOutOfRangeException crashes
- np.allclose.Test.cs: depends on unimplemented np.isclose
- np.isclose.Test.cs: returns null (dead code)
- np.isfinite.Test.cs: returns null (dead code)
- np.isnan.Test.cs: returns null (dead code)
- NDArray.flat.Test.cs: IsBroadcasted flag bug
- np.moveaxis.Test.cs: wrong shape returned
- NdArray.Convolve.Test.cs: returns null (dead code)
- NDArray.AND.Test.cs: returns null (dead code)
- NDArray.OR.Test.cs: returns null (dead code)
- NDArray.Indexing.Test.cs: slice/newaxis bugs
- NdArray.Mean.Test.cs: keepdims wrong shape
- Shape.OffsetParity.Tests.cs: contiguous slice optimization
All tests now properly excluded from CI via --treenode-filter "/*/*/*/*[Category!=OpenBugs]" instead of silently failing.
8b16e64 to 07f908d
Problem:
- Job names showed ugly filters: "test (ubuntu-latest, & TestCategory!=WindowsOnly)"
- WindowsOnly tests were running on Ubuntu/macOS and failing
- Two conflicting WindowsOnlyAttribute classes caused namespace shadowing
Root cause:
- Commit 3c8350b added WindowsOnlyAttribute : CategoryAttribute in TestCategory.cs
- This shadowed the existing Utilities/WindowsOnlyAttribute : SkipAttribute
- Tests resolved [WindowsOnly] to the CategoryAttribute version (no skip behavior)
- The CI workflow was simplified to remove the extra_filter matrix
Fix:
- Remove Utilities/WindowsOnlyAttribute.cs (eliminates namespace conflict)
- Compute filter dynamically in workflow step using $RUNNER_OS
- OpenBugs: excluded on all platforms (global)
- WindowsOnly: excluded only on non-Windows (conditional)
Result:
- Clean job names: "test (ubuntu-latest)", "test (windows-latest)", etc.
- WindowsOnly tests correctly skipped on Ubuntu/macOS
- Single [WindowsOnly] attribute with clear semantics
TUnit's --treenode-filter doesn't support compound filters with & or AND operators reliably across platforms. Instead of CI filtering:
1. Add SkipOnNonWindowsAttribute (extends TUnit's SkipAttribute)
   - Runtime check: OperatingSystem.IsWindows()
   - Auto-skips WindowsOnly tests on non-Windows
2. Bitmap test classes now use both attributes:
   - [WindowsOnly] - CategoryAttribute for categorization/documentation
   - [SkipOnNonWindows] - SkipAttribute for runtime skip
3. Simplified CI workflow:
   - Single --treenode-filter for OpenBugs only (all platforms)
   - WindowsOnly handled at runtime by SkipOnNonWindows
   - Clean job names without platform-specific filters
Nucs added a commit that referenced this pull request Feb 14, 2026
Summary
Major architectural refactoring to align NumSharp with NumPy 2.x semantics. What started as two broadcast bug fixes evolved into a comprehensive modernization of the Shape system, test infrastructure, and view semantics.
Core Changes
Shape Architecture Rewrite: Shape is now a `readonly struct` with immutable fields and cached `ArrayFlags` computed at construction, matching NumPy's `ndarraytypes.h`:
- Replaces `ViewInfo`/`BroadcastInfo` reference chains with value fields: `offset`, `bufferSize`, `strides[]`, `_flags`
- `GetOffset` simplified to NumPy formula: `offset + sum(indices * strides)`, which eliminated ~200 lines of recursive resolution code
- O(1) `IsContiguous`, `IsBroadcasted`, `IsWriteable`, `IsSliced` via cached flags
- `IsContiguous` computed from strides using NumPy's C_CONTIGUOUS algorithm (`flagsobject.c:116-160`)
NumPy Behavioral Alignment:
- `broadcast_to` now enforces unilateral validation (only size-1 dims stretch)
- Re-broadcasting is supported: `broadcast_to(broadcast_to(a, s1), s2)`
- Broadcast arrays are write-protected (`IsWriteable = false`): writes throw like NumPy's "assignment destination is read-only"
- `transpose` returns view (O(1)) instead of copy
- `.base` property tracks view ownership chains to ultimate owner
- `np.roll` rewritten using NumPy's slice-based algorithm (fixes 5 bugs, all 12 dtypes)
Test Infrastructure Modernization:
- Typed category attributes `[OpenBugs]`, `[Misaligned]`, `[WindowsOnly]` with compile-time validation
Bug Fixes
- `broadcast_to` accepted invalid bilateral broadcasts
- Re-broadcast threw `NotSupportedException`; removed the `IsBroadcasted` guard
- `offset != 0` …
New Capabilities
Benchmark Suite (`benchmark/`):
Documentation:
- `docs/neps/`: 24 NEP analyses for NumPy 2.x compliance roadmap
- `docs/plans/offset-model-rewrite.md`: Architecture investigation plan
- `.claude/CLAUDE.md`: Comprehensive project documentation updates
Test Coverage:
- `.base` property tests
- `np.empty_like` tests
- `np.roll` tests
- `np.reshape` tests (broadcast+reshape chains)
- `np.ravel` tests
- `IsContiguous` test suite
Performance
- `InternalArray` alias (fast-path NDIterator)
Breaking Changes
- `.base` property
- `.copy()` before mutation
- `.copy()` for independent array
Test Results
All new failures are pre-existing bugs documented in `OpenBugs.cs`; zero regressions.
Files Changed (Key)
- `View/Shape.cs`
- `Default.Broadcasting.cs`
- `UnmanagedStorage.*.cs`
Commits (44)
Architecture (10):
- b9e3ef6c Shape: readonly struct with ArrayFlags
- a4e14ac6 Storage: align with readonly Shape
- b495766f Math: use IsContiguous for linear access
- 57259b55 IsContiguous from strides (NumPy algorithm)
- cb91bebe GetOffset NumPy architecture
- a70d21e4 Remove deprecated ModifiedStrides
- 9aab01c6 Broadcast write protection
- ea8fef50 .base property for view tracking
- 4dd62d24 Iterator offset check fix
- eb99e9b6 Contiguous slice optimization
Bug Fixes (10):
- 5ef8f132 broadcast_to unilateral validation
- 999f12ae Re-broadcast support
- f9ab0291 GetCoordinates, flatten, cumsum, reshape_unsafe
- ba9b845f np.roll rewrite
- 20f703ce np.empty_like fixes
- c1f521f0 Broadcast reshape ViewInfo
- fd49c289 Template broadcast refactor
- 57259b55 Transpose view semantics
- eb808a5c Test updates for readonly Shape
- e14ee6be Test API updates
Test Infrastructure (12):
- ed4c7597 MSTest → TUnit 1.13.11
- daae3f3c Parallel execution, WindowsOnly auto-skip
- 15ccc026 CI net8.0 + net10.0
- 6c267181 FluentAssertions → AwesomeAssertions
- fd2521dd Assertion fixes + new capabilities
- 53f9753e Optimize slow tests
- 3c8350b4 .base property tests
- a07701d3 Broadcast audit tests
- 82b01489 Bugs 23-24 stress tests
- 90cc6fe9 swapaxes OpenBugs 66-69
- a9fa9fef Remove Option2Fix category
- e669571e Typed [OpenBugs] attribute
Documentation (6):
- e65af340 NEP reference docs (24 NEPs)
- 15a9d792 Offset model rewrite plan
- f41fc320 Shape architecture docs
- 6e24ab3e TUnit test framework docs
- ae869564 GitHub Issues templates
- 82048e74 .base property plan
Benchmark (4):
- 0071a884 Comprehensive benchmark suite
- c6167408 Exploration benchmark files
- 42b72668 Fix duplicate broadcast tests
Style/Cleanup (2):
- aab53f0b Braces in reduction methods
- b0e519d3 Post-merge namespace fix
Related Issues
Closes
Related (Partially Addressed / Foundation Laid)
Previously Fixed (Validated by This PR)
Test Plan
- `broadcast_to` rejects invalid unilateral broadcasts
- `.base` chains to ultimate owner
- `np.roll` works for all 12 dtypes
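The first Test Plan item rests on the unilateral rule stated in the `broadcast_to` commit earlier in this PR: the source may not have more dimensions than the target, and each right-aligned source dimension must be 1 or equal to the target dimension. A minimal Python sketch of that check follows; the function names are hypothetical and only mirror the described rule, not NumSharp's `ValidateBroadcastTo()` implementation.

```python
def is_valid_broadcast_to(source, target):
    """True if `source` can be stretched to `target` one-sidedly."""
    if len(source) > len(target):
        return False  # source may not have more dimensions than target
    # Compare dimensions right-aligned, as broadcasting does.
    for s, t in zip(reversed(source), reversed(target)):
        if s != 1 and s != t:
            return False  # only size-1 source dims may stretch
    return True


def validate_broadcast_to(source, target):
    """Raise ValueError, as NumPy does, for an invalid unilateral broadcast."""
    if not is_valid_broadcast_to(source, target):
        raise ValueError(
            f"cannot broadcast shape {tuple(source)} to shape {tuple(target)}")


shrinking = is_valid_broadcast_to((3,), (1,))       # False: 3 cannot shrink to 1
bilateral = is_valid_broadcast_to((1, 2), (2, 1))   # False: needs both sides to stretch
unilateral = is_valid_broadcast_to((1, 3), (2, 3))  # True: only a size-1 dim stretches
```

The three cases are the ones the commit message reports verifying against NumPy. The bilateral `Broadcast(Shape, Shape)` used for arithmetic would accept the second case, which is why the check has to sit in front of `broadcast_to` rather than inside the shared broadcast routine.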