Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request finalizes the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Test Results
Benchmark Results
Run Status
Memory Efficiency (Linux, PHP 8.5)
API Speedups (Linux, PHP 8.5)Speedup = PHP time / Judy time. Bold = Judy faster.
Core Types (Linux, PHP 8.5)Speedup = PHP time / Judy time. Bold = Judy faster.
All types detailCore Types (Linux, PHP 8.5) — AllSpeedup = PHP time / Judy time. Bold = Judy faster.
Cross-version detail (Linux)Core Types — LinuxEach cell: Judy ms (speedup vs PHP array).
API — Linux
Cross-version detail (Windows)Core Types — WindowsEach cell: Judy ms (speedup vs PHP array).
API — Windows
Windows results (PHP 8.5)Memory Efficiency (Windows, PHP 8.5)
API Speedups (Windows, PHP 8.5)Speedup = PHP time / Judy time. Bold = Judy faster.
Core Types (Windows, PHP 8.5)Speedup = PHP time / Judy time. Bold = Judy faster.
All types detailCore Types (Windows, PHP 8.5) — AllSpeedup = PHP time / Judy time. Bold = Judy faster.
Release Comparison (2.3.0 → 2.4.0)
Summary: 6 faster,
Memory ComparisonMemory:
50 new benchmarks (not in baseline)
Full benchmark outputLinux — PHP 8.5Windows — PHP 8.5 |
There was a problem hiding this comment.
Code Review
This pull request bumps the version to 2.4.0 in package.xml, php_judy.h, and updates the corresponding test expectations. The release notes in package.xml are also updated to reflect the features and improvements for version 2.4.0. The changes are correct and consistent. While reviewing package.xml, I noticed that many <file> entries are missing the md5sum attribute. Although this is outside the direct changes in this PR, it would be beneficial to add these checksums in a follow-up to ensure package integrity for PECL.
f79f616 to
f3337d4
Compare
|
Rebased onto current
Gemini's |
Replace 6 separate benchmark script invocations (run-benchmarks, batch-operations, set-operations, all-types, phase2-api, phase3-advanced) with a single unified runner (judy-bench.php) that outputs structured JSON alongside human-readable tables. Add judy-bench-compare.php for cross-version regression detection with configurable threshold and CI-gatable exit codes. Update CI workflow (Linux + Windows) to use the unified runner and rewrite the report-results Python script to consume bench.json instead of parsing text files with fragile regexes.
- Fix fromArray() bug: record .php counterpart using putAll baseline, fix stale $t_php display in stdout - Rewrite CI report with consistent speedup direction everywhere (PHP time / Judy time; >1x = Judy faster; bold when Judy wins) - Show absolute ms values alongside speedup factors instead of bare ratios - Separate memory efficiency from speed tables - Use human-readable type labels (e.g. "String to Int (Hash)") - Progressive disclosure: Linux latest PHP expanded, cross-version and Windows details in collapsible sections - Add Run Status grid for at-a-glance validation across all platform/version combinations
Add a "Release Comparison" section that diffs the current run against a committed baseline (baselines/latest.json), showing FASTER/SLOWER/ ~same/NEW status for each benchmark with delta percentages. - Store initial v2.4.0-dev baseline (174 entries, Linux PHP 8.1) - Add actions/checkout with sparse-checkout to report-results job so the Python script can read the baseline file - Compare only .judy entries (Judy's own perf across releases), skip heap-only entries, use ±5% threshold - Show summary counts, regressions with warning icons, and new benchmarks in a collapsible section
- Replace v2.4.0-dev baseline with proper PECL-installed v2.3.0 release - Increase benchmark iterations from 3 to 7 for more stable medians - Raise cross-release comparison threshold from ±5% to ±10% - Use defined() checks for type detection instead of try/catch - Add $has_hash guard so hash type benchmarks are skipped on older versions - Note PECL baseline source in the release comparison report header
- Install released v2.3.0 via PIE on the PHP 8.4 Linux runner - Run baseline benchmarks (core suite) on same hardware as PR benchmarks - Report prefers CI-generated baseline; falls back to committed file - Header indicates "same CI runner" or warns about cross-architecture
- Skip sub-2ms benchmarks in comparison (e.g., 0.2ms→0.7ms = +247% noise) - Run baseline benchmarks before PR benchmarks for fairer ordering - Note minimum threshold in report header
Compare heap_bytes between baseline (v2.3.0) and current for all core types, using a ±1% threshold. Memory is deterministic so any meaningful change is a real regression or improvement.
Memory & Performance Analysis (Updated post-lazy
|
| Type | v2.3.0 | v2.4.0 | Delta | Explanation |
|---|---|---|---|---|
| Bitset | 128 B | 160 B | +32 B | Struct fields only (no key_scratch) |
| Int‑to‑Int | 128 B | 160 B | +32 B | Struct fields only (no key_scratch) |
| Int‑to‑Mixed | 7.6 MB | 7.6 MB | ~same | key_scratch dwarfed by per-element zval heap |
| String‑to‑Int | 128 B | 64.2 KB | +64 KB | key_scratch heap buffer — expected |
| String‑to‑Mixed | 7.6 MB | 7.7 MB | ~same | key_scratch dwarfed by per-element zval heap |
Why 32 bytes more for integer-keyed types?
v2.4.0 added new struct fields to judy_object for type-flag dispatch and adaptive type support:
hs_arraypointer (8 B) — JudyHS backing store for adaptive typeskey_scratchpointer (8 B) — set toNULLfor integer types (lazy allocation)- 6 boolean flags:
is_integer_keyed,is_string_keyed,is_mixed_value,is_packed_value,is_hash_keyed,is_adaptive(6 B + alignment)
These flags eliminate repeated switch/if chains on the hot path, trading 32 bytes of struct overhead for faster type dispatch.
Why key_scratch (64 KB) for string types?
v2.3.0 stack-allocated uint8_t key[65536] inside individual functions. v2.4.0 moves this to a single heap-allocated buffer per object:
| v2.3.0 | v2.4.0 | |
|---|---|---|
| Buffer location | Stack (64 KB per call frame) | Heap (64 KB per object, lazy) |
Visible to memory_get_usage()? |
No | Yes |
| Fiber-safe? | No (overflows 8 KB default Fiber stack) | Yes |
The actual memory footprint is comparable — it simply moved from the unmeasured C stack to the PHP-tracked heap. Integer-keyed types skip allocation entirely (intern->key_scratch = NULL).
Performance: iteration 62–80% faster, one write regression
Wins — the C-level php_judy_iterator rewrite delivers major iteration speedups:
| Benchmark | v2.3.0 | v2.4.0 | Delta |
|---|---|---|---|
| bitset.iter | 95.2 ms | 19.2 ms | −80% |
| int_to_int.iter | 96.5 ms | 20.5 ms | −79% |
| int_to_mixed.iter | 96.1 ms | 21.3 ms | −78% |
| string_to_int.iter | 182.1 ms | 63.9 ms | −65% |
| string_to_mixed.iter | 195.5 ms | 74.0 ms | −62% |
| int_to_mixed.free | 16.3 ms | 14.5 ms | −11% |
Regression — int_to_int.write: 24.2 ms → 29.2 ms (+20.6%). This is at the margin of CI noise (the ±10% threshold catches it). The write path for INT_TO_INT is a single JLI call with no code changes in v2.4.0 — the added type-flag conditionals are in the string-keyed branches and don't execute for integer keys. Likely runner load variance; worth monitoring across runs but not a code-level regression.
New API benchmarks (v2.4.0 features)
Highlights from the 50 new benchmarks not present in v2.3.0:
| Operation | Judy | PHP equiv | Speedup |
|---|---|---|---|
populationCount() |
0.00 ms | 20.0 ms | ~200,000× (O(1) maintained counter) |
equals() |
13.9 ms | 52.2 ms | 3.8× |
sumValues() |
7.5 ms | 19.2 ms | 2.6× |
keys() |
11.5 ms | 25.0 ms | 2.2× |
values() |
12.1 ms | 25.1 ms | 2.1× |
getAll() |
1.5 ms | 2.4 ms | 1.7× |
toArray() |
24.8 ms | 38.7 ms | 1.6× |
mergeWith() Int→Int |
68.1 ms | 76.0 ms | 1.1× |
mergeWith() Str→Int |
291.7 ms | 282.1 ms | ~1.0× |
mergeWith() is roughly on par with the PHP foreach equivalent for both key types, which validates its value as an in-place alternative to union() (which allocates a new array).
Bug fixes for adaptive types (STRING_TO_*_ADAPTIVE): - Clone handler: clone all 3 backing stores (JudyL, JudyHS, JudySL) - has_dimension: NULL guard allows long-key-only isset() lookups - unset_dimension: same NULL guard fix for long-key-only unset - has_dimension check_empty: add adaptive types so empty() works - write_dimension: add adaptive types to append-error guard - slice(): add adaptive type branch dispatching SSO vs long keys New feature: - mergeWith(Judy $other): in-place merge registered as public API Optimizations: - Lazy key_scratch: allocate 64KB buffer only for string-keyed types - Lazy iterator key_scratch: same conditional allocation - Replace 6 stack-allocated 64KB buffers with intern->key_scratch
Benchmarks in-place mergeWith() vs manual foreach loop in the API set operations section. Also adds entries to CI report api_ops so results appear in the PR comment.
judy_init_type_flags() was static in php_judy.c but called from judy_handlers.c (separate translation unit), causing undefined symbol at runtime on Linux and LNK2019 on Windows. Move to php_judy.h as static inline. Also update test expectations for the append error message that now includes adaptive types, and fix missing echo in adaptive slice test.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request bumps the project version to 2.4.0, incorporating new features, performance improvements, and bug fixes, including the mergeWith method and new benchmark scripts. However, a security audit identified critical issues related to inconsistent object state during cloning, which can lead to counter corruption and process crashes due to mishandling of allocation failures, as highlighted by an existing repository rule. Additionally, there's a potential path traversal vulnerability in the benchmark scripts due to a lack of sanitization for file paths provided via command-line arguments. Minor issues were also noted, such as incorrect dates (year 2026) in metadata files (baselines/latest.json and package.xml), which appear to be typos.
Summary
PHP_JUDY_VERSIONfrom2.3.0to2.4.0inphp_judy.hpackage.xmlversion, date (2026-03-02), and release notes for 2.4.0tests/001.phptAll features merged into main for the 2.4.0 release:
zend_object_handlers(read/write/has/unset_dimension) + native C iterators viaget_iteratorunion(),intersect(),diff(),xor()slice($start, $end)for all array typesJsonSerializableinterface +__serialize()/__unserialize()fromArray(),toArray(),putAll(),getAll(),increment()memoryUsage()documentation and memory patterns benchmarkmemoryUsage()coverage for all Judy typesINT_TO_PACKEDtype (type 6) -- GC-free opaque value storage viaphp_var_serializecJU_MASKATSTATE 0xffLtruncation at States 5-8STRING_TO_MIXED_HASHtype (type 7) -- JudyHS-backed O(1) string→mixed mapSTRING_TO_INT_HASHtype (type 8) -- JudyHS-backed O(1) string→int mapunion/intersect/diff/xorwith left-wins value semantics.stub.phparginfo generation replacing hand-written C arginfoecalloc→emallocon 9 zval allocs; directjudy_free_array_internalin__destructcount()for all types -- per-insert/delete counter replaces O(n) J1C/JLC scansvalidflag, single heap key buffer,zend_stringreusefromArray()/putAll()/__unserialize()JUDY_LIKELY/JUDY_UNLIKELY) on hot lookup and validation pathskey_scratchbuffer (Fiber-safe)keys(),values()-- native C extraction, 2-3x faster than PHPforeachsumValues(),averageValues()-- C-level aggregation for integer-valued typespopulationCount($start, $end)-- range counting via Judy's internal population cachedeleteRange($start, $end)-- bulk deletion in a single C passequals(Judy $other)-- short-circuit identity comparisonforEach(),filter(),map()-- bypass PHP Iterator protocol for bulk traversalmergeWith()-- in-place mutating mergeunion/intersect/diff/xorforSTRING_TO_INTandSTRING_TO_MIXEDtypesSTRING_TO_MIXED_ADAPTIVEtype (type 9) -- SSO packs short strings (<8 bytes) into JudyL for ultra-fast O(1) lookup; longer strings fall back to JudyHSSTRING_TO_INT_ADAPTIVEtype (type 10) -- same SSO strategy for string→int workloadsJLG/JHSGbeforeJLI/JHSI(4 paths: SSO + non-SSO × INT + MIXED)hs_arrayNULL-check injudy_free_array_internal-- prevents memory leak in adaptive typesChanges directly in this PR:
mergeWith()wiring, lazykey_scratchallocation (NULL for integer-keyed types, saves 64KB per instance)judy_init_type_flags()to header for cross-translation-unit visibilityjudy-bench.php) with structured JSON output replacing 6 separate scriptsjudy-bench-compare.php) with configurable regression thresholdbaselines/latest.jsonmergeWith()benchmarks for INT_TO_INT and STRING_TO_INTTest plan