merge chunks during attach/detach edits#27386
Conversation
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (574 lines, 4 files), I've queued these reviewers:
How this works
|
There was a problem hiding this comment.
Pull request overview
This PR adds a coalescing step to the chunked-forest editing pipeline so that repeated mid-field attach/detach operations don’t leave a field permanently fragmented into many adjacent same-shape UniformChunks. It introduces coalesceAroundSplice as the conceptual inverse of splitFieldAtIndex, merging same-shape UniformChunks along splice seams while respecting the dynamic per-chunk target cap.
Changes:
- Added
coalesceAroundSplice(and internaltryMergeAt) to merge adjacent same-shapeUniformChunks around splice seams, capped byuniformChunkNodeCountDynamicTargetMax. - Integrated coalescing into
ChunkedForestattach/detach edit paths immediately after the underlyingsplice. - Added focused unit tests for coalescing behavior (including caps, shape-equality fallback, refcount behavior, and idCompressor retention) and integration tests verifying coalescing after mid-chunk detach/attach.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts | Implements coalesceAroundSplice and merge helper to reduce fragmentation around edit splices. |
| packages/dds/tree/src/feature-libraries/chunked-forest/chunkedForest.ts | Calls coalesceAroundSplice after attach and detach splices to keep fields from fragmenting. |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkTree.spec.ts | Adds a comprehensive test suite for coalesceAroundSplice behavior and edge cases. |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkedForest.spec.ts | Adds integration tests ensuring coalescing occurs after mid-chunk detach/attach edits. |
| } | ||
|
|
||
| /** | ||
| * Keeps a field's chunks from accumulating same-shape fragments across repeated edits. |
There was a problem hiding this comment.
Generally, the top level documentation for a function should say what it does. This is instead documenting what its used to accomplish. Additionally, this doesn't prevent there from being same shaped fragments, so it not quite accurate either.
Right now the API and docs are focused around this being used in exactly one use-case, when a TreeChunk[] which holds the contents of a field has undergone a splice.
That is way more specific that this should be, making this code way more fragile than it needs to be. If we refactor or add some logic which needs to work on a TreeChunk[] thats not a field (like an ArrayChunk) or wasn't spliced (maybe was appended to), this code could still be valid if you just change its documentation and parameter names to be more general.
I'd suggest something like:
| * Keeps a field's chunks from accumulating same-shape fragments across repeated edits. | |
| * Coalesce adjacent small uniform chunks with the same shape in part of the provided array. |
There was a problem hiding this comment.
Updated with suggestion
| * Merges adjacent {@link UniformChunk}s of matching shape along the seams a splice could have | ||
| * created. Acts as the inverse of {@link splitFieldAtIndex}: without it, repeated mid-field | ||
| * attach/detach against the same field would leave it permanently fragmented into ever-smaller | ||
| * adjacent chunks. Cost is proportional to `insertedCount`. |
There was a problem hiding this comment.
Generally it seems like this function's runtime O(insertedCount) when it is a no-op, and worst case O(field length * insertedCount)
I'm not sure what "cost" you are referring to (memory, time, dollars?): use a more specific term. Assuming that was supposed to be time, I think you missed this can do O(insertedCount) splices which are O(field length) in cost.
There was a problem hiding this comment.
Updated and clarified what is being measured
| * @param insertedCount - The number of chunks the splice inserted (0 for a pure detach). | ||
| * @param policy - The {@link ChunkPolicy} supplying the per-chunk cap. | ||
| */ | ||
| export function coalesceAroundSplice( |
There was a problem hiding this comment.
Since there are multiple ways one could coalesce chunks (like put them into ArrayChunks, or merge ArrayChunks) and this seems to only handle UniformChunks, despite taking in a ChunkPolicy which includes details about how to use ArrayChunks, I think it should have a more specific name.
I think it might also make sense for it to make the sub-range which it focuses on optional, and default to processing the whole array. Maybe take in a range?: {start: number, length: number}.
There was a problem hiding this comment.
Renamed coalesceAroundSplice to coalesceUniformChunks and updated to take in (chunks, policy, range?: { start, length })
| * bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate. | ||
| * | ||
| * @param field - The field's chunks array, modified in place. | ||
| * @param spliceStart - The index passed to the originating `splice` call. |
There was a problem hiding this comment.
You should clarify this is a chunk index not a node index. Might even be good to call it something like spliceStartChunkIndex
There was a problem hiding this comment.
removed spliceStart and updated to your suggestion above
| export function coalesceAroundSplice( | ||
| field: TreeChunk[], | ||
| spliceStart: number, | ||
| insertedCount: number, |
There was a problem hiding this comment.
it might be nice for the name to clarify this is chunks not nodes (its good the docs are clear this time though)
There was a problem hiding this comment.
insertedCount removed after suggested refactor
| * | ||
| * @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise. | ||
| */ | ||
| function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean { |
There was a problem hiding this comment.
this function seems like a good target for unit testing, but seeing as its not exported, it can't have any tests.
There was a problem hiding this comment.
inner helper renamed to tryCoalesceUniformChunks and added unit tests in chunkTree.spec.ts
| * | ||
| * @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise. | ||
| */ | ||
| function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean { |
There was a problem hiding this comment.
Avoid single letter name like "i" and use more descriptive ones made of English words.
There was a problem hiding this comment.
updated to chunkIndex
| // Identity is the fast path (chunkers cache shapes); equals() is the fallback. | ||
| if (leftTreeShape !== rightTreeShape && !leftTreeShape.equals(rightTreeShape)) { |
There was a problem hiding this comment.
If we need to optimize equality here, I think it would be better to move the fast path into the implementation of TreeShape.equals (which should probably also move its O(1) check at the bottom to before its fields check).
You can make that optimization to TreeShape.equals in its own PR, and remove the workaround for the lack of that optimization from here.
There was a problem hiding this comment.
removed optimization and added TODO
| // decompress to incorrect values when read. | ||
| const leftCompressor = left.idCompressor; | ||
| const rightCompressor = right.idCompressor; | ||
| if ( |
There was a problem hiding this comment.
can we have the scenario where chunks in the same forest have a different idCompressor? Do all chunks created in the same forest not reference the same idCompressor?
There was a problem hiding this comment.
I was about to remove it, but since the function is exported now, I updated it to an assert as a guard :)
| [...left.values, ...right.values], | ||
| leftCompressor ?? rightCompressor, | ||
| ); | ||
| field.splice(i, 2, merged); |
There was a problem hiding this comment.
I don't think the algorithm you are using here id ideal.
Rather than calling tryMergeAt, and passing in the index, I think it would make more sense to:
- make this function into something like "tryCoalesceChunks" which takes in two chunks and the chunk policy, and returns a chunk (replacing the two inputs) or undefined (the two inputs should still be used).
- The outer function can then scan over the relevant range, creating a new array of chunks for the impacted range. It can then do a single splice operation before returning, avoiding the worse case of doing many splices. Take care to avoid doing the splice at all if there are no changes.
If you take this approach, it fixes the worst case performance case, and also makes the two function much more usable. For example the new inner function I propose could be used directly when inserting content to try first to merge it with the chunk before it, then the one after it, and only insert to the array if both merges fail. In the case where this does a merge, it prevents the insert from having to splice the array at all, saving the possible O(n) cost of moving everything in the array.
There was a problem hiding this comment.
Updated to:
- new
tryCoalesceUniformChunks(left, right, policy)inner helper and exported - outer
coalesceUniformChunksscans the range, builds a result and does splice at end (if there are changes)
| return false; | ||
| } | ||
|
|
||
| const merged = new UniformChunk( |
There was a problem hiding this comment.
When merging chunks, ideally instead of always creating a new chunk, if the first/left chunk only as ref count 1, just append the second chunk's content array to it, and update the chunk shape to one with the new top level length. That way if you try and merge a bunch of chunks together in a row, instead of getting O(n^2) array data allocated and filled for the values arrays, you only get O(n) as you grow a single chunk.
Note: because of this issue, I think coalesceAroundSpliceis actually has worst case runtime of O(Max(insertedCount^2, insertedCount * fields.length))
Splitting out a dedicated coalesceUniformChunks which takes in two uniform chunks, asserts the shapes are the same, then does this merging approach could be good, so it can have its own targeted tests about getting the refcounts and top level lengths correct.
There was a problem hiding this comment.
Updated so when !left.isShared(), tryCoalesceUniformChunks extends left.values in place, updates left.shape, releases right, and returns left. Otherwise it falls back to allocating a new chunk.
|
|
||
| /** | ||
| * Manually seeds the root field with the requested sequence of single-shape UniformChunks. | ||
| * Sidesteps the default chunker, which would produce one UniformChunk per top-level node, |
There was a problem hiding this comment.
I don't think this is accurate anymore (@brrichards fixed it I believe)
There was a problem hiding this comment.
Yeah the default chunker now produces multi-node UniformChunks
There was a problem hiding this comment.
Updated stale comment
| assertChunksUnchanged(field, snapshot); | ||
| }); | ||
|
|
||
| it("keeps chunk count bounded under repeated mid-field edits", () => { |
There was a problem hiding this comment.
nit: This test looks like it rewrites the splitFieldAtIndex logic which using might simplify/help with readability, but the test itself looks good!
There was a problem hiding this comment.
Updated with suggestion
| * Caps merged chunks at {@link ChunkPolicy.uniformChunkNodeCountDynamicTargetMax}, matching the | ||
| * bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate. |
There was a problem hiding this comment.
I'm not sure I understand this. Could we expand on this a bit? Also, does this need to be in a @privateRemarks block? Those are usually reserved for docs we don't want end-users to see, and this note seems relevant.
There was a problem hiding this comment.
moved to @remarks and elaborated for clarification
| // TODO: TreeShape.equals could short-circuit on identity for the common | ||
| // (same-chunker) case; until then this is a full structural check on every merge. |
There was a problem hiding this comment.
Should we do this in this PR?
There was a problem hiding this comment.
It was originally part of this PR, but Craig suggested that we split it up into two
There was a problem hiding this comment.
Do we have a task tracking this next step? Let's make sure we have something tracking this.
| if (combinedTopLevel > policy.uniformChunkNodeCountDynamicTargetMax) { | ||
| return undefined; | ||
| } |
There was a problem hiding this comment.
Nit: A quick comment explaining this might be useful.
| // and returned; `right`'s slot ref is released. | ||
| left.values.push(...right.values); | ||
| left.shape = leftTreeShape.withTopLevelLength(combinedTopLevel); | ||
| if (leftCompressor === undefined && rightCompressor !== undefined) { |
There was a problem hiding this comment.
We assert above that leftCompressor cannot be undefined. Do we need this?
There was a problem hiding this comment.
The above assert checks if either right or left compressor is undefined, so this was trying to set it to the rightCompressor if it is undefined (similarly to leftCompressor ?? rightCompressor used below it). But the if statement seemed unnecessary and updated it to left.idCompressor ??= rightCompressor
Josmithr
left a comment
There was a problem hiding this comment.
Left some questions and suggestions.
Bundle size comparisonBase commit: Notable changesNo bundles changed by ≥ 500 bytes parsed. Per-bundle deltas
|
Description
Follow-up PR to #27153. Adds
coalesceAroundSplice, the inverse ofsplitFieldAtIndex: merges adjacent same-shapeUniformChunks along the seams a splice could have created, so repeated mid-field attach/detach operations don't leave the field permanently fragmented.splitFieldAtIndex(from #27153) lets attach/detach land in the middle of a multi-nodeUniformChunkby splitting the chunk. Without a corresponding merge, every mid-chunk edit fragments the field — repeated same-shape inserts would produce an ever-growing run of small adjacent chunks. TheuniformChunkNodeCountDynamicTargetMaxpolicy field added in #27153 explicitly anticipated this work; this PR provides the merge logic it was waiting for.