merge chunks during attach/detach edits by daesunp · Pull Request #27386 · microsoft/FluidFramework

daesunp · 2026-05-26T17:18:42Z

Description

Follow-up PR to #27153. Adds coalesceAroundSplice, the inverse of splitFieldAtIndex: merges adjacent same-shape UniformChunks along the seams a splice could have created, so repeated mid-field attach/detach operations don't leave the field permanently fragmented.

splitFieldAtIndex (from #27153) lets attach/detach land in the middle of a multi-node UniformChunk by splitting the chunk. Without a corresponding merge, every mid-chunk edit fragments the field — repeated same-shape inserts would produce an ever-growing run of small adjacent chunks. The uniformChunkNodeCountDynamicTargetMax policy field added in #27153 explicitly anticipated this work; this PR provides the merge logic it was waiting for.
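To illustrate the fragmentation described above, here is a minimal TypeScript sketch. It models a chunk as a plain array of numbers; `SketchChunk`, `splitAt`, and `insertNode` are hypothetical names used only for illustration and are not the actual `UniformChunk` or `splitFieldAtIndex` implementations.

```typescript
// Hypothetical stand-in for a same-shape UniformChunk: just a run of node values.
type SketchChunk = number[];

/**
 * Splits the chunk containing `nodeIndex` (if needed) so that an edit can land on a chunk boundary.
 * Returns the chunk index at which new content should be inserted.
 */
function splitAt(field: SketchChunk[], nodeIndex: number): number {
	let offset = 0;
	for (let chunkIndex = 0; chunkIndex < field.length; chunkIndex++) {
		const chunk = field[chunkIndex];
		if (nodeIndex === offset) {
			return chunkIndex; // Already on a boundary: nothing to split.
		}
		if (nodeIndex < offset + chunk.length) {
			const cut = nodeIndex - offset;
			field.splice(chunkIndex, 1, chunk.slice(0, cut), chunk.slice(cut));
			return chunkIndex + 1;
		}
		offset += chunk.length;
	}
	return field.length;
}

/** Inserts one node as its own chunk, with no merge step afterwards. */
function insertNode(field: SketchChunk[], nodeIndex: number, value: number): void {
	const chunkIndex = splitAt(field, nodeIndex);
	field.splice(chunkIndex, 0, [value]);
}

// One 8-node chunk, then three mid-chunk inserts.
const fragmented: SketchChunk[] = [[0, 1, 2, 3, 4, 5, 6, 7]];
insertNode(fragmented, 4, 100);
insertNode(fragmented, 2, 101);
insertNode(fragmented, 8, 102);
console.log(fragmented.length); // 7 chunks holding 11 nodes
```

Each mid-chunk insert in this model adds two chunks (one from the split, one for the new content), which is the growth a merge step is meant to undo.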

Copilot

Pull request overview

This PR adds a coalescing step to the chunked-forest editing pipeline so that repeated mid-field attach/detach operations don’t leave a field permanently fragmented into many adjacent same-shape UniformChunks. It introduces coalesceAroundSplice as the conceptual inverse of splitFieldAtIndex, merging same-shape UniformChunks along splice seams while respecting the dynamic per-chunk target cap.

Changes:

Added coalesceAroundSplice (and internal tryMergeAt) to merge adjacent same-shape UniformChunks around splice seams, capped by uniformChunkNodeCountDynamicTargetMax.
Integrated coalescing into ChunkedForest attach/detach edit paths immediately after the underlying splice.
Added focused unit tests for coalescing behavior (including caps, shape-equality fallback, refcount behavior, and idCompressor retention) and integration tests verifying coalescing after mid-chunk detach/attach.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts | Implements `coalesceAroundSplice` and merge helper to reduce fragmentation around edit splices. |
| packages/dds/tree/src/feature-libraries/chunked-forest/chunkedForest.ts | Calls `coalesceAroundSplice` after attach and detach splices to keep fields from fragmenting. |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkTree.spec.ts | Adds a comprehensive test suite for `coalesceAroundSplice` behavior and edge cases. |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkedForest.spec.ts | Adds integration tests ensuring coalescing occurs after mid-chunk detach/attach edits. |

CraigMacomber · 2026-06-09T21:02:25Z

 }

+/**
+ * Keeps a field's chunks from accumulating same-shape fragments across repeated edits.


Generally, the top-level documentation for a function should say what it does. This instead documents what it's used to accomplish. Additionally, this doesn't prevent same-shaped fragments from existing, so it's not quite accurate either.

Right now the API and docs are focused around this being used in exactly one use-case, when a TreeChunk[] which holds the contents of a field has undergone a splice.

That is way more specific than it should be, making this code far more fragile than it needs to be. If we refactor or add some logic which needs to work on a TreeChunk[] that's not a field (like an ArrayChunk) or wasn't spliced (maybe it was appended to), this code could still be valid if you just change its documentation and parameter names to be more general.

I'd suggest something like:

Suggested change

- * Keeps a field's chunks from accumulating same-shape fragments across repeated edits.
+ * Coalesce adjacent small uniform chunks with the same shape in part of the provided array.

Updated with suggestion

CraigMacomber · 2026-06-09T21:07:50Z

+ * Merges adjacent {@link UniformChunk}s of matching shape along the seams a splice could have
+ * created. Acts as the inverse of {@link splitFieldAtIndex}: without it, repeated mid-field
+ * attach/detach against the same field would leave it permanently fragmented into ever-smaller
+ * adjacent chunks. Cost is proportional to `insertedCount`.


Generally it seems like this function's runtime is O(insertedCount) when it is a no-op, and O(field length * insertedCount) in the worst case.

I'm not sure what "cost" you are referring to (memory, time, dollars?): use a more specific term. Assuming that was supposed to be time, I think you missed this can do O(insertedCount) splices which are O(field length) in cost.

Updated and clarified what is being measured

CraigMacomber · 2026-06-09T21:11:19Z

+ * @param insertedCount - The number of chunks the splice inserted (0 for a pure detach).
+ * @param policy - The {@link ChunkPolicy} supplying the per-chunk cap.
+ */
+export function coalesceAroundSplice(


Since there are multiple ways one could coalesce chunks (like put them into ArrayChunks, or merge ArrayChunks) and this seems to only handle UniformChunks, despite taking in a ChunkPolicy which includes details about how to use ArrayChunks, I think it should have a more specific name.

I think it might also make sense for it to make the sub-range which it focuses on optional, and default to processing the whole array. Maybe take in a range?: {start: number, length: number}.

Renamed coalesceAroundSplice to coalesceUniformChunks and updated to take in (chunks, policy, range?: { start, length })
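As a rough illustration of the optional-range parameter, the sketch below shows one way a `range?: { start, length }` argument could default to the whole array. `SketchChunkRange` and `resolveChunkRange` are hypothetical names, and the bounds check is an assumption rather than something the PR is known to do.

```typescript
// Hypothetical range type: both fields are chunk indices/counts, not node counts.
interface SketchChunkRange {
	start: number;
	length: number;
}

/** Returns the range to process, defaulting to every chunk when no range is given. */
function resolveChunkRange(chunkCount: number, range?: SketchChunkRange): SketchChunkRange {
	if (range === undefined) {
		return { start: 0, length: chunkCount };
	}
	// Assumed validation: reject ranges that fall outside the array.
	if (range.start < 0 || range.length < 0 || range.start + range.length > chunkCount) {
		throw new RangeError("range must lie within the chunks array");
	}
	return range;
}

const wholeArray = resolveChunkRange(5);
const subRange = resolveChunkRange(5, { start: 1, length: 2 });
```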

CraigMacomber · 2026-06-09T21:12:30Z

+ * bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate.
+ *
+ * @param field - The field's chunks array, modified in place.
+ * @param spliceStart - The index passed to the originating `splice` call.


You should clarify this is a chunk index not a node index. Might even be good to call it something like spliceStartChunkIndex

removed spliceStart and updated to your suggestion above

CraigMacomber · 2026-06-09T21:13:07Z

+export function coalesceAroundSplice(
+	field: TreeChunk[],
+	spliceStart: number,
+	insertedCount: number,


It might be nice for the name to clarify this is chunks, not nodes (it's good the docs are clear this time though).

insertedCount removed after suggested refactor

CraigMacomber · 2026-06-09T21:14:32Z

+ *
+ * @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise.
+ */
+function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean {


This function seems like a good target for unit testing, but since it's not exported, it can't have any tests.

inner helper renamed to tryCoalesceUniformChunks and added unit tests in chunkTree.spec.ts

CraigMacomber · 2026-06-09T21:15:12Z

+ *
+ * @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise.
+ */
+function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean {


Avoid single letter name like "i" and use more descriptive ones made of English words.

updated to chunkIndex

CraigMacomber · 2026-06-09T21:19:20Z

+	// Identity is the fast path (chunkers cache shapes); equals() is the fallback.
+	if (leftTreeShape !== rightTreeShape && !leftTreeShape.equals(rightTreeShape)) {


If we need to optimize equality here, I think it would be better to move the fast path into the implementation of TreeShape.equals (which should probably also move its O(1) check at the bottom to before its fields check).

You can make that optimization to TreeShape.equals in its own PR, and remove the workaround for the lack of that optimization from here.

removed optimization and added TODO
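For context, the reviewer's suggestion (identity fast path first, then the O(1) checks, then the per-field comparison) could look roughly like the sketch below. `SketchTreeShape` is a hypothetical simplified class, not the real `TreeShape`, whose fields and checks may differ.

```typescript
// Hypothetical, simplified shape: a type name plus an ordered list of [key, child shape] pairs.
class SketchTreeShape {
	public readonly type: string;
	public readonly fields: readonly (readonly [string, SketchTreeShape])[];

	public constructor(
		type: string,
		fields: readonly (readonly [string, SketchTreeShape])[] = [],
	) {
		this.type = type;
		this.fields = fields;
	}

	public equals(other: SketchTreeShape): boolean {
		// Fast path: chunkers that cache shapes hand out the same instance.
		if (this === other) {
			return true;
		}
		// Cheap O(1) checks before the recursive field comparison.
		if (this.type !== other.type || this.fields.length !== other.fields.length) {
			return false;
		}
		return this.fields.every(([key, shape], index) => {
			const [otherKey, otherShape] = other.fields[index];
			return key === otherKey && shape.equals(otherShape);
		});
	}
}

const sharedLeaf = new SketchTreeShape("leaf");
const parentA = new SketchTreeShape("parent", [["x", sharedLeaf]]);
const parentB = new SketchTreeShape("parent", [["x", new SketchTreeShape("leaf")]]);
const parentC = new SketchTreeShape("parent", [["y", sharedLeaf]]);
```

With the fast path inside `equals`, callers can write `left.equals(right)` without a separate `!==` guard.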

brrichards · 2026-06-09T21:19:24Z

+	// decompress to incorrect values when read.
+	const leftCompressor = left.idCompressor;
+	const rightCompressor = right.idCompressor;
+	if (


can we have the scenario where chunks in the same forest have a different idCompressor? Do all chunks created in the same forest not reference the same idCompressor?

I was about to remove it, but since the function is exported now, I updated it to an assert as a guard :)

CraigMacomber · 2026-06-09T21:35:33Z

+		[...left.values, ...right.values],
+		leftCompressor ?? rightCompressor,
+	);
+	field.splice(i, 2, merged);


I don't think the algorithm you are using here is ideal.

Rather than calling tryMergeAt, and passing in the index, I think it would make more sense to:

make this function into something like "tryCoalesceChunks" which takes in two chunks and the chunk policy, and returns a chunk (replacing the two inputs) or undefined (the two inputs should still be used).

The outer function can then scan over the relevant range, creating a new array of chunks for the impacted range. It can then do a single splice operation before returning, avoiding the worst case of doing many splices. Take care to avoid doing the splice at all if there are no changes.

If you take this approach, it fixes the worst-case performance, and also makes the two functions much more usable. For example, the new inner function I propose could be used directly when inserting content to try first to merge it with the chunk before it, then the one after it, and only insert into the array if both merges fail. In the case where this does a merge, it prevents the insert from having to splice the array at all, saving the possible O(n) cost of moving everything in the array.

Updated to:

a new inner helper, tryCoalesceUniformChunks(left, right, policy), which is now exported

the outer coalesceUniformChunks, which scans the range, builds a result, and does a single splice at the end (only if there are changes)
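The scan-then-single-splice structure described in this thread can be sketched as follows. This is a simplified model under stated assumptions: `SketchUniformChunk`, `SketchPolicy`, `tryCoalescePair`, and `coalesceRange` are hypothetical, shapes are compared by a string id, and only pairs inside the range are considered, which may not match how the real function treats the range's neighbours.

```typescript
// Hypothetical stand-ins for UniformChunk and ChunkPolicy.
interface SketchUniformChunk {
	shapeId: string;
	values: number[];
}
interface SketchPolicy {
	uniformChunkNodeCountDynamicTargetMax: number;
}

/** Returns a merged chunk replacing both inputs, or undefined if they should stay separate. */
function tryCoalescePair(
	left: SketchUniformChunk,
	right: SketchUniformChunk,
	policy: SketchPolicy,
): SketchUniformChunk | undefined {
	if (left.shapeId !== right.shapeId) {
		return undefined;
	}
	// Refuse merges that would exceed the per-chunk cap.
	if (left.values.length + right.values.length > policy.uniformChunkNodeCountDynamicTargetMax) {
		return undefined;
	}
	return { shapeId: left.shapeId, values: [...left.values, ...right.values] };
}

/** Coalesces within `range`, splicing `chunks` at most once. Returns true if anything changed. */
function coalesceRange(
	chunks: SketchUniformChunk[],
	policy: SketchPolicy,
	range: { start: number; length: number } = { start: 0, length: chunks.length },
): boolean {
	const end = Math.min(chunks.length, range.start + range.length);
	const result: SketchUniformChunk[] = [];
	let changed = false;
	for (let chunkIndex = range.start; chunkIndex < end; chunkIndex++) {
		const current = chunks[chunkIndex];
		const previous = result[result.length - 1];
		const merged =
			previous === undefined ? undefined : tryCoalescePair(previous, current, policy);
		if (merged === undefined) {
			result.push(current);
		} else {
			result[result.length - 1] = merged;
			changed = true;
		}
	}
	if (changed) {
		// One splice for the whole range, skipped entirely when nothing merged.
		chunks.splice(range.start, end - range.start, ...result);
	}
	return changed;
}

const sketchPolicy: SketchPolicy = { uniformChunkNodeCountDynamicTargetMax: 3 };
const sketchChunks: SketchUniformChunk[] = [
	{ shapeId: "a", values: [1, 2] },
	{ shapeId: "a", values: [3] },
	{ shapeId: "b", values: [4] },
	{ shapeId: "a", values: [5] },
	{ shapeId: "a", values: [6] },
];
const didCoalesce = coalesceRange(sketchChunks, sketchPolicy);
```

In this model the pair helper never touches the array, so it could also be called directly by an insert path before deciding whether a splice is needed at all.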

CraigMacomber · 2026-06-09T21:46:17Z

+		return false;
+	}
+
+	const merged = new UniformChunk(


When merging chunks, ideally instead of always creating a new chunk, if the first/left chunk only has a ref count of 1, just append the second chunk's content array to it, and update the chunk shape to one with the new top-level length. That way if you try to merge a bunch of chunks together in a row, instead of getting O(n^2) array data allocated and filled for the values arrays, you only get O(n) as you grow a single chunk.

Note: because of this issue, I think coalesceAroundSplice actually has a worst-case runtime of O(Max(insertedCount^2, insertedCount * field.length))

Splitting out a dedicated coalesceUniformChunks which takes in two uniform chunks, asserts the shapes are the same, then does this merging approach could be good, so it can have its own targeted tests about getting the refcounts and top level lengths correct.

Updated so when !left.isShared(), tryCoalesceUniformChunks extends left.values in place, updates left.shape, releases right, and returns left. Otherwise it falls back to allocating a new chunk.
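A minimal sketch of the in-place path described in this reply, assuming a simple reference-counted chunk. `SketchRefCountedChunk` and `tryCoalesceSketch` are hypothetical; the real `UniformChunk` also updates its shape's top-level length, which this model leaves implicit in `values.length`, and how the shared fallback handles the inputs' references is left to the caller here.

```typescript
// Hypothetical ref-counted chunk; the real UniformChunk API may differ.
class SketchRefCountedChunk {
	public shapeId: string;
	public values: number[];
	private refCount = 1;

	public constructor(shapeId: string, values: number[]) {
		this.shapeId = shapeId;
		this.values = values;
	}
	public isShared(): boolean {
		return this.refCount > 1;
	}
	public referenceAdded(): void {
		this.refCount += 1;
	}
	public referenceRemoved(): void {
		this.refCount -= 1;
	}
}

function tryCoalesceSketch(
	left: SketchRefCountedChunk,
	right: SketchRefCountedChunk,
	maxNodes: number,
): SketchRefCountedChunk | undefined {
	if (left.shapeId !== right.shapeId) {
		return undefined;
	}
	if (left.values.length + right.values.length > maxNodes) {
		return undefined;
	}
	if (!left.isShared()) {
		// Sole owner: grow `left` in place so a run of merges stays O(n) overall.
		left.values.push(...right.values);
		right.referenceRemoved();
		return left;
	}
	// Shared: mutating `left` would be visible to other holders, so allocate a new chunk.
	return new SketchRefCountedChunk(left.shapeId, [...left.values, ...right.values]);
}

const unsharedLeft = new SketchRefCountedChunk("a", [1, 2]);
const mergedInPlace = tryCoalesceSketch(unsharedLeft, new SketchRefCountedChunk("a", [3]), 10);

const sharedLeft = new SketchRefCountedChunk("a", [1, 2]);
sharedLeft.referenceAdded(); // Simulate a second holder.
const mergedCopy = tryCoalesceSketch(sharedLeft, new SketchRefCountedChunk("a", [3]), 10);
```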

CraigMacomber · 2026-06-09T21:48:00Z

+
+		/**
+		 * Manually seeds the root field with the requested sequence of single-shape UniformChunks.
+		 * Sidesteps the default chunker, which would produce one UniformChunk per top-level node,


I don't think this is accurate anymore (@brrichards fixed it I believe)

Yeah the default chunker now produces multi-node UniformChunks

Updated stale comment

brrichards · 2026-06-09T22:09:28Z

+			assertChunksUnchanged(field, snapshot);
+		});
+
+		it("keeps chunk count bounded under repeated mid-field edits", () => {


nit: This test looks like it reimplements the splitFieldAtIndex logic; using splitFieldAtIndex directly might simplify it and help with readability. The test itself looks good!

Updated with suggestion

Josmithr · 2026-06-17T16:55:11Z

+ * Caps merged chunks at {@link ChunkPolicy.uniformChunkNodeCountDynamicTargetMax}, matching the
+ * bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate.


I'm not sure I understand this. Could we expand on this a bit? Also, does this need to be in a @privateRemarks block? Those are usually reserved for docs we don't want end-users to see, and this note seems relevant.

moved to @remarks and elaborated for clarification

Josmithr · 2026-06-17T18:10:25Z

+	// TODO: TreeShape.equals could short-circuit on identity for the common
+	// (same-chunker) case; until then this is a full structural check on every merge.


Should we do this in this PR?

It was originally part of this PR, but Craig suggested that we split it up into two

Do we have a task tracking this next step? Let's make sure we have something tracking this.

Added task :)

Josmithr · 2026-06-17T18:12:11Z

+	if (combinedTopLevel > policy.uniformChunkNodeCountDynamicTargetMax) {
+		return undefined;
+	}


Nit: A quick comment explaining this might be useful.

Added explanation

Josmithr · 2026-06-17T18:13:23Z

+		// and returned; `right`'s slot ref is released.
+		left.values.push(...right.values);
+		left.shape = leftTreeShape.withTopLevelLength(combinedTopLevel);
+		if (leftCompressor === undefined && rightCompressor !== undefined) {


We assert above that leftCompressor cannot be undefined. Do we need this?

The above assert checks if either right or left compressor is undefined, so this was trying to set it to the rightCompressor if it is undefined (similarly to leftCompressor ?? rightCompressor used below it). But the if statement seemed unnecessary, so I updated it to left.idCompressor ??= rightCompressor
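For reference, the two forms behave the same when the property is `undefined`, as the small sketch below shows. `SketchCompressorHolder` and both helper functions are hypothetical illustrations, not code from the PR.

```typescript
// Hypothetical holder type standing in for a chunk with an optional compressor.
interface SketchCompressorHolder {
	idCompressor: { readonly name: string } | undefined;
}

/** The original explicit form. */
function adoptCompressorVerbose(
	left: SketchCompressorHolder,
	rightCompressor: { readonly name: string } | undefined,
): void {
	if (left.idCompressor === undefined && rightCompressor !== undefined) {
		left.idCompressor = rightCompressor;
	}
}

/** The concise form: assigns only when the left side is null or undefined. */
function adoptCompressorConcise(
	left: SketchCompressorHolder,
	rightCompressor: { readonly name: string } | undefined,
): void {
	left.idCompressor ??= rightCompressor;
}

const compressorA = { name: "A" };
const compressorB = { name: "B" };

const verboseHolder: SketchCompressorHolder = { idCompressor: undefined };
const conciseHolder: SketchCompressorHolder = { idCompressor: undefined };
adoptCompressorVerbose(verboseHolder, compressorB);
adoptCompressorConcise(conciseHolder, compressorB);

const alreadySet: SketchCompressorHolder = { idCompressor: compressorA };
adoptCompressorConcise(alreadySet, compressorB); // Left side is kept.
```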

Josmithr

Left some questions and suggestions.

github-actions · 2026-06-17T18:34:51Z

Bundle size comparison

Base commit: e121ff71f3ebed80c656315486933fe2d6859b32
Head commit: aedde4fdeb06c3561886b99434d8bc0a3e508e9e

Notable changes

No bundles changed by ≥ 500 bytes parsed.

Per-bundle deltas

`@fluid-example/bundle-size-tests`

azureClient.js: parsed 618613 → 618669 (+56), gzip 164734 → 164782 (+48)
odspClient.js: parsed 591845 → 591901 (+56), gzip 158885 → 158927 (+42)
aqueduct.js: parsed 525463 → 525498 (+35), gzip 140683 → 140714 (+31)
fluidFramework.js: parsed 392014 → 392035 (+21), gzip 111104 → 111122 (+18)
sharedTree.js: parsed 381401 → 381415 (+14), gzip 108501 → 108513 (+12)
containerRuntime.js: parsed 303813 → 303827 (+14), gzip 83188 → 83196 (+8)
sharedString.js: parsed 175984 → 175991 (+7), gzip 49445 → 49453 (+8)
experimentalSharedTree.js: parsed 160798 → 160798 (0), gzip 45804 → 45804 (0)
matrix.js: parsed 159845 → 159852 (+7), gzip 45411 → 45418 (+7)
loader.js: parsed 145307 → 145321 (+14), gzip 39063 → 39078 (+15)
odspDriver.js: parsed 104423 → 104444 (+21), gzip 32644 → 32651 (+7)
directory.js: parsed 66616 → 66623 (+7), gzip 18532 → 18540 (+8)
748.js: parsed 58793 → 58793 (0), gzip 17826 → 17826 (0)
map.js: parsed 46709 → 46716 (+7), gzip 14310 → 14317 (+7)
odspPrefetchSnapshot.js: parsed 45642 → 45656 (+14), gzip 15268 → 15277 (+9)
594.js: parsed 44493 → 44493 (0), gzip 13744 → 13744 (0)
summarizerDelayLoadedModule.js: parsed 30753 → 30753 (0), gzip 7767 → 7767 (0)
socketModule.js: parsed 26486 → 26493 (+7), gzip 7879 → 7887 (+8)
createNewModule.js: parsed 12480 → 12480 (0), gzip 4786 → 4786 (0)
summaryModule.js: parsed 3797 → 3797 (0), gzip 1860 → 1860 (0)
connectionState.js: parsed 724 → 724 (0), gzip 429 → 429 (0)
sharedTreeAttributes.js: parsed 666 → 673 (+7), gzip 431 → 441 (+10)
debugAssert.js: parsed 429 → 429 (0), gzip 299 → 299 (0)
FluidFramework-HashFallback.js: parsed 422 → 422 (0), gzip 316 → 316 (0)

merge chunks

8002a81

Copilot AI review requested due to automatic review settings May 26, 2026 17:18

daesunp requested a review from a team as a code owner May 26, 2026 17:18

Copilot started reviewing on behalf of daesunp May 26, 2026 17:18

Copilot AI reviewed May 26, 2026

Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts

daesunp added 4 commits May 28, 2026 16:12

Merge branch 'main' into merge-attach-detach-edit-chunks

739cefa

Update docs

9cbae6e

function name change

6d2b8e2

PR review

2a52fae

CraigMacomber reviewed Jun 9, 2026

brrichards reviewed Jun 9, 2026

CraigMacomber reviewed Jun 9, 2026

brrichards reviewed Jun 9, 2026

daesunp added 6 commits June 11, 2026 15:59

Merge branch 'main' into merge-attach-detach-edit-chunks

28606c8

PR review

0df567d

update to assert

4228364

simplify test

6b8e497

lint fix

1cd0eb7

lint fix

a37c426