Skip to content

merge chunks during attach/detach edits#27386

Open
daesunp wants to merge 16 commits into
microsoft:mainfrom
daesunp:merge-attach-detach-edit-chunks
Open

merge chunks during attach/detach edits#27386
daesunp wants to merge 16 commits into
microsoft:mainfrom
daesunp:merge-attach-detach-edit-chunks

Conversation

@daesunp

@daesunp daesunp commented May 26, 2026

Copy link
Copy Markdown
Contributor

Description

Follow-up PR to #27153. Adds coalesceAroundSplice, the inverse of splitFieldAtIndex: merges adjacent same-shape UniformChunks along the seams a splice could have created, so repeated mid-field attach/detach operations don't leave the field permanently fragmented.

splitFieldAtIndex (from #27153) lets attach/detach land in the middle of a multi-node UniformChunk by splitting the chunk. Without a corresponding merge, every mid-chunk edit fragments the field — repeated same-shape inserts would produce an ever-growing run of small adjacent chunks. The uniformChunkNodeCountDynamicTargetMax policy field added in #27153 explicitly anticipated this work; this PR provides the merge logic it was waiting for.

Copilot AI review requested due to automatic review settings May 26, 2026 17:18
@daesunp daesunp requested a review from a team as a code owner May 26, 2026 17:18
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (574 lines, 4 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a coalescing step to the chunked-forest editing pipeline so that repeated mid-field attach/detach operations don’t leave a field permanently fragmented into many adjacent same-shape UniformChunks. It introduces coalesceAroundSplice as the conceptual inverse of splitFieldAtIndex, merging same-shape UniformChunks along splice seams while respecting the dynamic per-chunk target cap.

Changes:

  • Added coalesceAroundSplice (and internal tryMergeAt) to merge adjacent same-shape UniformChunks around splice seams, capped by uniformChunkNodeCountDynamicTargetMax.
  • Integrated coalescing into ChunkedForest attach/detach edit paths immediately after the underlying splice.
  • Added focused unit tests for coalescing behavior (including caps, shape-equality fallback, refcount behavior, and idCompressor retention) and integration tests verifying coalescing after mid-chunk detach/attach.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Implements coalesceAroundSplice and merge helper to reduce fragmentation around edit splices.
packages/dds/tree/src/feature-libraries/chunked-forest/chunkedForest.ts Calls coalesceAroundSplice after attach and detach splices to keep fields from fragmenting.
packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkTree.spec.ts Adds a comprehensive test suite for coalesceAroundSplice behavior and edge cases.
packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkedForest.spec.ts Adds integration tests ensuring coalescing occurs after mid-chunk detach/attach edits.

Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts
}

/**
* Keeps a field's chunks from accumulating same-shape fragments across repeated edits.

@CraigMacomber CraigMacomber Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, the top level documentation for a function should say what it does. This is instead documenting what its used to accomplish. Additionally, this doesn't prevent there from being same shaped fragments, so it not quite accurate either.

Right now the API and docs are focused around this being used in exactly one use-case, when a TreeChunk[] which holds the contents of a field has undergone a splice.

That is way more specific that this should be, making this code way more fragile than it needs to be. If we refactor or add some logic which needs to work on a TreeChunk[] thats not a field (like an ArrayChunk) or wasn't spliced (maybe was appended to), this code could still be valid if you just change its documentation and parameter names to be more general.

I'd suggest something like:

Suggested change
* Keeps a field's chunks from accumulating same-shape fragments across repeated edits.
* Coalesce adjacent small uniform chunks with the same shape in part of the provided array.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with suggestion

* Merges adjacent {@link UniformChunk}s of matching shape along the seams a splice could have
* created. Acts as the inverse of {@link splitFieldAtIndex}: without it, repeated mid-field
* attach/detach against the same field would leave it permanently fragmented into ever-smaller
* adjacent chunks. Cost is proportional to `insertedCount`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it seems like this function's runtime O(insertedCount) when it is a no-op, and worst case O(field length * insertedCount)

I'm not sure what "cost" you are referring to (memory, time, dollars?): use a more specific term. Assuming that was supposed to be time, I think you missed this can do O(insertedCount) splices which are O(field length) in cost.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and clarified what is being measured

* @param insertedCount - The number of chunks the splice inserted (0 for a pure detach).
* @param policy - The {@link ChunkPolicy} supplying the per-chunk cap.
*/
export function coalesceAroundSplice(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are multiple ways one could coalesce chunks (like put them into ArrayChunks, or merge ArrayChunks) and this seems to only handle UniformChunks, despite taking in a ChunkPolicy which includes details about how to use ArrayChunks, I think it should have a more specific name.

I think it might also make sense for it to make the sub-range which it focuses on optional, and default to processing the whole array. Maybe take in a range?: {start: number, length: number}.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed coalesceAroundSplice to coalesceUniformChunks and updated to take in (chunks, policy, range?: { start, length })

* bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate.
*
* @param field - The field's chunks array, modified in place.
* @param spliceStart - The index passed to the originating `splice` call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should clarify this is a chunk index not a node index. Might even be good to call it something like spliceStartChunkIndex

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed spliceStart and updated to your suggestion above

export function coalesceAroundSplice(
field: TreeChunk[],
spliceStart: number,
insertedCount: number,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be nice for the name to clarify this is chunks not nodes (its good the docs are clear this time though)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insertedCount removed after suggested refactor

*
* @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise.
*/
function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function seems like a good target for unit testing, but seeing as its not exported, it can't have any tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner helper renamed to tryCoalesceUniformChunks and added unit tests in chunkTree.spec.ts

*
* @returns `true` if the merge occurred (and `field` was mutated), `false` otherwise.
*/
function tryMergeAt(field: TreeChunk[], i: number, policy: ChunkPolicy): boolean {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid single letter name like "i" and use more descriptive ones made of English words.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to chunkIndex

Comment on lines +701 to +702
// Identity is the fast path (chunkers cache shapes); equals() is the fallback.
if (leftTreeShape !== rightTreeShape && !leftTreeShape.equals(rightTreeShape)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to optimize equality here, I think it would be better to move the fast path into the implementation of TreeShape.equals (which should probably also move its O(1) check at the bottom to before its fields check).

You can make that optimization to TreeShape.equals in its own PR, and remove the workaround for the lack of that optimization from here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed optimization and added TODO

// decompress to incorrect values when read.
const leftCompressor = left.idCompressor;
const rightCompressor = right.idCompressor;
if (

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have the scenario where chunks in the same forest have a different idCompressor? Do all chunks created in the same forest not reference the same idCompressor?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to remove it, but since the function is exported now, I updated it to an assert as a guard :)

[...left.values, ...right.values],
leftCompressor ?? rightCompressor,
);
field.splice(i, 2, merged);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the algorithm you are using here id ideal.

Rather than calling tryMergeAt, and passing in the index, I think it would make more sense to:

  1. make this function into something like "tryCoalesceChunks" which takes in two chunks and the chunk policy, and returns a chunk (replacing the two inputs) or undefined (the two inputs should still be used).
  2. The outer function can then scan over the relevant range, creating a new array of chunks for the impacted range. It can then do a single splice operation before returning, avoiding the worse case of doing many splices. Take care to avoid doing the splice at all if there are no changes.

If you take this approach, it fixes the worst case performance case, and also makes the two function much more usable. For example the new inner function I propose could be used directly when inserting content to try first to merge it with the chunk before it, then the one after it, and only insert to the array if both merges fail. In the case where this does a merge, it prevents the insert from having to splice the array at all, saving the possible O(n) cost of moving everything in the array.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to:

  • new tryCoalesceUniformChunks(left, right, policy) inner helper and exported
  • outer coalesceUniformChunks scans the range, builds a result and does splice at end (if there are changes)

return false;
}

const merged = new UniformChunk(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When merging chunks, ideally instead of always creating a new chunk, if the first/left chunk only as ref count 1, just append the second chunk's content array to it, and update the chunk shape to one with the new top level length. That way if you try and merge a bunch of chunks together in a row, instead of getting O(n^2) array data allocated and filled for the values arrays, you only get O(n) as you grow a single chunk.

Note: because of this issue, I think coalesceAroundSpliceis actually has worst case runtime of O(Max(insertedCount^2, insertedCount * fields.length))

Splitting out a dedicated coalesceUniformChunks which takes in two uniform chunks, asserts the shapes are the same, then does this merging approach could be good, so it can have its own targeted tests about getting the refcounts and top level lengths correct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated so when !left.isShared(), tryCoalesceUniformChunks extends left.values in place, updates left.shape, releases right, and returns left. Otherwise it falls back to allocating a new chunk.


/**
* Manually seeds the root field with the requested sequence of single-shape UniformChunks.
* Sidesteps the default chunker, which would produce one UniformChunk per top-level node,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate anymore (@brrichards fixed it I believe)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the default chunker now produces multi-node UniformChunks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated stale comment

assertChunksUnchanged(field, snapshot);
});

it("keeps chunk count bounded under repeated mid-field edits", () => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This test looks like it rewrites the splitFieldAtIndex logic which using might simplify/help with readability, but the test itself looks good!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with suggestion

Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment on lines +660 to +661
* Caps merged chunks at {@link ChunkPolicy.uniformChunkNodeCountDynamicTargetMax}, matching the
* bisect threshold used by {@link splitFieldAtIndex} so split-and-coalesce pairs don't oscillate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this. Could we expand on this a bit? Also, does this need to be in a @privateRemarks block? Those are usually reserved for docs we don't want end-users to see, and this note seems relevant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to @remarks and elaborated for clarification

Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment thread packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Outdated
Comment on lines +742 to +743
// TODO: TreeShape.equals could short-circuit on identity for the common
// (same-chunker) case; until then this is a full structural check on every merge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this in this PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was originally part of this PR, but Craig suggested that we split it up into two

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a task tracking this next step? Let's make sure we have something tracking this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added task :)

Comment on lines +759 to +761
if (combinedTopLevel > policy.uniformChunkNodeCountDynamicTargetMax) {
return undefined;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A quick comment explaining this might be useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added explanation

// and returned; `right`'s slot ref is released.
left.values.push(...right.values);
left.shape = leftTreeShape.withTopLevelLength(combinedTopLevel);
if (leftCompressor === undefined && rightCompressor !== undefined) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assert above that leftCompressor cannot be undefined. Do we need this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above assert checks if either right or left compressor is undefined, so this was trying to set it to the rightCompressor if it is undefined (similarly to leftCompressor ?? rightCompressor used below it). But the if statement seemed unnecessary and updated it to left.idCompressor ??= rightCompressor

Comment thread packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkedForest.spec.ts Outdated
Comment thread packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkedForest.spec.ts Outdated
Comment thread packages/dds/tree/src/test/feature-libraries/chunked-forest/chunkTree.spec.ts Outdated

@Josmithr Josmithr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions and suggestions.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Bundle size comparison

Base commit: e121ff71f3ebed80c656315486933fe2d6859b32
Head commit: aedde4fdeb06c3561886b99434d8bc0a3e508e9e

Notable changes

No bundles changed by ≥ 500 bytes parsed.

Per-bundle deltas

@fluid-example/bundle-size-tests

  • azureClient.js: parsed 618613 → 618669 (+56), gzip 164734 → 164782 (+48)
  • odspClient.js: parsed 591845 → 591901 (+56), gzip 158885 → 158927 (+42)
  • aqueduct.js: parsed 525463 → 525498 (+35), gzip 140683 → 140714 (+31)
  • fluidFramework.js: parsed 392014 → 392035 (+21), gzip 111104 → 111122 (+18)
  • sharedTree.js: parsed 381401 → 381415 (+14), gzip 108501 → 108513 (+12)
  • containerRuntime.js: parsed 303813 → 303827 (+14), gzip 83188 → 83196 (+8)
  • sharedString.js: parsed 175984 → 175991 (+7), gzip 49445 → 49453 (+8)
  • experimentalSharedTree.js: parsed 160798 → 160798 (0), gzip 45804 → 45804 (0)
  • matrix.js: parsed 159845 → 159852 (+7), gzip 45411 → 45418 (+7)
  • loader.js: parsed 145307 → 145321 (+14), gzip 39063 → 39078 (+15)
  • odspDriver.js: parsed 104423 → 104444 (+21), gzip 32644 → 32651 (+7)
  • directory.js: parsed 66616 → 66623 (+7), gzip 18532 → 18540 (+8)
  • 748.js: parsed 58793 → 58793 (0), gzip 17826 → 17826 (0)
  • map.js: parsed 46709 → 46716 (+7), gzip 14310 → 14317 (+7)
  • odspPrefetchSnapshot.js: parsed 45642 → 45656 (+14), gzip 15268 → 15277 (+9)
  • 594.js: parsed 44493 → 44493 (0), gzip 13744 → 13744 (0)
  • summarizerDelayLoadedModule.js: parsed 30753 → 30753 (0), gzip 7767 → 7767 (0)
  • socketModule.js: parsed 26486 → 26493 (+7), gzip 7879 → 7887 (+8)
  • createNewModule.js: parsed 12480 → 12480 (0), gzip 4786 → 4786 (0)
  • summaryModule.js: parsed 3797 → 3797 (0), gzip 1860 → 1860 (0)
  • connectionState.js: parsed 724 → 724 (0), gzip 429 → 429 (0)
  • sharedTreeAttributes.js: parsed 666 → 673 (+7), gzip 431 → 441 (+10)
  • debugAssert.js: parsed 429 → 429 (0), gzip 299 → 299 (0)
  • FluidFramework-HashFallback.js: parsed 422 → 422 (0), gzip 316 → 316 (0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants