
Conversation

@Josh-Engle

Description

This PR addresses a race condition in LocalFileStore where concurrent writes to the same key could lead to data loss or corruption. Previously, multiple mset operations on the same key would execute in parallel, causing file writes to overwrite each other. Additionally, added guards against incomplete file writes due to unexpected process termination.

This is addressed through several mechanisms:

  • Dedupe keys within the same mset call so that only the last value for each key is written.
  • Use a promise queue to guarantee sequential execution across multiple mset calls.
  • Write to a temp file and then rename it into place, preventing partial writes in the case of unexpected termination.
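The first two mechanisms can be sketched as follows. This is a minimal illustration only: the name `withKeyLock` echoes the review discussion below, and `KeyLock`/`dedupePairs` are hypothetical helpers, not the actual LocalFileStore internals.

```typescript
// Sketch of per-key write sequencing via a promise queue.
class KeyLock {
  private tails = new Map<string, Promise<unknown>>();

  async withKeyLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Chain after the previous operation on this key; swallow its error
    // so one failed write doesn't block later writers on the same key.
    const next = prev.catch(() => {}).then(fn);
    this.tails.set(key, next);
    try {
      return await next;
    } finally {
      if (this.tails.get(key) === next) this.tails.delete(key);
    }
  }
}

// Dedupe within a single mset call: a Map keeps only the last value per key.
function dedupePairs(pairs: [string, Uint8Array][]): [string, Uint8Array][] {
  return [...new Map(pairs).entries()];
}
```

With this shape, two concurrent `withKeyLock("k", ...)` calls run strictly in submission order, while writes to different keys remain free to interleave.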

Fixes #9337

@changeset-bot

changeset-bot bot commented Nov 13, 2025

⚠️ No Changeset found

Latest commit: 4fdbffc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Member

@hntrl hntrl left a comment


Thanks @Josh-Engle! One Q about your approach:

Comment on lines +270 to +301
```typescript
/**
 * Writes data to a temporary file before atomically renaming it into place.
 * @param content Serialized value to persist.
 * @param fullPath Destination path for the stored key.
 */
private async writeFileAtomically(content: Uint8Array, fullPath: string) {
  const directory = path.dirname(fullPath);
  await fs.mkdir(directory, { recursive: true });

  const tempPath = `${fullPath}.${Date.now()}-${Math.random()
    .toString(16)
    .slice(2)}.tmp`;

  try {
    await fs.writeFile(tempPath, content);

    try {
      await fs.rename(tempPath, fullPath);
    } catch (renameError) {
      const code = (renameError as { code?: string }).code;
      if (renameError && (code === "EPERM" || code === "EACCES")) {
        await fs.writeFile(fullPath, content);
        await fs.unlink(tempPath).catch(() => {});
      } else {
        throw renameError;
      }
    }
  } catch (error) {
    await fs.unlink(tempPath).catch(() => {});
    throw error;
  }
}
```
Member


What's the purpose of the temp files if we're sequencing files using withKeyLock?

Author


This is an added guard to prevent partial file writes causing data corruption if the process terminates before the buffer is fully flushed for a particular file.

We write to the temp file first since this write might be interrupted. If it is interrupted, only the temp file is corrupted, and it gets cleaned up. Once the fs buffer is fully flushed and we know the data is in a good state, we replace the previous file with the rename, which cannot leave partial state: it either succeeds completely or not at all.

Member


Is that a common occurrence? What if the rename operation fails?

Author


It's hard to say how common it is, but I can imagine relatively likely scenarios, and the problem scales with file size: larger files mean longer write times, increasing the chance of a failure mid-write.

A good example might be:

  1. A developer force terminates the process execution while LocalFileStore is halfway through writing a file to the file system.
  2. The incomplete write operation results in malformed JSON as it is not terminated correctly.
  3. Reading the txt file results in an invalid JSON exception.
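The failure in steps 2–3 is easy to reproduce in isolation (a hypothetical sketch, not LocalFileStore code):

```typescript
// Simulate a write killed partway through: only a prefix of the
// serialized value reaches disk.
const full = JSON.stringify({ embedding: [0.1, 0.2, 0.3] });
const partial = full.slice(0, Math.floor(full.length / 2));

// A truncated object is never valid JSON: the closing brace is missing,
// so reading it back throws a parse error.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```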

The rename operation is considerably more reliable than writeFile because it's a single operation. If we have already written data to the filesystem with writeFile, it's very likely the rename will succeed as well, since it requires the same OS filesystem permissions.

On Windows I think it would only fail if there was an open file handle on the destination, and on Linux it would fail if the process lacks write access to the directory. In both cases I think a failure is expected behavior.



Development

Successfully merging this pull request may close these issues.

Concurrent LocalFileStore mset writes can corrupt chunked embeddings cache
