-
Notifications
You must be signed in to change notification settings - Fork 2.9k
fix(langchain): Prevent local file corruption when using LocalFileStore
#9386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(langchain): Prevent local file corruption when using LocalFileStore
#9386
Conversation
|
hntrl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Josh-Engle! One Q about your approach:
| /** | ||
| * Writes data to a temporary file before atomically renaming it into place. | ||
| * @param content Serialized value to persist. | ||
| * @param fullPath Destination path for the stored key. | ||
| */ | ||
| private async writeFileAtomically(content: Uint8Array, fullPath: string) { | ||
| const directory = path.dirname(fullPath); | ||
| await fs.mkdir(directory, { recursive: true }); | ||
|
|
||
| const tempPath = `${fullPath}.${Date.now()}-${Math.random() | ||
| .toString(16) | ||
| .slice(2)}.tmp`; | ||
|
|
||
| try { | ||
| await fs.writeFile(tempPath, content); | ||
|
|
||
| try { | ||
| await fs.rename(tempPath, fullPath); | ||
| } catch (renameError) { | ||
| const code = (renameError as { code?: string }).code; | ||
| if (renameError && (code === "EPERM" || code === "EACCES")) { | ||
| await fs.writeFile(fullPath, content); | ||
| await fs.unlink(tempPath).catch(() => {}); | ||
| } else { | ||
| throw renameError; | ||
| } | ||
| } | ||
| } catch (error) { | ||
| await fs.unlink(tempPath).catch(() => {}); | ||
| throw error; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of the temp files if we're sequencing files using withKeyLock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an added guard to prevent partial file writes causing data corruption if the process terminates before the buffer is fully flushed for a particular file.
We write to the temp file first as this might be interrupted. If it is interrupted, only the temp file is corrupted which is cleaned up. Once we have fully flushed the fs buffer and we know it's in a good state, we overwrite the previous file with the rename which cannot be interrupted as it is either successful or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a common occurrence? What is the rename operation fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to say if it's common but I can image relatively likely scenarios and the problem tends to scale up with larger file sizes. Larger files mean larger write times, increasing the chance of a failure mid-write.
A good example might be:
- A developer force terminates the process execution while
LocalFileStoreis halfway through writing a file to the file system. - The incomplete write operation results in malformed JSON as it is not terminated correctly.
- Reading the
txtfile results in an invalid JSON exception.
The rename operation is considerably more reliable then the writeFile because it's a single operation. If we have already written data to the filesystem with writeFile it's very likely the rename will work correctly as it requires the same OS filesystem permissions.
For windows I think it would only fail if there was a open file handle and on Linux it would fail if it doesn't have access to the directory. In both cases I think a failure is expected.
Description
This PR addresses a race condition in
LocalFileStorewhere concurrent writes to the same key could lead to data loss or corruption. Previously, multiplemsetoperations on the same key would execute in parallel, causing file writes to overwrite each other. Additionally, added guards against incomplete file writes due to unexpected process termination.This is addressed through a number of mechanisms.
msetcall to ensure the last one is written.msetcalls.Fixes #9337