
Conversation


@requilence requilence commented Aug 27, 2025

Problem

When uploading large files through UnixFS or other IPLD structures, the current batch implementation keeps everything in memory and writes blocks individually. If the application crashes mid-process, partial blocks remain in FlatFS, making garbage collection extremely difficult since there's no way to identify which blocks belong to incomplete operations.

Solution

This PR rewrites the batch mode to use a temporary directory approach:

  1. Atomic batch operations: All blocks in a batch are written to a temporary directory first, then renamed to their final locations only on commit. If the process crashes, no partial data remains in the main datastore - the temp directory is cleaned on restart.
  2. Memory efficiency: Instead of keeping all data in memory, blocks are written directly to temp files, enabling large file uploads without high RAM usage.
  3. Read operations in batch: Added BatchReader interface implementing Get/Has/GetSize methods. This is essential for IPLD structures which need to verify block links during construction. Without batch reads, you cannot build complex IPLD structures that reference blocks added earlier in the same batch. This follows standard database transaction semantics where reads within a transaction see uncommitted writes.

Implementation Details

  • Each batch creates a unique temp directory under .temp/
  • Files are written without sharding in temp
  • On commit, files are atomically renamed to their sharded destinations
  • On discard or crash, temp files are cleaned up (on the next startup); this cleanup code already existed
  • Thread-safe with mutex protection
  • Batch can be reused after commit/discard (this was already Batch's behavior before my changes)

Write batch data to temp directory instead of memory to avoid high RAM usage with large files. Each batch creates its own temp directory, writes blocks directly to disk, and renames all files atomically on commit.

welcome bot commented Aug 27, 2025

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines if
    they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviews, discussed and approvals as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

- Removed unused tempFileOnce() method
- Replaced unused 'done' variable with '_' in Commit method
Only check temp directory for keys that exist in puts slice
@requilence requilence force-pushed the batch-temp-directory branch from ddd53d9 to d939e0e Compare August 29, 2025 11:34
@gammazero gammazero added kind/enhancement A net-new feature or improvement to an existing feature need/maintainer-input Needs input from the current maintainer(s) labels Sep 2, 2025
@gammazero gammazero self-requested a review September 2, 2025 16:32
@gammazero
Contributor

Need to review. Check that there is still the ability to do in-memory batching for smaller batches.

Contributor

@gammazero gammazero left a comment

My concern is that this will negatively impact the performance of Put. It will make Commit faster because the temp files are already written, but existing code may expect a Put that is not slowed down by writing to the filesystem.

It may be worth considering having Put write the temp files asynchronously in a separate goroutine. Commit would then wait for any outstanding Puts to finish, handle any errors from the async Puts, and move all the temp files to their final destination.

@lidel lidel added the need/author-input Needs input from the original author label Sep 9, 2025
@hsanjuan
Contributor

Triage questions that come up:

  • Can you describe the scenario in which memory usage of batches becomes a problem? Is this while using Kubo or are you using flatfs and interacting with it in a different application?

@hsanjuan hsanjuan removed the need/maintainer-input Needs input from the current maintainer(s) label Sep 16, 2025
@lidel
Member

lidel commented Oct 21, 2025

Triage notes:

  • being able to remove RAM-limit is useful for self-hosted adoption, but comes with some risk
  • we could mitigate by doing writes asynchronously until the final batch commit executes, then waiting for the async work to finish

@requilence gentle ping, are you able to say where this problem/need surfaced? Are you using Kubo or your own implementation? On what hardware are you running?

@lidel lidel added need/author-input Needs input from the original author and removed need/author-input Needs input from the original author labels Oct 21, 2025
@gammazero
Contributor

Batch can be reused after commit/discard (because this behavior of Batch was before my changes)

I do not think we need or want to guarantee this. The ability to reuse a Batch was not consistent across different datastores, so it should not be relied on in general.

Address PR review feedback
@requilence
Author

requilence commented Oct 22, 2025

@lidel Thanks for the review! I've addressed the feedback:

Async Write Operations

  • Put now writes to temp files asynchronously in goroutines and returns immediately
  • Commit and all read/write operations wait for all async writes to complete before proceeding
  • Fail-fast error handling: First async error is captured and returned on next Put/Delete/Commit or any read operation, preventing unnecessary work if disk is full or other errors occur

@requilence requilence requested a review from gammazero October 22, 2025 11:18
@requilence
Author

requilence commented Oct 22, 2025

Triage questions that come up:

  • Can you describe the scenario in which memory usage of batches becomes a problem? Is this while using Kubo, or are you using flatfs and interacting with it in a different application?

We are using flatfs and IPLD structures in Anytype (http://github.com/anyproto/anytype-heart) to store files on users' devices. Without batch operations, a user can force-close the app in the middle of adding a large file (e.g., a 4 GB video), which leaves many garbage blocks in flatfs. That is why we use atomic batch operations. At the same time, we try to avoid heap allocations and to reuse memory, which is especially critical on mobile devices.

@lidel, I'm not sure async Put is really a good option for us. We would prefer to do it synchronously; otherwise, it could lead to heap spikes. I'm considering a clean way to make this configurable while maintaining backward compatibility. Perhaps a new OpenWithArgs with variadic arguments? That would also make it possible to override the fs.FS implementation with a custom or mocked version.

@gammazero
Contributor

gammazero commented Oct 23, 2025

@requilence

I'm not sure async Put is really a good option for us. We would prefer to do it synchronously

Consider limiting the number of asynchronous write operations. I think that it is reasonable for an async write operation to wait until the previous async write operation has completed. This will allow a write function to return immediately, while the work is done during accumulation of operations in the next batch.

Additionally, there could be an option to enable/disable asynchronous puts, although this is probably not necessary if the above limit is implemented.
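
A buffered-channel semaphore is one common Go way to implement the limit suggested above. The sketch below (hypothetical `runWrites` helper, not the PR's code) simulates n async writes capped at maxInFlight; with capacity 1 each async write waits for the previous one, while the caller still returns immediately.

```go
package main

import (
	"fmt"
	"sync"
)

// runWrites simulates n async temp-file writes with at most maxInFlight
// running concurrently, and reports the peak concurrency observed.
func runWrites(n, maxInFlight int) int {
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup
	var mu sync.Mutex
	inFlight, peak := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // blocks while maxInFlight writes are outstanding
			defer func() { <-sem }()
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			// ... write the temp file here ...
			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// Capacity 1: each write waits for the previous one to finish.
	fmt.Println("peak concurrent writes:", runWrites(8, 1)) // prints 1
}
```

Because the semaphore is acquired inside the goroutine, the caller never blocks; only the number of in-flight filesystem writes is bounded, which limits the heap spikes the author is concerned about.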

Contributor

@gammazero gammazero left a comment

See and commit suggestions: Added a semaphore to limit concurrent puts

@lidel lidel mentioned this pull request Oct 27, 2025
74 tasks
@hsanjuan hsanjuan requested a review from lidel October 28, 2025 15:35
@hsanjuan hsanjuan added the need/maintainer-input Needs input from the current maintainer(s) label Oct 28, 2025
Member

@lidel lidel left a comment

I tried to document the current state (see godoc drafts in comments below) and one thing that stands out is the O(n) linear search in puts – not sure how big an impact this is, but we already have O(1) for deletes, so maybe do the same for puts? (details inline).

If we do that, the godoc should be updated to reflect it.

@gammazero
Contributor

Triage: We will look at this after kubo v0.39

@gammazero gammazero removed the need/author-input Needs input from the original author label Nov 11, 2025
@gammazero gammazero changed the base branch from master to batch-temp-dir November 13, 2025 15:56
Co-authored-by: Marcin Rataj <[email protected]>
@gammazero gammazero merged commit 2e24d0a into ipfs:batch-temp-dir Nov 14, 2025
8 of 9 checks passed