@deukyeon
Contributor

Problem

Splinter has a scalability problem: whenever an asynchronous read or write request is submitted, the submitting thread waits for all outstanding I/O completions.

Solution

To address this, laio_read_async() and laio_write_async() now piggyback a call to laio_cleanup() that handles only the completions returned by a single io_getevents() call. It returns immediately if there are no completions, even if some I/O operations are still in flight.
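The difference between the two behaviors can be sketched with a small mock. This is only an illustration under stated assumptions, not the code in this PR: `mock_io_ctx`, `mock_poll()`, `mock_wait_one()`, `cleanup_wait_all()` and `cleanup_once()` are hypothetical names, and the mock counters stand in for the kernel completion queue that io_getevents() would read.

```c
#include <assert.h>

/* Hypothetical stand-in for an AIO context; not SplinterDB code. */
typedef struct {
   int in_flight; /* submitted, not yet finished */
   int completed; /* finished, not yet reaped */
   int waits;     /* how many times a caller had to block */
} mock_io_ctx;

/* Non-blocking poll: reap up to max_nr finished requests, possibly none. */
static int
mock_poll(mock_io_ctx *ctx, int max_nr)
{
   int n = ctx->completed < max_nr ? ctx->completed : max_nr;
   ctx->completed -= n;
   return n;
}

/* Blocking wait: simulate one in-flight request finishing. */
static void
mock_wait_one(mock_io_ctx *ctx)
{
   assert(ctx->in_flight > 0);
   ctx->in_flight--;
   ctx->completed++;
   ctx->waits++;
}

/* Behavior described under "Problem": keep going until nothing is outstanding. */
static int
cleanup_wait_all(mock_io_ctx *ctx)
{
   int reaped = 0;
   while (ctx->in_flight > 0 || ctx->completed > 0) {
      if (ctx->completed == 0) {
         mock_wait_one(ctx); /* caller blocks on someone else's I/O */
      }
      reaped += mock_poll(ctx, 8);
   }
   return reaped;
}

/* Behavior described under "Solution": poll once and return right away. */
static int
cleanup_once(mock_io_ctx *ctx)
{
   return mock_poll(ctx, 8);
}
```

With three requests in flight and two already completed, `cleanup_once()` reaps the two finished requests without blocking, while `cleanup_wait_all()` blocks three times before returning.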

@deukyeon deukyeon requested a review from rtjohnso August 29, 2024 22:50
@deukyeon deukyeon self-assigned this Aug 29, 2024
@netlify

netlify bot commented Aug 29, 2024

Deploy Preview for splinterdb canceled.

Name Link
🔨 Latest commit ba91d41
🔍 Latest deploy log https://app.netlify.com/sites/splinterdb/deploys/6736f073cf55470008d1daad

@deukyeon deukyeon changed the title from "Make async read and write not to wait all other IO completions" to "Scalability issue with laio" Aug 30, 2024
@deukyeon deukyeon changed the title from "Scalability issue with laio" to "Scalability issue due to laio cleanup" Aug 30, 2024
@deukyeon deukyeon removed their assignment Sep 3, 2024
@deukyeon deukyeon closed this Sep 3, 2024
@deukyeon deukyeon deleted the deukyeon/fix-io-bottleneck branch September 3, 2024 15:52
@deukyeon deukyeon restored the deukyeon/fix-io-bottleneck branch September 3, 2024 15:57
@deukyeon deukyeon reopened this Sep 3, 2024
@rtjohnso
Collaborator

Hi Deukyeon,

Sorry it took me so long to get to this.

The laio code has been substantially rewritten.

If I understand the intent of this PR, the goal was to change the way the laio code performs laio_cleanup() in the loop attempting to submit a new I/O. In the old code, after every failed attempt, it would process all pending I/O completions. Your PR changes it to process only one completion.

My rewrite changes that whole loop. The old code (and the code in your PR) would just keep calling io_submit() and laio_cleanup() in a tight loop until it got a success. In my rewrite, if io_submit() returns EAGAIN, then the caller sleeps on a wait queue until another I/O completes. Thus it is up to the caller to call laio_cleanup() (or some wrapper of laio_cleanup()).
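As a rough illustration of why the two retry policies differ, here is a single-threaded toy model that counts submit attempts under each one. It is a sketch under assumptions and does not reproduce the rewritten laio code: `mock_dev`, `mock_submit()`, `mock_advance()`, `submit_spinning()` and `submit_waiting()` are all hypothetical, the wait queue is modeled by jumping a simulated clock to the next completion, and only the `-EAGAIN` convention is borrowed from io_submit().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mock device; not SplinterDB or libaio code. */
typedef struct {
   int  capacity;     /* max requests the mock queue accepts at once (> 0) */
   int  in_flight;    /* requests currently outstanding */
   long ticks_per_io; /* mock device finishes one request every N ticks */
   long now;          /* simulated clock */
   long next_done;    /* tick at which the oldest request completes */
} mock_dev;

/* Advance the clock and retire every request that has finished by then. */
static void
mock_advance(mock_dev *d, long ticks)
{
   d->now += ticks;
   while (d->in_flight > 0 && d->now >= d->next_done) {
      d->in_flight--;
      d->next_done += d->ticks_per_io;
   }
}

/* Accept a request, or return -EAGAIN when the queue is full. */
static int
mock_submit(mock_dev *d)
{
   if (d->in_flight == d->capacity) {
      return -EAGAIN;
   }
   if (d->in_flight == 0) {
      d->next_done = d->now + d->ticks_per_io;
   }
   d->in_flight++;
   return 0;
}

/* Tight loop: retry on every tick. Returns the number of submit attempts. */
static long
submit_spinning(mock_dev *d)
{
   long attempts = 1;
   while (mock_submit(d) == -EAGAIN) {
      mock_advance(d, 1); /* stands in for one cleanup pass */
      attempts++;
   }
   return attempts;
}

/* Wait policy: on -EAGAIN, sleep until the next completion, then retry. */
static long
submit_waiting(mock_dev *d)
{
   long attempts = 1;
   while (mock_submit(d) == -EAGAIN) {
      mock_advance(d, d->next_done - d->now); /* stands in for the wait queue */
      attempts++;
   }
   return attempts;
}
```

In this model, with a full queue of capacity 2 and 100 ticks per I/O, the spinning policy makes 101 submit attempts before one succeeds and the waiting policy makes 2.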

So the responsibility to call the cleanup function is pushed all the way to the application, i.e. the tests in the splinter repo.

The tests all use the cache_cleanup() wrapper, which I also changed in clockcache.c to process at most 1 pending I/O completion, as in your PR.

So I believe that the main purpose of this PR has now been incorporated into the code.

@rtjohnso rtjohnso closed this Jan 20, 2026