MDEV-37662: Binlog Corruption When tmpdir is Full #4373
base: 10.11
Conversation
Force-pushed from 28844af to 9213ed1
The binary log could be corrupted when committing a large transaction
(i.e. one whose data exceeds the binlog_cache_size limit and spills
into a tmp file) in binlog_format=row if the server's --tmp-dir is
full. The corruption is that only the GTID of the errored transaction
is written into the binary log, without any body or finalizing events.
This happens because the content of the transaction was not flushed at
the proper time, so the transaction's binlog cache data was not
durable when the server tried to copy the content from the binlog
cache file into the binary log itself. The flush is expected to happen
in THD::binlog_flush_pending_rows_event(); however, if there is no
pending event, that flush is skipped. Then, while switching the tmp
file from a WRITE_CACHE to a READ_CACHE, the server would see there is
still data to flush in the cache, and would first try to flush it.
This is not a valid time to flush that data to the temporary file
though, as:
1. The GTID event has already been written directly to the binary
log. So if this flushing fails, it leaves the binary log in a
corrupted state.
2. This is done during group commit, and would slow down other
concurrent transactions which are otherwise ready to commit.
This patch fixes these issues by ensuring all transaction data is
fully flushed to its temporary file (if used) before starting any
critical paths, i.e. in binlog_flush_cache(). Note that if the binlog
cache is solely in-memory, this flush-to-temporary-file is skipped.
Reviewed-by: TODO
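
To make the failure window concrete, below is a heavily simplified
sketch of the ordering described above. Only reinit_io_cache() is a
real mysys call (0 means success); the surrounding function and the
helpers it calls are hypothetical placeholders, not the actual MariaDB
call chain.

#include "my_sys.h"                        /* IO_CACHE, reinit_io_cache() */

static void write_gtid_event_to_binlog(void);      /* hypothetical */
static void copy_cache_to_binlog(IO_CACHE *cache); /* hypothetical */

/* Hypothetical sketch of the failure mode, not the real write path. */
static void sketch_group_commit_write_path(IO_CACHE *cache)
{
  write_gtid_event_to_binlog();        /* GTID reaches the binlog first */

  /*
    Switching the spilled cache into read mode first flushes any bytes
    still buffered in memory out to the temporary file. With a full
    tmpdir this flush fails -- but the GTID above is already in the
    binary log, which is now left holding a GTID with no transaction
    body.
  */
  if (reinit_io_cache(cache, READ_CACHE, 0L, FALSE, FALSE))
    return;                            /* binary log left corrupted */

  copy_cache_to_binlog(cache);         /* body + finalizing events */
}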
Force-pushed from 9213ed1 to 72d0eef
andrelkin left a comment:
The work looks good. Special thanks for a solid analysis!
I have only a few, mostly cosmetic, notes.
  Returns TRUE on success, FALSE on error.
*/
static my_bool binlog_cache_reconcile_data_in_storage(IO_CACHE *info)
The new function's purpose is clear, but its introduction as a static function feels like overkill..
  ready-to-commit (concurrent) transactions could be stalled
*/
if (using_stmt && !thd->binlog_flush_pending_rows_event(TRUE, FALSE) &&
    binlog_cache_reconcile_data_in_storage(
.. so instead, my_b_get_pos_in_file(info) && flush_io_cache(info) would be fair enough at this point, as a new inline. Why not flush_pending_bytes(info)? Having "flush" in the name reads naturally, and is also consistent with the caller's and a sibling's names.
The suggested name also calls for (much) shorter comments than your static function's header ones..
Indeed, the whole point of the caller function is to get all the data to disk; as it turns out in this bug, that just happens on a few levels.
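
For concreteness, here is a minimal sketch of the inline the reviewer
proposes. It assumes the mysys conventions that my_b_get_pos_in_file()
is non-zero once the cache has spilled bytes to its temporary file and
that flush_io_cache() returns 0 on success; the actual patch may
differ.

#include "my_sys.h"   /* IO_CACHE, my_b_get_pos_in_file(), flush_io_cache() */

/*
  Sketch of the reviewer's suggested inline, not the actual patch.
  Returns TRUE on success, FALSE on error (matching the contract of
  the static function it would replace).
*/
static inline my_bool flush_pending_bytes(IO_CACHE *info)
{
  /* A purely in-memory cache has nothing to make durable on disk. */
  if (my_b_get_pos_in_file(info) == 0)
    return TRUE;
  /* Force any bytes still buffered in memory out to the temporary
     file now, before the group-commit critical path begins. */
  return flush_io_cache(info) == 0;
}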
This PR is organized as follows:
Commit 1: A regression test that reproduces the bug report
Commit 2: The fix
Any future commits will address review feedback.