-
Notifications
You must be signed in to change notification settings - Fork 1.4k
misc: Use ioCounter for pageLoadTimeNs #15407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for meta-velox canceled.
|
ioCounter for pageLoadTimeNs
velox/dwio/common/Reader.h
Outdated
| virtual std::unique_ptr<RowReader> createRowReader( | ||
| const RowReaderOptions& options = {}) const = 0; | ||
| const RowReaderOptions& options = {}, | ||
| std::shared_ptr<io::IoStatistics> ioStats = nullptr) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally format reader should not be aware of such storage level stats. And other formats than Parquet does not need this. Is there a way to extract the IO stats from the BufferedInput in parquet reader, instead of drilling a hole through format layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing! We discovered that simply changing the variable type to ioCounter is enough to gather these metrics.
velox/common/io/IoStatistics.h
Outdated
| casLoop(max_, other.max_, std::less()); | ||
| } | ||
|
|
||
| void clear() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just do ioCounter = {} should be enough, there is no need to add an extra method for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tests and benchmarks need to reinitialize a RuntimeStatistics object. But after adding IoCounter to it, the assignment operator is implicitly deleted due to the use of atomic variables within IoCounter. To address this issue, clear() methods have been introduced as an alternative solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a copy assignment for IoCounter then? Then there is no need to duplicate the initialization logic in all containing objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
| stats.columnReaderStatistics.pageLoadTimeNs += | ||
| columnReaderStats_.pageLoadTimeNs; | ||
| stats.columnReaderStatistics.pageLoadTimeNs.increment( | ||
| columnReaderStats_.pageLoadTimeNs.sum()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateRuntimeStats is called by HiveDataSource::next and eventually at
velox/velox/exec/TableScan.cpp
Line 204 in 04faded
| dataOptional = dataSource_->next(readBatchSize, blockingFuture_); |
If using merge here, it's more like a batch-level updating for the min/max stats. Typically the other runtime stats are merged at task level at
velox/velox/exec/TableScan.cpp
Line 315 in 04faded
| lockedStats->runtimeStats.at(name).merge(metric); |
How do you think? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So it's counted in columnreader level or page level. We care about single I/O on page level. So we should use merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
velox/dwio/common/Statistics.h
Outdated
| result.emplace( | ||
| "pageLoadTimeNs", | ||
| RuntimeMetric(columnReaderStatistics.pageLoadTimeNs)); | ||
| RuntimeMetric(columnReaderStatistics.pageLoadTimeNs.sum())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait until #15408 merged, then update count,min,max
5d3967b to
76a9ef3
Compare
This PR uses
ioCounterforpageLoadTimeNsto collect its min, max, count stats.