Skip to content

Conversation

@rui-mo
Copy link
Collaborator

@rui-mo rui-mo commented Nov 5, 2025

This PR uses ioCounter for pageLoadTimeNs to collect its min, max, count stats.

@rui-mo rui-mo requested a review from majetideepak as a code owner November 5, 2025 06:58
@netlify
Copy link

netlify bot commented Nov 5, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 76a9ef3
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6911fef99b0d740008ddcda1

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2025
@rui-mo rui-mo marked this pull request as draft November 5, 2025 14:52
@rui-mo rui-mo changed the title misc: Use ioCounter for pageLoadTimeNs misc: Move pageLoadTimeNs into ioStats Nov 5, 2025
virtual std::unique_ptr<RowReader> createRowReader(
const RowReaderOptions& options = {}) const = 0;
const RowReaderOptions& options = {},
std::shared_ptr<io::IoStatistics> ioStats = nullptr) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally format reader should not be aware of such storage level stats. And other formats than Parquet does not need this. Is there a way to extract the IO stats from the BufferedInput in parquet reader, instead of drilling a hole through format layer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing! We discovered that simply changing the variable type to ioCounter is enough to gather these metrics.

@rui-mo rui-mo changed the title misc: Move pageLoadTimeNs into ioStats misc: Use ioCounter for pageLoadTimeNs Nov 6, 2025
@rui-mo rui-mo marked this pull request as ready for review November 6, 2025 09:35
casLoop(max_, other.max_, std::less());
}

void clear() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do ioCounter = {} should be enough, there is no need to add an extra method for this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests and benchmarks need to reinitialize a RuntimeStatistics object. But after adding IoCounter to it, the assignment operator is implicitly deleted due to the use of atomic variables within IoCounter. To address this issue, clear() methods have been introduced as an alternative solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a copy assignment for IoCounter then? Then there is no need to duplicate the initialization logic in all containing objects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

stats.columnReaderStatistics.pageLoadTimeNs +=
columnReaderStats_.pageLoadTimeNs;
stats.columnReaderStatistics.pageLoadTimeNs.increment(
columnReaderStats_.pageLoadTimeNs.sum());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use merge?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateRuntimeStats is called by HiveDataSource::next and eventually at

dataOptional = dataSource_->next(readBatchSize, blockingFuture_);

If using merge here, it's more like a batch-level updating for the min/max stats. Typically the other runtime stats are merged at task level at

lockedStats->runtimeStats.at(name).merge(metric);

How do you think? Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So it's counted in columnreader level or page level. We care about single I/O on page level. So we should use merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

result.emplace(
"pageLoadTimeNs",
RuntimeMetric(columnReaderStatistics.pageLoadTimeNs));
RuntimeMetric(columnReaderStatistics.pageLoadTimeNs.sum()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait until #15408 merged, then update count,min,max

@Yuhta Yuhta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 12, 2025
@meta-codesync
Copy link

meta-codesync bot commented Nov 12, 2025

@Yuhta has imported this pull request. If you are a Meta employee, you can view this in D86894092.

@meta-codesync
Copy link

meta-codesync bot commented Nov 13, 2025

@Yuhta merged this pull request in 3cf456f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants