@Peiyingy
Member

Description

Following the work in #783 to introduce caching in HaGatewayManager, this PR adds a write buffer mechanism to QueryHistoryManager. With this change, if the database becomes unavailable, Trino-Gateway can continue to route queries using the cached Trino cluster data while temporarily storing query history records in the write buffer. When the database becomes available again, the buffered history entries are flushed to it. This removes the database as a single point of failure and improves overall resiliency.

This approach has already been implemented and validated in production at LinkedIn.
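A minimal sketch of the flow described above, in Java. It assumes the history insert can be modelled as a Consumer that throws on a database outage; BufferedHistoryWriter, write, and flush are hypothetical names, not the PR's API. The normal write path takes no lock, while a single lock guards the buffer (including during a flush) to keep the sketch simple.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical sketch: write through to the database, fall back to a bounded
// in-memory buffer when the write fails, and drain the buffer later.
class BufferedHistoryWriter<T>
{
    private final Consumer<T> store; // e.g. a DAO insert; throws while the DB is down
    private final Deque<T> buffer = new ArrayDeque<>();
    private final int maxCapacity;

    BufferedHistoryWriter(Consumer<T> store, int maxCapacity)
    {
        this.store = store;
        this.maxCapacity = maxCapacity;
    }

    void write(T record)
    {
        try {
            store.accept(record); // normal path: no lock held during the DB call
        }
        catch (RuntimeException e) {
            bufferOrRethrow(record, e);
        }
    }

    private synchronized void bufferOrRethrow(T record, RuntimeException cause)
    {
        if (buffer.size() >= maxCapacity) {
            throw cause; // buffer full: surface the original failure
        }
        buffer.addLast(record);
    }

    // Called periodically: drains in FIFO order and stops at the first failure.
    synchronized int flush()
    {
        int flushed = 0;
        while (!buffer.isEmpty()) {
            try {
                store.accept(buffer.peekFirst());
            }
            catch (RuntimeException e) {
                break; // DB still unavailable; keep the remaining records
            }
            buffer.removeFirst(); // remove only after a successful write
            flushed++;
        }
        return flushed;
    }

    synchronized int buffered()
    {
        return buffer.size();
    }
}
```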

Testing

  • mvn clean install

@cla-bot
Copy link

cla-bot bot commented Nov 11, 2025

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Peiying Ye.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails

Peiyingy force-pushed the query-history-write-buffer branch from 0703b78 to 0512583 on November 11, 2025 00:39
Peiyingy force-pushed the query-history-write-buffer branch from 0512583 to 8c52dd0 on November 11, 2025 00:40
cla-bot added the cla-signed label Nov 11, 2025
trigger build
Peiyingy force-pushed the query-history-write-buffer branch from 85776ef to caab5db on November 12, 2025 19:00
Peiyingy marked this pull request as ready for review on November 12, 2025 19:32
{
    dao = requireNonNull(jdbi, "jdbi is null").onDemand(QueryHistoryDao.class);
    this.isOracleBackend = isOracleBackend;
    if (writeBufferConfig != null && writeBufferConfig.isEnabled()) {

Member

Suggested change
- if (writeBufferConfig != null && writeBufferConfig.isEnabled()) {
+ if (writeBufferConfig.isEnabled()) {

writeBufferConfig should never be null. If it's null for whatever reason, we should fail fast.
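The fail-fast behavior the reviewer asks for can follow the requireNonNull pattern already used for jdbi in this constructor. A reduced sketch, where WriteBufferConfigStub and HistoryManagerSketch are hypothetical stand-ins for the PR's classes:

```java
import static java.util.Objects.requireNonNull;

// Hypothetical stand-in for the PR's write buffer configuration.
class WriteBufferConfigStub
{
    private final boolean enabled;

    WriteBufferConfigStub(boolean enabled)
    {
        this.enabled = enabled;
    }

    boolean isEnabled()
    {
        return enabled;
    }
}

// Hypothetical stand-in for the constructor under review.
class HistoryManagerSketch
{
    private final boolean bufferEnabled;

    HistoryManagerSketch(WriteBufferConfigStub writeBufferConfig)
    {
        // Fail fast: a null config is a wiring bug, so throw a
        // NullPointerException instead of silently disabling the buffer.
        requireNonNull(writeBufferConfig, "writeBufferConfig is null");
        this.bufferEnabled = writeBufferConfig.isEnabled();
    }

    boolean isBufferEnabled()
    {
        return bufferEnabled;
    }
}
```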

Comment on lines +30 to +36
public void buffer(T item)
{
    if (!deque.offerLast(item)) {
        deque.pollFirst();
        deque.offerLast(item);
    }
}

Member

This method needs to be synchronized to be thread-safe.
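A sketch of a thread-safe version. The offerLast() return value suggests the PR uses a capacity-bounded deque (LinkedBlockingDeque is an assumption); even then each call is only individually atomic, so the poll-then-offer pair can interleave with other threads and silently drop a record. Here the deque is a plain ArrayDeque with an explicit capacity and every access is synchronized; DropOldestBuffer is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer that evicts the oldest entry when full.
class DropOldestBuffer<T>
{
    private final Deque<T> deque = new ArrayDeque<>();
    private final int maxCapacity;

    DropOldestBuffer(int maxCapacity)
    {
        if (maxCapacity <= 0) {
            throw new IllegalArgumentException("maxCapacity must be positive: " + maxCapacity);
        }
        this.maxCapacity = maxCapacity;
    }

    // The check, eviction, and insert happen under one lock, so no other
    // thread can take the freed slot in between.
    synchronized void buffer(T item)
    {
        if (deque.size() >= maxCapacity) {
            deque.pollFirst(); // evict the oldest record to make room
        }
        deque.offerLast(item);
    }

    synchronized T poll()
    {
        return deque.pollFirst();
    }

    synchronized int size()
    {
        return deque.size();
    }
}
```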

Comment on lines +48 to +58
int flushed = 0;
for (T next; (next = deque.pollFirst()) != null; ) {
    try {
        flusher.accept(next);
        flushed++;
    }
    catch (RuntimeException e) {
        deque.offerFirst(next);
        break; // stop after first failure
    }
}

Member

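Re-inserting with offerFirst() may fail if the queue is full by then (for example, because new records were buffered concurrently), and the record would be lost.
Maybe use peek() -> accept() -> remove() instead?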
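A sketch of the peek() -> accept() -> remove() ordering: the head is removed only after the flusher has accepted it, so a failed write never needs to be re-inserted. PeekFlushBuffer is a hypothetical name; capacity handling is omitted, and the lock is held for the whole flush, which blocks buffer() callers while a database write is in progress.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical buffer illustrating peek-then-remove flushing.
class PeekFlushBuffer<T>
{
    private final Deque<T> deque = new ArrayDeque<>();

    synchronized void buffer(T item)
    {
        deque.offerLast(item);
    }

    synchronized int flush(Consumer<T> flusher)
    {
        int flushed = 0;
        for (T next; (next = deque.peekFirst()) != null; ) {
            try {
                flusher.accept(next);
            }
            catch (RuntimeException e) {
                break; // stop after the first failure; the record stays at the head
            }
            deque.pollFirst(); // remove only after a successful write
            flushed++;
        }
        return flushed;
    }

    synchronized int size()
    {
        return deque.size();
    }
}
```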

public class WriteBufferConfiguration
{
    private boolean enabled;
    private int maxCapacity = 10000;

Member

What is the reasoning behind the value 10,000?

Member Author

This is just a default value, as is flushInterval; both can be overridden in the config file. I'll also document the default values in the docs. Do you recommend different defaults for them?

{
    private boolean enabled;
    private int maxCapacity = 10000;
    private Duration flushInterval = new Duration(2, TimeUnit.SECONDS);

Member

Same question for the 2 seconds: what is the reasoning behind this value?

Member Author

public void setMaxCapacity(int maxCapacity)
{
    this.maxCapacity = maxCapacity;

Member

Maybe check for negative or zero values?
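A minimal sketch of the suggested check, assuming plain setter validation (a declarative constraint such as Bean Validation's @Min(1) is an alternative if the configuration is validated that way). BufferConfigSketch is a hypothetical name:

```java
// Hypothetical configuration class with a validating setter.
class BufferConfigSketch
{
    private int maxCapacity = 10000;

    int getMaxCapacity()
    {
        return maxCapacity;
    }

    void setMaxCapacity(int maxCapacity)
    {
        // Reject zero and negative capacities at configuration time
        // instead of failing later when the buffer is created.
        if (maxCapacity <= 0) {
            throw new IllegalArgumentException("maxCapacity must be positive: " + maxCapacity);
        }
        this.maxCapacity = maxCapacity;
    }
}
```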

Comment on lines +26 to +33
public WriteBufferConfiguration() {}

public WriteBufferConfiguration(boolean enabled, int maxCapacity, Duration flushInterval)
{
    this.enabled = enabled;
    this.maxCapacity = maxCapacity;
    this.flushInterval = flushInterval;
}

Member

Why do you need two constructors?

}
catch (RuntimeException e) {
    if (isConnectionIssue(e) && writeBuffer != null) {
        writeBuffer.buffer(queryDetail);

Member

What happens if the write buffer is full?

Member Author

Then this record won't be buffered and the query submission will fail, which is the same behavior as the current code without this change.
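The reply describes a reject-when-full policy, whereas the buffer() excerpt earlier in the review evicts the oldest entry instead. A sketch of the reply's behavior, with hypothetical names (RejectingBuffer, tryBuffer, HistorySubmitter); every RuntimeException is treated as a connection issue here, standing in for the PR's isConnectionIssue(e) check:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical buffer that refuses new records once full.
class RejectingBuffer<T>
{
    private final Deque<T> deque = new ArrayDeque<>();
    private final int maxCapacity;

    RejectingBuffer(int maxCapacity)
    {
        this.maxCapacity = maxCapacity;
    }

    // Returns false instead of evicting when the buffer is full.
    synchronized boolean tryBuffer(T item)
    {
        if (deque.size() >= maxCapacity) {
            return false;
        }
        deque.offerLast(item);
        return true;
    }

    synchronized int size()
    {
        return deque.size();
    }
}

// Hypothetical caller mirroring the catch block under review.
class HistorySubmitter<T>
{
    private final Consumer<T> dao;
    private final RejectingBuffer<T> buffer;

    HistorySubmitter(Consumer<T> dao, RejectingBuffer<T> buffer)
    {
        this.dao = dao;
        this.buffer = buffer;
    }

    void submit(T record)
    {
        try {
            dao.accept(record);
        }
        catch (RuntimeException e) {
            // Assumption: any RuntimeException counts as a connection issue.
            if (!buffer.tryBuffer(record)) {
                throw e; // buffer full: fail the submission, as before the change
            }
        }
    }
}
```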

writeBufferConfig.getFlushInterval().toMillis(),
writeBufferConfig.getFlushInterval().toMillis(),
TimeUnit.MILLISECONDS);
}

Member

Couldn't there be a race condition here?

Member Author

Could you please elaborate on the race condition? The flusher's scheduledExecutor is single-threaded, and thread safety for buffering is handled in the WriteBuffer class.
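A sketch of the setup the author describes: a single-threaded ScheduledExecutorService runs the flush task, so flushes never overlap, while producers only touch a thread-safe queue. The class name, the ConcurrentLinkedQueue, and the use of scheduleWithFixedDelay are assumptions; the excerpt above does not show which scheduling method the PR calls. Because the flush task is the queue's only consumer, the head seen by peek() is still the head when poll() removes it. The queue is unbounded here for brevity.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical flusher: any thread may buffer; one scheduler thread flushes.
class PeriodicFlusher<T>
{
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final Consumer<T> sink;

    PeriodicFlusher(Consumer<T> sink, long intervalMillis)
    {
        this.sink = sink;
        // One thread and a fixed delay: each run starts only after the previous one ends.
        executor.scheduleWithFixedDelay(this::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Safe to call concurrently from request threads.
    void buffer(T item)
    {
        queue.offer(item);
    }

    // Runs only on the scheduler thread, so it is the queue's sole consumer.
    private void flush()
    {
        for (T next; (next = queue.peek()) != null; ) {
            try {
                sink.accept(next);
            }
            catch (RuntimeException e) {
                // An exception escaping the task would cancel all future runs,
                // so stop here and retry on the next tick.
                return;
            }
            queue.poll(); // remove only after a successful write
        }
    }

    void shutdown()
    {
        executor.shutdown();
    }
}
```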


3 participants