-
-
Notifications
You must be signed in to change notification settings - Fork 966
Prevent snowballing connection creation on context cancellation. #2425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Change deadline delay for default DeadlineContextWatcherHander to prevent snowballing effect of establishing connections in excess of configured connection pool maximum size in pathological case with many context cancellations. Previously, with default deadline of 0, on context cancellation pgx was immediately marking connection as closed, even though pgConn.asyncClose still needs time to complete and PG need time to act on the cancellation request. Connection pooler (pgx pool or stdlib sql) checks `IsClosed()` state on Release, recognizes it as closed and immediately removes it from the pool, allowing new connection to be established. If new connection establishment time is less than full end-to-end query cancellation time, including time a server needs to act on cancellation request, and there are many context cancellations events happening per second, then number of connections seen by a PostgreSQL server start accumulate, exceeding configured total pool size. It eventually reaches `max_connections` limit on a server, with the majority of them being cancelled. No new connection can be established, bring app to a halt. New default deadline delay of 50ms is chosen to be larger than pessimistic time for a full query cancellation completion. With the new deadline pgx keeps connection in the pool after cancellation for up to 50ms, thus effectively throttling how fast cancelled queries can be replaced with a new connection. Because new connection creation is now throttled, it can't overrun query cancellations, preventing the snowball effect. It also has nice side effect of increasing a chance of connection reuse if what would be cancelled query completes in less than 50ms.
|
I built DeadlineContextWatcherHandler to support this behavior, but I'm not sure I want it as default. My approach to date has been that by default context cancellation should cause the canceled operation to return as quickly as possible. Regardless, it seems that the underlying issue is the pool creating more connections than it has capacity for. I have some recollection of that being an issue in the past, but I thought it was resolved. |
Yes, but it effectively misinforms an application layer: connection is still pretty much alive, but app thinks it is gone for good. This creates conditions for creating more connections than it intends to.
Pool is doing best it can: if Ideally |
This role is filled by reading from Where / how are you measuring the connection count? Is it possible that is including the temporary connections made to issue the "CancelQuery"? Those aren't full connections and aren't considered by the pool, but may be impacting this situation. |
Cleanup channel is closed in a goroutine started by
Connection count is measured on PG side using Scenario to max out server connections beyond pgx pool max size (in our case we had 6x max_connections than pool size) is following:
|
I looked at all the locations the cleanup channel is closed. In every case the Looking at the PG docs it doesn't seem there is a good way for the client to know when the server has actually finished cleaning up the connection. From https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-TERMINATION
We are already doing that. But a simple change would be to add the sleep / delay to It also might be possible to use |
Change deadline delay for default DeadlineContextWatcherHander to prevent snowballing effect of establishing connections in excess of configured connection pool maximum size in pathological case with many context cancellations.
Previously, with default deadline of 0, on context cancellation pgx was immediately marking connection as closed, even though pgConn.asyncClose still needs time to complete and PG need time to act on the cancellation request. Connection pooler (pgx pool or stdlib sql) checks
IsClosed()state on Release, recognizes it as closed and immediately removes it from the pool, allowing new connection to be established.If new connection establishment time is less than full end-to-end query cancellation time, including time a server needs to act on cancellation request, and there are many context cancellations events happening per second, then number of connections seen by a PostgreSQL server start to accumulate, exceeding configured total pool size.
It eventually reaches
max_connectionslimit on a server, with the majority of them being cancelled. No new connection can be established, bringing app to a halt.New default deadline delay of 50ms is chosen to be larger than pessimistic time for a full query cancellation completion. With the new deadline pgx keeps connection in the pool after cancellation for up to 50ms, thus effectively throttling how fast cancelled queries can be replaced with a new connection. Because new connection creation is now throttled, it can't overrun query cancellations, preventing the snowball effect.
It also has nice side effect of increasing a chance of connection reuse if what would be cancelled query completes in less than 50ms.