lore-transport: Happy eyeballs strategy for QUIC fallback connections across resolved addresses#28
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the QUIC client connection logic to use a Happy Eyeballs-style strategy when a hostname resolves to multiple socket addresses, avoiding long stalls on an unreachable first address while preserving existing per-address configuration and error behavior.
Changes:
- Collect resolved socket addresses and attempt connections concurrently with a fixed stagger (250ms).
- Introduce
connect_happy_eyeballsto schedule attempts and return the first successful connection. - Add unit tests covering stalled-first-attempt fallback, immediate advance on failure, no fallback after first success, and all-fail behavior.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ragnarula
left a comment
There was a problem hiding this comment.
This is neat! I wasn't aware of the happy eyeballs technique before. Left some minor comments to address.
| F: FnMut(SocketAddr) -> Fut, | ||
| Fut: Future<Output = Option<T>>, | ||
| { | ||
| let mut remote_addrs = remote_addrs.into_iter(); |
There was a problem hiding this comment.
According to the RFC a good practice is to interleave address families before attempting connections. https://www.rfc-editor.org/info/rfc8305/#section-4
There was a problem hiding this comment.
Good catch. The way I propose gives us the resolver’s ordering, but it does not guarantee the family interleaving recommended by RFC 8305. I’ll add an interleaving step that preserves the preferred first family and the relative order within each of them.
There was a problem hiding this comment.
The interleaving is added and waiting for the review 😄.
And a process-related question: do I resolve the conversation, or the commenter?
| { | ||
| let mut remote_addrs = remote_addrs.into_iter(); | ||
| let mut attempts = FuturesUnordered::new(); | ||
| attempts.push(connect(remote_addrs.next()?)); |
There was a problem hiding this comment.
What are the drop semantics on the stalled connections?
There was a problem hiding this comment.
When one attempt succeeds, connect_happy_eyeballs returns and drops the FuturesUnordered, which drops all outstanding connect_to_addr futures. Each future owns its Quinn Connecting and Endpoint. Dropping the last external ConnectionRef triggers an implicit close; the endpoint driver then exits once its connections are gone. The losing attempts therefore get canceled rather than continuing. I’ll add a short comment explaining this cancellation behavior.
There was a problem hiding this comment.
I added a small 2-line comment below on the semantics 😊
| delay.as_mut().reset(tokio::time::Instant::now() + attempt_delay); | ||
| } | ||
| } | ||
| _ = &mut delay => { |
There was a problem hiding this comment.
Maybe being overly cautious in this case, but is it worth having some bound on the number of inflight attempts? Following the principle of not having unbounded things going on when processing externally provided data.
There was a problem hiding this comment.
You are right. Although the address list is finite, it is externally influenced, so even though the RFC does not mandate it (just asks for staggering), I’ll add a fixed limit of in-flight attempts; further addresses will remain queued until an attempt completes. This will keep the resource usage bounded.
There was a problem hiding this comment.
Now the attempts in-flight is limited to 10 (and we can tweak it with a constant, if necessary). I was considering limiting per-family, but decided not to, due to more complexity added.
e8bbffe to
fe91bcf
Compare
|
This looks like a pretty good addition now! However, please remove the additional |
|
@mjansson Ok, I removed the unnecessary attribution, should I now squash the commits, or you'll do it on your end? |
Doesn't matter, but all the commits need to be with the DCO sign-off - the |
ee2c100 to
f457a07
Compare
|
@mjansson Done, one commit, signed-off as per DCO. |
Signed-off-by: Andrzej Haczewski <ahaczewski@gmail.com>
f457a07 to
c85f9c1
Compare
Use a Happy Eyeballs strategy (see RFC8305) when the QUIC client connects
to a hostname that resolves to multiple socket addresses.
The client now:
The existing per-address endpoint setup and final error behavior remain unchanged.
Why
The previous implementation awaited QUIC addresses sequentially. If
localhostresolved to::1before127.0.0.1whileloreserverlistened onits default IPv4 address, the IPv6 attempt consumed the full 30-second QUIC idle
timeout before IPv4 was attempted.
This caused commands such as
lore historyto take about 30seconds because repository initialization had started a background QUIC
pre-warm, even if not required.
Before:
Using
127.0.0.1directly completed in0.08s, confirming that addressfallback was responsible for the delay.
Closes #27
Testing
Added unit coverage for:
Verification performed:
All 28
lore-transporttests pass.The original reproduction using
lore://localhost:41337now completes in:A remote immutable-store query also demonstrated the complete fallback:
Note on tests
The
lore-transport/src/quic/client.rsfile did not contain any test previously,and if they are wrongly placed, let me know and I'll make amendments.
Alternative approach
The alternative would be to prevent QUIC from connecting for commands
that do not require QUIC connection. I rejected that solution due to sweeping
changes required.
AI disclosure
OpenAI GPT-5.5 was used to improve upon the first version of this PR.
CoPilot-generated code review snippets were used to amend the PR with
suggested fixes.