Skip to content

lore-transport: Happy eyeballs strategy for QUIC fallback connections across resolved addresses#28

Open
ahaczewski wants to merge 1 commit into
EpicGames:mainfrom
ahaczewski:fix/quic-happy-eyeballs
Open

lore-transport: Happy eyeballs strategy for QUIC fallback connections across resolved addresses#28
ahaczewski wants to merge 1 commit into
EpicGames:mainfrom
ahaczewski:fix/quic-happy-eyeballs

Conversation

@ahaczewski

@ahaczewski ahaczewski commented Jun 18, 2026

Copy link
Copy Markdown

Use a Happy Eyeballs strategy (see RFC8305) when the QUIC client connects
to a hostname that resolves to multiple socket addresses.

The client now:

  • preserves the resolver's preferred first address family;
  • interleaves IPv6 and IPv4 candidates while preserving relative order within each family;
  • starts the first address immediately;
  • starts subsequent attempts with a 250-millisecond stagger;
  • advances immediately when all active attempts fail;
  • limits concurrent attempts to 10;
  • returns the first successful connection and cancels the remaining attempts.

The existing per-address endpoint setup and final error behavior remain unchanged.

Why

The previous implementation awaited QUIC addresses sequentially. If
localhost resolved to ::1 before 127.0.0.1 while loreserver listened on
its default IPv4 address, the IPv6 attempt consumed the full 30-second QUIC idle
timeout before IPv4 was attempted.

This caused commands such as lore history to take about 30
seconds because repository initialization had started a background QUIC
pre-warm, even if not required.

Before:

remote_url = "lore://localhost:41337"

real 30.05

Using 127.0.0.1 directly completed in 0.08s, confirming that address
fallback was responsible for the delay.

Closes #27

Testing

Added unit coverage for:

  • IPv6-first and IPv4-first address-family interleaving;
  • starting a fallback while the first attempt remains stalled;
  • advancing immediately after an attempt fails;
  • avoiding fallback when the first address succeeds;
  • returning failure when every address fails;
  • bounding the number of concurrent attempts.

Verification performed:

cargo +nightly fmt --all -- --check
cargo test -p lore-transport
cargo clippy -p lore-transport --all-targets -- -D warnings --no-deps

All 28 lore-transport tests pass.

The original reproduction using lore://localhost:41337 now completes in:

real 0.38

A remote immutable-store query also demonstrated the complete fallback:

QUIC connecting to localhost at [::1]:41337
QUIC connecting to localhost at 127.0.0.1:41337
Success QUIC connecting to 127.0.0.1:41337
QUIC connection ... complete in 254ms

real 0.27

Note on tests

The lore-transport/src/quic/client.rs file did not contain any test previously,
and if they are wrongly placed, let me know and I'll make amendments.

Alternative approach

The alternative would be to prevent QUIC from connecting for commands
that do not require QUIC connection. I rejected that solution due to sweeping
changes required.

AI disclosure

OpenAI GPT-5.5 was used to improve upon the first version of this PR.

CoPilot-generated code review snippets were used to amend the PR with
suggested fixes.

Copilot AI review requested due to automatic review settings June 18, 2026 18:04

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the QUIC client connection logic to use a Happy Eyeballs-style strategy when a hostname resolves to multiple socket addresses, avoiding long stalls on an unreachable first address while preserving existing per-address configuration and error behavior.

Changes:

  • Collect resolved socket addresses and attempt connections concurrently with a fixed stagger (250ms).
  • Introduce connect_happy_eyeballs to schedule attempts and return the first successful connection.
  • Add unit tests covering stalled-first-attempt fallback, immediate advance on failure, no fallback after first success, and all-fail behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lore-transport/src/quic/client.rs
Comment thread lore-transport/src/quic/client.rs Outdated
Comment thread lore-transport/src/quic/client.rs
Comment thread lore-transport/src/quic/client.rs

@ragnarula ragnarula left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat! I wasn't aware of the happy eyeballs technique before. Left some minor comments to address.

F: FnMut(SocketAddr) -> Fut,
Fut: Future<Output = Option<T>>,
{
let mut remote_addrs = remote_addrs.into_iter();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the RFC a good practice is to interleave address families before attempting connections. https://www.rfc-editor.org/info/rfc8305/#section-4

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The way I propose gives us the resolver’s ordering, but it does not guarantee the family interleaving recommended by RFC 8305. I’ll add an interleaving step that preserves the preferred first family and the relative order within each of them.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interleaving is added and waiting for the review 😄.

And a process-related question: do I resolve the conversation, or the commenter?

{
let mut remote_addrs = remote_addrs.into_iter();
let mut attempts = FuturesUnordered::new();
attempts.push(connect(remote_addrs.next()?));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the drop semantics on the stalled connections?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When one attempt succeeds, connect_happy_eyeballs returns and drops the FuturesUnordered, which drops all outstanding connect_to_addr futures. Each future owns its Quinn Connecting and Endpoint. Dropping the last external ConnectionRef triggers an implicit close; the endpoint driver then exits once its connections are gone. The losing attempts therefore get canceled rather than continuing. I’ll add a short comment explaining this cancellation behavior.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a small 2-line comment below on the semantics 😊

Comment thread lore-transport/src/quic/client.rs Outdated
delay.as_mut().reset(tokio::time::Instant::now() + attempt_delay);
}
}
_ = &mut delay => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe being overly cautious in this case, but is it worth having some bound on the number of inflight attempts? Following the principle of not having unbounded things going on when processing externally provided data.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Although the address list is finite, it is externally influenced, so even though the RFC does not mandate it (just asks for staggering), I’ll add a fixed limit of in-flight attempts; further addresses will remain queued until an attempt completes. This will keep the resource usage bounded.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the attempts in-flight is limited to 10 (and we can tweak it with a constant, if necessary). I was considering limiting per-family, but decided not to, due to more complexity added.

@ahaczewski ahaczewski force-pushed the fix/quic-happy-eyeballs branch 2 times, most recently from e8bbffe to fe91bcf Compare June 19, 2026 16:18
@ahaczewski ahaczewski requested a review from Copilot June 19, 2026 16:44

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@mjansson

Copy link
Copy Markdown
Collaborator

This looks like a pretty good addition now!

However, please remove the additional // SPDX-FileCopyrightText: 2026 Andrzej Haczewski <ahaczewski@gmail.com> in the file header(s) - the PR intake process will correctly attribute the revision and the changes to your user/email

@ahaczewski

Copy link
Copy Markdown
Author

@mjansson Ok, I removed the unnecessary attribution, should I now squash the commits, or you'll do it on your end?

@mjansson

Copy link
Copy Markdown
Collaborator

@mjansson Ok, I removed the unnecessary attribution, should I now squash the commits, or you'll do it on your end?

Doesn't matter, but all the commits need to be with the DCO sign-off - the git commit -s part - so probably easier if you squash and commit or amend that with -s on that single commit.

@ahaczewski ahaczewski force-pushed the fix/quic-happy-eyeballs branch from ee2c100 to f457a07 Compare June 22, 2026 13:09
@ahaczewski

Copy link
Copy Markdown
Author

@mjansson Done, one commit, signed-off as per DCO.

@mjansson mjansson left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@mjansson mjansson changed the title Fix QUIC fallback across resolved addresses lore-transport: Happy eyeballs strategy for QUIC fallback connections across resolved addresses Jun 22, 2026
Signed-off-by: Andrzej Haczewski <ahaczewski@gmail.com>
@ahaczewski ahaczewski force-pushed the fix/quic-happy-eyeballs branch from f457a07 to c85f9c1 Compare June 23, 2026 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

QUIC connection waits 30 seconds before trying the next resolved address

5 participants