Skip to content

fix: Agent is unaware of forced Server restarts, causing#11539

Open
lzf575 wants to merge 1 commit intomainfrom
check-tcp-timeout
Open

fix: Agent is unaware of forced Server restarts, causing#11539
lzf575 wants to merge 1 commit intomainfrom
check-tcp-timeout

Conversation

@lzf575
Copy link
Copy Markdown
Contributor

@lzf575 lzf575 commented Mar 30, 2026

retransmission timeout latency of ~15 minutes.

This PR is for:

@lzf575
Copy link
Copy Markdown
Contributor Author

lzf575 commented Mar 30, 2026

@claude review下这个pr

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 30, 2026

Claude finished @lzf575's task in 7m 59s —— View job


- Event: PR comment asking for a code review. - Request type: review only; no implementation requested. - Scope: inspect the PR diff against `origin/main`, read the changed Rust file, and report bugs or maintainability issues. - Result: reviewed the added Linux TCP socket option handling in `@agent/src/sender/uniform_sender.rs` and found one blocking issue.

Todo

  • Gather PR context and diff
  • Inspect the changed code for correctness
  • Post review feedback in the PR comment

Review

整体方向没问题:给 sender socket 加 SO_KEEPALIVETCP_USER_TIMEOUT,确实是在 Agent 感知 Server 被强制重启时缩短超时的合理做法。

不过这个 patch 里我看到一个阻塞问题:

  1. Linux 下这里会编译失败
    • @agent/src/sender/uniform_sender.rs:611
    • 新代码调用了 socket2::Socket::from_raw_fd(tcp_stream.as_raw_fd()),但当前文件只引入了 AsRawFd,没有引入 FromRawFdfrom_raw_fd 不是 Socket 的固有方法,而是 std::os::unix::io::FromRawFd trait 提供的关联函数。
    • 仓库里同样的用法在 @agent/src/sender/npb_sender.rs:22 有显式 use std::os::unix::io::FromRawFd;,这里少了这行的话,Linux 编译会报类似 no function or associated item named from_raw_fd found
    • 建议至少补上对应 trait import;更稳一点的话,可以直接改成 socket2::SockRef::from(&*tcp_stream),这样也能避免 raw fd 临时包装的所有权风险。
    • Fix this →

Summary

我这边先看到 1 个需要修掉的编译问题;除此之外,这次通过 keepalive/user-timeout 缩短长时间 retransmission 的思路是合理的。

@lzf575 lzf575 force-pushed the check-tcp-timeout branch from 005012d to 9c7f90c Compare March 30, 2026 09:37
@lzf575
Copy link
Copy Markdown
Contributor Author

lzf575 commented Mar 30, 2026

@claude review下这个pr

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 30, 2026

Claude finished @lzf575's task in 19m 19s —— View job


- Event: PR comment asking for a code review. - Request type: review only; no implementation requested. - Scope: inspect the PR diff against `origin/main`, read the changed Rust file, and report correctness / maintainability issues. - Result: reviewed the current diff in `@agent/src/sender/uniform_sender.rs`; the earlier `from_raw_fd` compile concern is already fixed in the current version.

Todo

  • Gather PR context and diff
  • Inspect the changed code for correctness
  • Post review feedback

Review

我重新看了当前这版相对 origin/main 的 diff。

这次改动在 @agent/src/sender/uniform_sender.rs:605 之后给新建的 TCP 连接补了 keepalive,在 @agent/src/sender/uniform_sender.rs:627 设置了 TCP_USER_TIMEOUT,方向上和 PR 要解决的“Server 被强制重启后,Agent 长时间卡在 TCP 重传”是对齐的。

当前版本里,我没有再看到阻塞性的正确性问题。之前那种 from_raw_fd 的编译风险已经通过 socket2::SockRef::from(&*tcp_stream) 这个写法规避掉了。

有一个非阻塞的小清理点:

  • @agent/src/sender/uniform_sender.rs:634std::mem::forget(sock)socket2::SockRef 来说不是必须的。SockRef 是 borrowed socket view,不拥有 TcpStream 的 fd,去掉这行会让所有权语义更清晰。
  • Fix this →

Notes

这次是静态 review;我在这个 job 里没有跑 cargo check,因为当前权限不允许执行构建命令。

Summary

当前版本我没看到 blocking issue,整体实现是合理的;如果要继续打磨,优先把 SockRef 那行 mem::forget 去掉即可。

retransmission timeout latency of ~15 minutes.
@lzf575 lzf575 force-pushed the check-tcp-timeout branch from 9c7f90c to adbf9ef Compare March 30, 2026 10:29
@lzf575
Copy link
Copy Markdown
Contributor Author

lzf575 commented Mar 30, 2026

@claude review下这个pr

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 30, 2026

Claude finished @lzf575's task in 9m 38s —— View job


- Event: PR comment on an open pull request asking for a code review. - Request type: review only; no implementation requested. - Key context: PR #11539 on branch `check-tcp-timeout`, compare against `origin/main`. The change is in `@agent/src/sender/uniform_sender.rs`. - Main tasks: inspect the changed Rust code in context and report any correctness / maintainability issues with precise file references. - Challenge: I could do a static review from the checked-out file, but I could not run `git diff` / `cargo check` in this job because Bash execution is not currently allowed. - Plan executed: read the updated Rust code around the new socket option setup, check the surrounding reconnect / send path, and summarize review findings below.

Todo

  • Review PR context and diff
  • Inspect changed Rust code for correctness
  • Post review feedback in the PR comment

Review

我重新看了当前这版在 @agent/src/sender/uniform_sender.rs:593-634 的实现。

这次是在建连成功后,对 TcpStream 增加:

  • TCP keepalive@agent/src/sender/uniform_sender.rs:614-620
  • TCP_USER_TIMEOUT@agent/src/sender/uniform_sender.rs:626-633

整体思路和 PR 目标是一致的:让 Agent 在 Server 被强制重启、连接进入长时间重传时,更快感知并回收坏连接。

当前版本里,我没有看到 blocking 的正确性问题socket2::SockRef::from(&*tcp_stream) 的用法也比之前 raw-fd 包装更安全,避免了 fd 所有权误解。

Notes

一个非阻塞的小点:

  • @agent/src/sender/uniform_sender.rs:30
    • use socket2::TcpKeepalive; 现在是无条件导入,但实际使用被放在 #[cfg(target_os = "linux")] 代码块里(@agent/src/sender/uniform_sender.rs:605)。
    • 如果你们在非 Linux target 上也做编译检查,这里最好也加同样的 cfg,避免无意义的 unused import warning。
    • 这不是功能性 bug,只是个小清理项。

Validation

这次是静态 review。由于当前 job 不允许执行 Bash,我没有跑 git diff origin/main...HEADcargo check;如果你希望我补做构建级确认,可以在后续 job 里开放这些工具权限。

Summary

当前版本我没有看到需要阻塞合并的问题;实现方向和具体落点都比较合理。若要继续打磨,优先把 TcpKeepalive import 做成 Linux 条件导入即可。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant