relpsess: skip graceful close for half-open clients#289
relpsess: skip graceful close for half-open clients#289
Conversation
Expose open/close/open-failed callbacks for server sessions. Document callback semantics in librelp headers. Emit close callbacks on engine teardown. Extend test receiver plus add session-callback test harness.
Client teardown currently calls relpSessCltDoDisconnect() for any client session that is not already DISCONNECTED or BROKEN. When relpCltConnect() tears down a session that only made it to INIT_CMD_SENT against a plain TCP peer, that destructor path waits for READY_TO_SEND, converts the already-broken state into a second generic wait-state error, and surfaces destructor-time onErr callbacks to callers such as rsyslog omrelp shutdown. Only READY_TO_SEND and WINDOW_FULL can still send a valid RELP close frame, so gate graceful client disconnect to those states. When relpSessWaitState() already returns RELP_RET_SESSION_BROKEN, mark the session broken without emitting the extra 'error waiting on required session state' callback again. Add a regression that connects the sender to dummyserver.py, confirms relpCltConnect() still fails, and asserts the bogus wait-state error no longer appears. Closes rsyslog/rsyslog#6547 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request introduces session-level callbacks for RELP session lifecycle events, including successful opens, failures, and closures. It adds new API functions to librelp.h, updates the engine to trigger these callbacks, and includes comprehensive tests to verify the new functionality. In src/relpsess.c, the logic for handling session state timeouts and breakages was refined, though feedback suggests further guarding against redundant error reporting when the session is already in a broken state.
| if(iRet == RELP_RET_TIMED_OUT || relpEngineShouldStop(pThis->pEngine)) { | ||
| /* the session is broken! */ | ||
| callOnErr(pThis, (char*) "error waiting on required session state, session broken", | ||
| RELP_RET_SESSION_BROKEN); | ||
| pThis->sessState = eRelpSessState_BROKEN; | ||
| } else if(iRet == RELP_RET_SESSION_BROKEN) { | ||
| pThis->sessState = eRelpSessState_BROKEN; | ||
| } |
There was a problem hiding this comment.
The current logic still allows for redundant callOnErr invocations if relpEngineShouldStop is true but the session was already broken (and reported) by relpSessRcvData. Additionally, it suppresses legitimate error reporting if iRet is RELP_RET_SESSION_BROKEN due to a poll failure (which hasn't been reported yet). A more robust approach is to guard the entire block by checking if the session is already in the BROKEN state.
if(pThis->sessState != eRelpSessState_BROKEN) {
if(iRet == RELP_RET_TIMED_OUT || iRet == RELP_RET_SESSION_BROKEN || relpEngineShouldStop(pThis->pEngine)) {
/* the session is broken! */
callOnErr(pThis, (char*) "error waiting on required session state, session broken",
RELP_RET_SESSION_BROKEN);
pThis->sessState = eRelpSessState_BROKEN;
}
}There was a problem hiding this comment.
3 issues found across 10 files
Confidence score: 3/5
- There is some meaningful regression risk: in
src/relpsess.c,RELP_RET_SESSION_BROKENno longer invokesonErr, so poll/select failures inrelpSessWaitState()can silently break sessions without error reporting. src/copen.cmay emit a spuriousonSessOpenFailafter a duplicateopenon an already-open session, which can confuse failure handling and user-visible state transitions.- The issue in
src/librelp.his documentation-only but important:onSessClosecan run duringrelpEngineDestruct(), not justrelpEngineRun(), so callback thread-context assumptions may be wrong. - Pay close attention to
src/relpsess.c,src/copen.c, andsrc/librelp.h- error callback regression, false open-failure signaling, and incorrect thread-context documentation.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/librelp.h">
<violation number="1" location="src/librelp.h:250">
P2: The `onSessClose` docs promise the wrong thread context: shutdown can invoke this callback from `relpEngineDestruct()`, not only from `relpEngineRun()`.</violation>
</file>
<file name="src/relpsess.c">
<violation number="1" location="src/relpsess.c:629">
P2: `RELP_RET_SESSION_BROKEN` no longer triggers `onErr` here, so poll/select failures in `relpSessWaitState()` now break the session without reporting the error.</violation>
</file>
<file name="src/copen.c">
<violation number="1" location="src/copen.c:174">
P2: Guard `onSessOpenFail` so it only fires for sessions that never reached the open state; otherwise a duplicate `open` request reports a spurious open failure on an already-open session.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| * The callback fires when a session is removed from the engine, regardless | ||
| * of the close cause (protocol close, I/O error, or engine shutdown). The | ||
| * reason parameter contains the relpRetVal that triggered teardown; engine | ||
| * shutdown uses RELP_RET_SESSION_CLOSED. The callback runs on the |
There was a problem hiding this comment.
P2: The onSessClose docs promise the wrong thread context: shutdown can invoke this callback from relpEngineDestruct(), not only from relpEngineRun().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/librelp.h, line 250:
<comment>The `onSessClose` docs promise the wrong thread context: shutdown can invoke this callback from `relpEngineDestruct()`, not only from `relpEngineRun()`.</comment>
<file context>
@@ -220,6 +220,62 @@ relpRetVal relpEngineSetOnErr(relpEngine_t *pThis,
+ * The callback fires when a session is removed from the engine, regardless
+ * of the close cause (protocol close, I/O error, or engine shutdown). The
+ * reason parameter contains the relpRetVal that triggered teardown; engine
+ * shutdown uses RELP_RET_SESSION_CLOSED. The callback runs on the
+ * relpEngineRun() thread and pSess is valid only during the callback.
+ *
</file context>
| callOnErr(pThis, (char*) "error waiting on required session state, session broken", | ||
| RELP_RET_SESSION_BROKEN); | ||
| pThis->sessState = eRelpSessState_BROKEN; | ||
| } else if(iRet == RELP_RET_SESSION_BROKEN) { |
There was a problem hiding this comment.
P2: RELP_RET_SESSION_BROKEN no longer triggers onErr here, so poll/select failures in relpSessWaitState() now break the session without reporting the error.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/relpsess.c, line 629:
<comment>`RELP_RET_SESSION_BROKEN` no longer triggers `onErr` here, so poll/select failures in `relpSessWaitState()` now break the session without reporting the error.</comment>
<file context>
@@ -613,15 +619,15 @@ relpSessWaitState(relpSess_t *const pThis, const relpSessState_t stateExpected,
callOnErr(pThis, (char*) "error waiting on required session state, session broken",
RELP_RET_SESSION_BROKEN);
pThis->sessState = eRelpSessState_BROKEN;
+ } else if(iRet == RELP_RET_SESSION_BROKEN) {
+ pThis->sessState = eRelpSessState_BROKEN;
}
</file context>
| relpOffersDestruct(&pSrvOffers); | ||
|
|
||
| if(iRet != RELP_RET_OK) { | ||
| if(pSess->pEngine->onSessOpenFail != NULL) { |
There was a problem hiding this comment.
P2: Guard onSessOpenFail so it only fires for sessions that never reached the open state; otherwise a duplicate open request reports a spurious open failure on an already-open session.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/copen.c, line 174:
<comment>Guard `onSessOpenFail` so it only fires for sessions that never reached the open state; otherwise a duplicate `open` request reports a spurious open failure on an already-open session.</comment>
<file context>
@@ -168,6 +171,9 @@ BEGINcommand(S, Init)
relpOffersDestruct(&pSrvOffers);
if(iRet != RELP_RET_OK) {
+ if(pSess->pEngine->onSessOpenFail != NULL) {
+ pSess->pEngine->onSessOpenFail(pSess->pUsr, pSess, iRet);
+ }
</file context>
| if(pSess->pEngine->onSessOpenFail != NULL) { | |
| if(!pSess->bServerConnOpen && pSess->pEngine->onSessOpenFail != NULL) { |
librelp can emit onErr callbacks while tearing down a client session that only reached INIT_CMD_SENT against a plain TCP peer. omrelp used to store wrkrInstanceData_t as the librelp user pointer, so those teardown-time callbacks depended on worker-lifetime state while freeWrkrInstance() was already dismantling the worker. rsyslog already keeps the shared action instance alive through worker cleanup, so pass instanceData to librelp instead and log from that stable callback context. Add null-pointer fallbacks in onErr() and onAuthErr() so callback logging stays safe even if librelp invokes them without a user pointer. Add a shutdown regression built around a dumb TCP listener that accepts the socket and holds it open long enough for omrelp to remain in a partial RELP session. With that test in place, the rsyslog workaround was rebuilt and validated against an unpatched librelp checkout to confirm distros can carry the mitigation before the librelp fix is available. Closes rsyslog#6547 See also rsyslog/librelp#289 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Why
relpCltConnect() can destruct a client session that only reached INIT_CMD_SENT when the remote peer accepts TCP but never completes a RELP handshake. The old destructor always tried the graceful client disconnect path, which waits for READY_TO_SEND and triggers the extra error waiting on required session state callback during teardown. That callback path is what shows up in rsyslog issue #6547 during omrelp shutdown.
Closes rsyslog/rsyslog#6547
Summary by cubic
Skips graceful client disconnect for half‑open sessions and removes duplicate wait‑state errors during teardown. Adds server session lifecycle callbacks to
librelp; resolves rsyslog/rsyslog#6547 seen duringomrelpshutdown.Bug Fixes
New Features
relpEngineSetOnSessOpen,relpEngineSetOnSessClose,relpEngineSetOnSessOpenFail(documented in headers).onSessOpenon successful handshake,onSessOpenFailon handshake failure, andonSessClosewith a reason on any removal.--sessfile; newsession-callbacks.sh,invalid-open.py, andsend-dummyserver-no-waitstate-errmsg.shregressions.Written for commit c8faf19. Summary will update on new commits.