@Ankith-Confluent

Summary

This PR fixes a crash that occurs when an on_conf_destroy interceptor callback is registered and rd_kafka_new() fails. The failure path freed interceptor structures that the application's configuration object still owned, so the application's subsequent (and correct) call to rd_kafka_conf_destroy() accessed freed memory and freed it a second time (double-free).

Motivation

Fixes #4142

When rd_kafka_new() fails due to invalid configuration (e.g., invalid SSL settings), it takes the fail: cleanup path. That path called rd_kafka_interceptors_destroy(&rk->rk_conf), which freed the interceptor structures. However, rk->rk_conf holds only shallow copies of pointers owned by the application's configuration object (app_conf), and a failed rd_kafka_new() leaves app_conf owned by the application. When the application then correctly called rd_kafka_conf_destroy(conf), the library accessed the already-freed interceptor structures and freed them again, crashing the process.

Root Cause

In src/rdkafka.c, the rd_kafka_new() fail path (around line 2830):

if (app_conf) {
    rd_kafka_assignors_term(rk);
    rd_kafka_interceptors_destroy(&rk->rk_conf);   ← BUG
    memset(&rk->rk_conf, 0, sizeof(rk->rk_conf));
}

The problem is that rk->rk_conf.interceptors contains shallow-copied pointers to the same interceptor structures owned by app_conf. Freeing them here causes a double-free when the user later calls rd_kafka_conf_destroy(app_conf).
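
The ownership rule described above can be illustrated with a minimal, self-contained sketch. It does not use librdkafka: conf_t, interceptor_t, interceptors_destroy(), conf_destroy(), failed_new_fixed() and run_demo() are hypothetical stand-ins invented for illustration, and the counters exist only to make the behaviour observable. It models the fixed fail path, where the shallow copy is zeroed without freeing what it points at.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT librdkafka's real types. */
typedef struct interceptor_s {
    void (*on_conf_destroy)(void *opaque);
    void *opaque;
} interceptor_t;

typedef struct conf_s {
    interceptor_t *interceptor; /* heap object owned by the app's conf */
} conf_t;

int free_count     = 0; /* times the interceptor object was freed */
int callback_count = 0; /* times on_conf_destroy fired */

static void count_destroy(void *opaque) {
    (void)opaque;
    callback_count++;
}

/* Free the interceptor a conf points at and clear the pointer. */
static void interceptors_destroy(conf_t *conf) {
    if (conf->interceptor) {
        free(conf->interceptor);
        conf->interceptor = NULL;
        free_count++;
    }
}

/* Owner-side destroy: fire the callback, then free the interceptor. */
static void conf_destroy(conf_t *conf) {
    if (conf->interceptor && conf->interceptor->on_conf_destroy)
        conf->interceptor->on_conf_destroy(conf->interceptor->opaque);
    interceptors_destroy(conf);
}

/* Models the fixed fail path: the copy aliases app_conf's pointer,
 * so it is only zeroed, never freed. The buggy variant would call
 * interceptors_destroy(&rk_conf) before the memset. */
static void failed_new_fixed(const conf_t *app_conf) {
    conf_t rk_conf = *app_conf; /* shallow copy: same pointer value */
    memset(&rk_conf, 0, sizeof(rk_conf));
}

int run_demo(void) {
    conf_t app_conf;
    app_conf.interceptor = malloc(sizeof(*app_conf.interceptor));
    if (!app_conf.interceptor)
        return -1;
    app_conf.interceptor->on_conf_destroy = count_destroy;
    app_conf.interceptor->opaque          = NULL;

    failed_new_fixed(&app_conf); /* "rd_kafka_new()" fails */
    conf_destroy(&app_conf);     /* the owner cleans up exactly once */
    return 0;
}
```

Calling run_demo() from a main() leaves both counters at 1. If the fail path freed through its copy instead, app_conf.interceptor would dangle and conf_destroy() would read and free memory that had already been released, which is the pattern this PR removes.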

Changes

File: src/rdkafka.c

Before:

if (app_conf) {
    rd_kafka_assignors_term(rk);
    rd_kafka_interceptors_destroy(&rk->rk_conf);
    memset(&rk->rk_conf, 0, sizeof(rk->rk_conf));
}

After:

if (app_conf) {
    rd_kafka_assignors_term(rk);
    /* Do NOT destroy interceptors here - they belong to app_conf
     * and will be freed when app_conf is destroyed by the user.
     * rd_kafka_interceptors_destroy(&rk->rk_conf); */
    memset(&rk->rk_conf, 0, sizeof(rk->rk_conf));
}

File: tests/0064-interceptors.c

Added a new regression test do_test_conf_destroy_interceptor_with_failed_new() that:

  1. Registers an on_conf_destroy interceptor callback
  2. Forces rd_kafka_new() to fail with invalid SSL configuration
  3. Calls rd_kafka_conf_destroy() and verifies the callback is called exactly once
  4. Ensures no segfault occurs (the bug would cause a crash here)

Testing

Reproduction of Error

Created bug_4142.c:

#include <assert.h>
#include <stdio.h>
#include "rdkafka.h"

/* on_conf_destroy interceptor: invoked when the conf object is destroyed. */
static rd_kafka_resp_err_t callback(void *opaque) {
    printf("callback called with %p\n", opaque);
    return RD_KAFKA_RESP_ERR_NO_ERROR;
}

int main(int argc, char **argv) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();

    printf("=== Testing with librdkafka v%s ===\n", rd_kafka_version_str());

    // Set invalid SSL configuration
    rd_kafka_conf_set(conf, "security.protocol", "ssl", errstr, sizeof(errstr));
    rd_kafka_conf_set(conf, "ssl.ca.location", "/dev/null", errstr, sizeof(errstr));

    // Register on_conf_destroy interceptor
    rd_kafka_conf_interceptor_add_on_conf_destroy(conf, "testing", callback, NULL);

    // This will fail due to invalid SSL config
    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
    assert(rk == NULL);
    printf("%% Failed to create new consumer: %s\n", errstr);

    // After a failed rd_kafka_new() the application still owns conf.
    // BUG: This crashes (double free) in the unfixed version
    rd_kafka_conf_destroy(conf);

    return 0;
}

Compilation:

gcc -o bug_4142_test bug_4142.c -I./src ./src/librdkafka.a \
    -lssl -lcrypto -lsasl2 -lz -lcurl -lpthread

Results:

Before fix:

$ ./bug_4142_test
=== Testing with librdkafka v2.12.0 ===
% Failed to create new consumer: ssl.ca.location failed: error:05800088:x509...
malloc: Double free of object 0x148f15d90
malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

After fix:

$ ./bug_4142_test
=== Testing with librdkafka v2.12.0 ===
% Failed to create new consumer: ssl.ca.location failed: error:05800088:x509...
callback called with 0x0
$ echo $?
0

Librdkafka Test Suite

TESTS=0064 ./run-test.sh

…uble-free errors. Updated comments to clarify that interceptors are shallow-copied pointers and should not be destroyed when the application configuration is freed.
…with interceptors

This commit introduces a new test case to verify that the on_conf_destroy callback is called correctly when rd_kafka_new() fails due to invalid SSL configuration. It ensures that interceptors are not destroyed multiple times, addressing a segfault issue in the error handling path.
@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@Ankith-Confluent Ankith-Confluent marked this pull request as ready for review November 13, 2025 07:25
@Ankith-Confluent Ankith-Confluent requested a review from a team as a code owner November 13, 2025 07:25
Copilot AI review requested due to automatic review settings November 13, 2025 07:25

Copilot AI left a comment

Pull Request Overview

This PR fixes a critical double-free bug that caused segmentation faults when an on_conf_destroy interceptor callback was registered and rd_kafka_new() failed. The issue occurred because interceptor structures were being freed in the error path of rd_kafka_new(), even though they were shallow-copied pointers that belonged to the application's configuration object.

  • Removed the call to rd_kafka_interceptors_destroy() in the rd_kafka_new() failure path
  • Added a regression test to verify the fix prevents the segfault
  • Added clarifying comments explaining the shallow-copy ownership model

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File | Description
src/rdkafka.c | Removed the interceptor destruction call in the error path and added documentation explaining the shallow-copy ownership
tests/0064-interceptors.c | Added regression test that registers an on_conf_destroy callback, forces rd_kafka_new() to fail, and verifies proper cleanup without crashes

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated no new comments.


@Ankith-Confluent Ankith-Confluent changed the title from "FIX: Issue 4142 (segmentation fault that occurs when an on_conf_destroy)" to "FIX: Issue 4142 (segmentation fault that occurs when on_conf_destroy)" Nov 13, 2025

Development

Successfully merging this pull request may close these issues.

Segfault whenever an on_conf_destroy callback is registered
