Data Index ingestion bottleneck in Mode 3 with Kafka and postgresql: consider grouping events #32

Description

@gmunozfe

The benchmark was run against the following ingestion path:

Quarkus Flow → Kafka flow-lifecycle-out → Data Index ingestion → PostgreSQL

The pipeline is functionally working, but the initial benchmark results show a significant Data Index ingestion bottleneck when consuming raw lifecycle events from Kafka.

We need to evaluate whether grouped lifecycle events, similar to SonataFlow Data Index grouping, are required to make the ingestion path scalable and comparable.

Current status

Mode 3 was validated with a single fork10 request.

A single fork10 request produces the expected Kafka event volume:

flow-lifecycle-out high watermark = 90

This matches the Quarkus Flow raw lifecycle event model:

workflow events:

  • 11 workflows × 4 events = 44

task events:

  • 23 tasks × 2 events = 46

total:

  • 44 + 46 = 90 Kafka lifecycle events/request

So the Mode 3 path is functionally correct.
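The event count above can be reproduced with a small calculation. This is only a sketch of the arithmetic: the function and parameter names are made up for illustration, and the defaults are the fork10 figures quoted in this issue.

```python
def lifecycle_events_per_request(
    workflows: int = 11,
    events_per_workflow: int = 4,
    tasks: int = 23,
    events_per_task: int = 2,
) -> int:
    """Raw lifecycle events emitted for one request (names are illustrative)."""
    workflow_events = workflows * events_per_workflow  # 11 x 4 = 44
    task_events = tasks * events_per_task              # 23 x 2 = 46
    return workflow_events + task_events


events_per_request = lifecycle_events_per_request()
print(events_per_request)  # 90, matching the observed high watermark
```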

Issue detected

A 10-minute fork10 RATE=40 benchmark was executed with one Kafka partition and one Data Index consumer.

k6 result:

requests: 23,996
rate: ~39.99 req/s
http failures: 0
p95 latency: ~24 ms

Expected Kafka events:

23,996 requests × 90 events/request = 2,159,640 events

Kafka state after the run:

TOPIC               PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
flow-lifecycle-out  0          356,221         2,159,730       1,803,509

This means the consumer group was stable and consuming, but it was far behind the producer.

Approximate ingestion rate:

356,221 consumed events / 600s = ~594 events/sec

Producer rate during the benchmark:

2,159,640 produced events / 600s = ~3,599 events/sec

So the producer was generating events at roughly 6x the rate at which Data Index ingested them from Kafka.
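The figures above can be cross-checked with a short script. It is a rough sketch with two assumptions: the committed offset is treated as the number of events consumed during the 600 s run, and the drain estimate assumes the producer has stopped and the ingestion rate stays constant. Variable names are illustrative.

```python
# Figures reported in this issue.
REQUESTS = 23_996
EVENTS_PER_REQUEST = 90
DURATION_S = 600
CURRENT_OFFSET = 356_221
LOG_END_OFFSET = 2_159_730

produced = REQUESTS * EVENTS_PER_REQUEST   # expected events from the run
lag = LOG_END_OFFSET - CURRENT_OFFSET      # events still waiting in Kafka

ingest_rate = CURRENT_OFFSET / DURATION_S  # events/sec consumed by Data Index
produce_rate = produced / DURATION_S       # events/sec written by Quarkus Flow
ratio = produce_rate / ingest_rate         # how far the producer outpaces ingestion

# Assumption: no new events arrive and the ingestion rate does not change.
drain_seconds = lag / ingest_rate

print(f"produced={produced:,} lag={lag:,}")
print(f"ingest={ingest_rate:.0f}/s produce={produce_rate:.0f}/s ratio={ratio:.2f}x")
print(f"time to drain backlog: ~{drain_seconds / 60:.1f} min")
```

Under those assumptions, the backlog left after a 10-minute run would take about 50 minutes to drain.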

Solution

As in SonataFlow, try grouping events:

fork10:
  90 messages/request → ~1 grouped message/request
  ~90x fewer Kafka records
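To illustrate the idea, here is a minimal sketch of grouping raw lifecycle events into one message per request. It is not the SonataFlow implementation: the event shape and the `requestId`, `source` and `type` fields are hypothetical, and a real producer would also need a flush boundary (time window or size limit) that this sketch leaves out.

```python
from collections import defaultdict


def group_events(events, key="requestId"):
    """Collapse raw events into one grouped message per key value.

    `key` is a hypothetical correlation field, not a confirmed
    attribute of the Quarkus Flow lifecycle events.
    """
    buckets = defaultdict(list)
    for event in events:
        buckets[event[key]].append(event)
    # One outgoing record per request, carrying all of its events.
    return [{"key": k, "events": v} for k, v in buckets.items()]


def fork10_events(request_id):
    """Synthetic stand-in for the 90 raw events of one fork10 request."""
    workflow = [
        {"requestId": request_id, "source": f"workflow-{w}", "type": f"wf-event-{e}"}
        for w in range(11) for e in range(4)
    ]
    task = [
        {"requestId": request_id, "source": f"task-{t}", "type": f"task-event-{e}"}
        for t in range(23) for e in range(2)
    ]
    return workflow + task


raw = fork10_events("req-1") + fork10_events("req-2")
grouped = group_events(raw)
print(len(raw), "raw records ->", len(grouped), "grouped records")
```

At RATE=40 this would mean about 40 Kafka records/sec instead of ~3,599. Whether the PostgreSQL write cost drops proportionally is not established here and would need a separate benchmark.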

Labels: enhancement
