Data Index ingestion bottleneck in Mode 3 with Kafka and PostgreSQL: consider grouping events

### Description

The following ingestion path was tested with the benchmark:

`Quarkus Flow → Kafka flow-lifecycle-out → Data Index ingestion → PostgreSQL`

The pipeline is functionally working, but the initial benchmark results show a significant Data Index ingestion bottleneck when consuming raw lifecycle events from Kafka.

We need to evaluate whether grouped lifecycle events, similar to SonataFlow Data Index grouping, are required to make the ingestion path scalable and comparable.

### Current status

Mode 3 was validated with a single fork10 request.

A single fork10 request produces the expected Kafka event volume:

`flow-lifecycle-out high watermark = 90`

This matches the Quarkus Flow raw lifecycle event model:

<img width="367" height="123" alt="Image" src="https://github.com/user-attachments/assets/0ad3ecd8-db01-4e3b-8cd7-992727e44759" />

<img width="1523" height="325" alt="Image" src="https://github.com/user-attachments/assets/bfccfeca-fef3-4319-ba29-5114b4825e1c" />

Workflow events:
- 11 workflows × 4 events = 44

Task events:
- 23 tasks × 2 events = 46

Total:
- 44 + 46 = 90 Kafka lifecycle events/request

So the Mode 3 path is functionally correct.
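
The count above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name and parameters are illustrative, and the per-entity event counts (4 per workflow, 2 per task) are taken from the breakdown above rather than from the Quarkus Flow source.

```python
def lifecycle_events_per_request(
    workflows: int,
    tasks: int,
    events_per_workflow: int = 4,  # from the breakdown above
    events_per_task: int = 2,      # from the breakdown above
) -> int:
    """Raw lifecycle events emitted to Kafka for one request."""
    return workflows * events_per_workflow + tasks * events_per_task


# fork10: 11 workflows and 23 tasks, as listed above
fork10_events = lifecycle_events_per_request(workflows=11, tasks=23)
print(fork10_events)  # 90
```

The result matches the observed high watermark of 90 on `flow-lifecycle-out`.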

### Issue detected

A 10-minute `fork10 RATE=40` benchmark was executed with one Kafka partition and one Data Index consumer.

k6 result:

```
requests: 23,996
rate: ~39.99 req/s
http failures: 0
p95 latency: ~24 ms
```

Expected Kafka events:

`23,996 requests × 90 events/request = 2,159,640 events`

Kafka state after the run:

```
TOPIC               PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
flow-lifecycle-out  0          356,221         2,159,730       1,803,509
```

This means the consumer group was stable and consuming, but it was far behind the producer.
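
As a quick check on the table above, the lag is the log end offset minus the committed offset. The sketch below recomputes it from the reported numbers; the variable names are illustrative.

```python
# Offsets reported for flow-lifecycle-out, partition 0
current_offset = 356_221
log_end_offset = 2_159_730

lag = log_end_offset - current_offset
consumed_fraction = current_offset / log_end_offset

print(lag)                         # 1803509
print(f"{consumed_fraction:.1%}")  # 16.5%
```

The log end offset is 90 higher than the 2,159,640 events expected from the k6 run, which is exactly one request's worth of events. This may be the earlier single-request validation, but that is an assumption.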

Approximate ingestion rate:

`356,221 consumed events / 600s = ~594 events/sec`

Producer rate during the benchmark:

`2,159,640 produced events / 600s = ~3,599 events/sec`

So the producer was **generating events at roughly 6x the Data Index Kafka ingestion rate**.
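
The rates above, plus the time needed to clear the backlog, can be derived from the same figures. This is a rough sketch that assumes the ingestion rate stays constant and that the producer stops after the run; neither assumption was measured.

```python
duration_s = 600                 # 10-minute benchmark
produced = 23_996 * 90           # 2,159,640 events
consumed = 356_221               # committed offset after the run
remaining_lag = 1_803_509        # LAG column after the run

producer_rate = produced / duration_s    # events/sec written to Kafka
ingestion_rate = consumed / duration_s   # events/sec ingested by Data Index
ratio = producer_rate / ingestion_rate

# Time to drain the backlog at the observed ingestion rate,
# assuming no new events arrive (an assumption, not a measurement).
drain_s = remaining_lag / ingestion_rate

print(f"{producer_rate:.0f} vs {ingestion_rate:.0f} events/sec")  # 3599 vs 594 events/sec
print(f"ratio: {ratio:.2f}x")                                     # ratio: 6.06x
print(f"drain time: {drain_s / 60:.1f} min")                      # drain time: 50.6 min
```

Under those assumptions, a 10-minute run at `RATE=40` leaves about 50 minutes of catch-up work for a single consumer.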

### Solution

As in SonataFlow, try grouping events:

```
fork10:
  90 messages/request → ~1 grouped message/request
  ~90x fewer Kafka records
```
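
To illustrate the idea, the sketch below collapses raw lifecycle events into one message per request. It is not SonataFlow's or Quarkus Flow's implementation: the event shape, the `rootInstanceId` key, and the `group_by_request` function are hypothetical and only show the reduction in record count.

```python
from collections import defaultdict
from typing import Iterable


def group_by_request(events: Iterable[dict], key: str = "rootInstanceId") -> list[dict]:
    """Collapse raw lifecycle events into one grouped message per key value.

    `rootInstanceId` is a hypothetical field identifying the originating request.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        grouped[event[key]].append(event)
    return [{key: k, "events": evs} for k, evs in grouped.items()]


# Simulate one fork10 request: 44 workflow events + 46 task events
raw = (
    [{"rootInstanceId": "req-1", "kind": "workflow"} for _ in range(44)]
    + [{"rootInstanceId": "req-1", "kind": "task"} for _ in range(46)]
)
messages = group_by_request(raw)
print(len(raw), "->", len(messages))  # 90 -> 1
```

A real producer would also need a flush trigger, such as workflow completion or a time or size window. That choice is open and affects how fresh the Data Index state is.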


