Skip to content

Add MODE 3 implementation (ingestion with Kafka consumer)#25

Merged
ricardozanini merged 2 commits into
kubesmarts:mainfrom
mcruzdev:issue-23
Jun 9, 2026
Merged

Add MODE 3 implementation (ingestion with Kafka consumer)#25
ricardozanini merged 2 commits into
kubesmarts:mainfrom
mcruzdev:issue-23

Conversation

@mcruzdev

@mcruzdev mcruzdev commented May 26, 2026

Copy link
Copy Markdown
Contributor

This pull request introduces production-ready support and documentation for MODE 3 (Kafka-based ingestion) in the Data Index project, alongside improvements to the documentation build process. The main changes add comprehensive guidance, architecture, and code references for Kafka ingestion, clarify when to use each mode, and update the documentation system to support both Maven and npm workflows.

MODE 3 (Kafka) Support and Documentation:

  • Added a detailed architecture document for MODE 3 (Kafka ingestion) at data-index/data-index-docs/modules/ROOT/pages/architecture/kafka-mode.adoc, covering event pipeline, components, persistence, idempotency, out-of-order handling, error processing, and a comparison with other modes.
  • Updated CLAUDE.md to reflect MODE 3 as production ready, describe its architecture, clarify when to use it, and list new key components, code structure, and integration tests for Kafka ingestion. [1] [2] [3] [4] [5] [6] [7] [8]
  • Added references to new code directories and documentation for MODE 3, including ingestion processor/service modules and their READMEs. [1] [2] [3] [4]

Documentation Navigation and Structure:

  • Added navigation links for Kafka production deployment and architecture in the documentation sidebar (nav.adoc). [1] [2]
  • Updated documentation structure and instructions to support both Maven and npm workflows, including new commands for building and serving docs, and enabling auto-rebuild with nodemon. [1] [2] [3] [4]

These changes collectively enable and document the Kafka-based ingestion (MODE 3) as a first-class, production-ready option, and modernize the documentation workflow for contributors.

@mcruzdev mcruzdev marked this pull request as ready for review May 26, 2026 20:10
@ricardozanini ricardozanini requested a review from gmunozfe May 27, 2026 00:43
Comment thread data-index/workflow-test-app/pom.xml Outdated

@gmunozfe gmunozfe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things to fix:

  • Reject blank workflow/task identifiers. The current workflow path defaults missing data.name to "", and processors only reject null IDs, so invalid events can create an empty workflow ID.
  • Align status mapping with the existing Data Index model. task.started.v1 currently maps to STARTED, while other modes use RUNNING.

@mcruzdev mcruzdev marked this pull request as draft May 28, 2026 03:31
@mcruzdev mcruzdev force-pushed the issue-23 branch 4 times, most recently from 05b7844 to a92af73 Compare May 29, 2026 05:49
@mcruzdev mcruzdev marked this pull request as ready for review May 29, 2026 15:30

import java.time.OffsetDateTime;

public class LifecycleEvent {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have this already in the SDK?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have a lot of classes (WorkflowCompletedCEData, WorkflowStartedCEData, TaskCancelledCEData) that represent each kind of lifecycle event, but this class aims to merge everything in a centralized data transfer object to be parsed.

@fjtirado fjtirado Jun 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cannot deserialize to the proper SDK POJO using the cloud event type ?
There is a clear and predicatble mapping between each type and each pojo as defined in the SDK, there is not need to create a union class holding all the structs.
Also, now that the SDK is a hierarchy after the later changes you can add common code to handle the different types generically (WorkflowCEEvent and TaskCEEvent) and just an extension to deal with different date fields names

@gmunozfe gmunozfe self-requested a review May 31, 2026 22:08
Signed-off-by: Matheus Cruz <matheuscruz.dev@gmail.com>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use InputStreamReader and BufferedReader#readAllAsString() here.

Signed-off-by: Matheus Cruz <matheuscruz.dev@gmail.com>

@gmunozfe gmunozfe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, great work @mcruzdev
Probably after analyzing performance we can try some kind of grouping like in sonataflow

@ricardozanini

Copy link
Copy Markdown
Contributor

Looks good to me, great work @mcruzdev Probably after analyzing performance we can try some kind of grouping like in sonataflow

@gmunozfe, care to open an issue in Quarkus Flow to track this, please?

@ricardozanini ricardozanini merged commit a6d8622 into kubesmarts:main Jun 9, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants