Add MODE 3 implementation (ingestion with Kafka consumer) #25
gmunozfe
left a comment
Two things to fix:
- Reject blank workflow/task identifiers. The current workflow path defaults missing data.name to "", and processors only reject null IDs, so invalid events can create an empty workflow ID.
- Align status mapping with the existing Data Index model. task.started.v1 currently maps to STARTED, while other modes use RUNNING.
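The two fixes requested above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name `EventGuards`, the method names, and the use of plain strings for statuses are assumptions made for the example. Only the `task.started.v1` to `RUNNING` mapping and the "reject blank as well as null" rule come from the review comment.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not taken from the PR.
public final class EventGuards {

    // Only the task.started.v1 -> RUNNING entry is stated in the review.
    // Further event types would be added here to match the Data Index model.
    private static final Map<String, String> STATUS_BY_TYPE =
            Map.of("task.started.v1", "RUNNING");

    private EventGuards() {
    }

    // Rejects null, empty, and whitespace-only identifiers instead of only null.
    public static String requireId(String id, String fieldName) {
        if (id == null || id.isBlank()) {
            throw new IllegalArgumentException(fieldName + " must not be null or blank");
        }
        return id;
    }

    // Returns the mapped status, or empty when the type is null or unknown.
    public static Optional<String> statusFor(String eventType) {
        if (eventType == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(STATUS_BY_TYPE.get(eventType));
    }
}
```

With a guard like this, a missing data.name would fail validation rather than being defaulted to "" and persisted as an empty workflow ID.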
05b7844 to a92af73
import java.time.OffsetDateTime;

public class LifecycleEvent {
Don't we have this already in the SDK?
Yes, we have a lot of classes (WorkflowCompletedCEData, WorkflowStartedCEData, TaskCancelledCEData) that each represent one kind of lifecycle event, but this class aims to merge everything into a single, centralized data transfer object for parsing.
Why can't we deserialize to the proper SDK POJO based on the cloud event type?
There is a clear and predictable mapping between each type and each POJO as defined in the SDK, so there is no need to create a union class holding all the structs.
Also, now that the SDK is a hierarchy after the latest changes, you can add common code to handle the different types generically (WorkflowCEEvent and TaskCEEvent) and just an extension to deal with the different date field names.
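The type-based dispatch suggested here might look like the sketch below. It is illustrative only: the records `WorkflowStarted` and `TaskStarted` are hypothetical stand-ins for the SDK POJOs (such as WorkflowStartedCEData), the registry class `EventTypeRegistry` is invented for the example, and `workflow.started.v1` is an assumed type string. Only `task.started.v1` appears in this thread.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical registry mapping a CloudEvent type to its target POJO class.
public final class EventTypeRegistry {

    // Stand-ins for the SDK data classes; the real ones live in the SDK.
    public interface LifecycleData {
    }

    public record WorkflowStarted(String name, String startedAt) implements LifecycleData {
    }

    public record TaskStarted(String taskName, String startedAt) implements LifecycleData {
    }

    // One entry per event type; "workflow.started.v1" is an assumed type name.
    private static final Map<String, Class<? extends LifecycleData>> TARGETS = Map.of(
            "workflow.started.v1", WorkflowStarted.class,
            "task.started.v1", TaskStarted.class);

    private EventTypeRegistry() {
    }

    // Looks up the POJO class for a CloudEvent type; empty if null or unknown.
    public static Optional<Class<? extends LifecycleData>> targetFor(String eventType) {
        if (eventType == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(TARGETS.get(eventType));
    }
}
```

The class returned by `targetFor` could then be handed to whatever JSON mapper the project uses to deserialize the event payload, which would avoid a union DTO with a field for every event shape.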
Signed-off-by: Matheus Cruz <matheuscruz.dev@gmail.com>
I would use InputStreamReader and BufferedReader#readAllAsString() here.
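As a rough sketch of that suggestion: the helper below wraps the stream in an InputStreamReader and BufferedReader and reads it fully into a String. Because readAllAsString() is, as far as I know, only available on recent JDKs, this example uses Reader#transferTo (Java 10+) instead. The class name `StreamText` and the choice of UTF-8 are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the PR.
public final class StreamText {

    private StreamText() {
    }

    // Reads the whole stream as UTF-8 text and closes it afterwards.
    public static String readAll(InputStream in) {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringWriter out = new StringWriter();
            reader.transferTo(out); // copies all remaining characters
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a JDK that provides readAllAsString(), the body of the try block could presumably shrink to a single call on the reader.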
Signed-off-by: Matheus Cruz <matheuscruz.dev@gmail.com>
This pull request introduces production-ready support and documentation for MODE 3 (Kafka-based ingestion) in the Data Index project, alongside improvements to the documentation build process. The main changes add comprehensive guidance, architecture, and code references for Kafka ingestion, clarify when to use each mode, and update the documentation system to support both Maven and npm workflows.
MODE 3 (Kafka) Support and Documentation:
- [...] data-index/data-index-docs/modules/ROOT/pages/architecture/kafka-mode.adoc, covering event pipeline, components, persistence, idempotency, out-of-order handling, error processing, and a comparison with other modes.
- [...] CLAUDE.md to reflect MODE 3 as production ready, describe its architecture, clarify when to use it, and list new key components, code structure, and integration tests for Kafka ingestion.
Documentation Navigation and Structure:
- [...] nav.adoc).
- [...] nodemon.
These changes collectively enable and document the Kafka-based ingestion (MODE 3) as a first-class, production-ready option, and modernize the documentation workflow for contributors.