diff --git a/CLAUDE.md b/CLAUDE.md index fd78722c0..ee34b7aae 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,8 +1,8 @@ # Claude AI Assistant Guidelines - KubeSmarts Logic Apps **Project:** Data Index v1.0.0 for Serverless Workflow 1.0.0 -**Status:** Production Ready (MODE 1 & MODE 2) -**Last Updated:** 2026-04-29 +**Status:** Production Ready (MODE 1, MODE 2 & MODE 3) +**Last Updated:** 2026-05-29 --- @@ -160,15 +160,53 @@ curl http://localhost:8080/q/metrics | grep data_index_transform - Auto-scaling storage needed - Multi-tenancy requirements -**Both modes share:** -- Identical GraphQL API +**Use MODE 3 (Kafka) when:** +- Kafka infrastructure already exists +- Security requirements (no log files) +- Direct event stream processing +- Need encrypted transport (SSL/SASL_SSL) + +**All modes share:** +- Same normalized PostgreSQL tables (workflow_instances, task_instances) - Same domain model -- FluentBit ingestion - Idempotent event processing - No Event Processor service --- +## Architecture (MODE 3 - Kafka) + +``` +Quarkus Flow → Kafka (CloudEvents, topic: flow-lifecycle-out) + ↓ (SmallRye Reactive Messaging) + KafkaLifecycleConsumer (event type routing) + ↓ (Mapper → WorkflowInstanceEvent / TaskExecutionEvent) + WorkflowEventProcessor / TaskExecutionProcessor + ↓ + WorkflowPersistence / TaskPersistence (JDBC UPSERT) + ↓ + PostgreSQL normalized tables + ↓ (JPA/Hibernate) + GraphQL API (SmallRye GraphQL) + + (failed records → dead-letter topic: data-index-events-dlq) +``` + +**Key Components:** +- **KafkaLifecycleConsumer** - Consumes CloudEvents (`io.cloudevents.CloudEvent`), validates them, and routes by event type prefix via `LifecycleEventUtils.isWorkflow()` / `isTask()` +- **Mapper** - Maps a `CloudEvent` + `LifecycleEvent` payload into a `WorkflowInstanceEvent` or `TaskExecutionEvent` +- **EventProcessor** - Generic processing interface; implemented by `WorkflowEventProcessor` and `TaskExecutionProcessor` +- **WorkflowPersistence** - UPSERT to workflow_instances 
with field-level idempotency +- **TaskPersistence** - UPSERT to task_instances with FK violation recovery (savepoint + placeholder workflow), ON CONFLICT `(instance_id, task_position)` +- **Dead-letter queue** - Records that fail processing throw `ProcessEventFailedException` and are routed to the `data-index-events-dlq` topic + +**NOT used in MODE 3:** +- ❌ FluentBit (events come from Kafka, not log files) +- ❌ PostgreSQL triggers (normalization done in Java via JDBC) +- ❌ Raw event tables (writes directly to normalized tables) + +--- + ## Code Structure ``` @@ -197,6 +235,9 @@ data-index/ │ │ └── GraphQLConfiguration.java │ └── service/ # JAX-RS resources │ └── RootResource.java # Landing page +├── data-index-ingestion/ # MODE 3 Kafka ingestion +│ ├── data-index-ingestion-kafka-processor/ # Normalizers (JDBC UPSERT) +│ └── data-index-ingestion-kafka-service/ # Quarkus Kafka consumer service ├── data-index-integration-tests/ # E2E tests (MODE 1 & MODE 2) │ ├── WorkflowInstanceGraphQLApiTest.java (PostgreSQL) │ └── WorkflowInstanceElasticsearchTest.java (Elasticsearch) @@ -1005,7 +1046,7 @@ curl http://localhost:9200/_transform/workflow-instances-transform/_stats - Don't add Event Processor service (MODE 1 uses triggers, MODE 2 uses transforms) - Don't use polling architecture - Don't create staging tables (MODE 1) or separate processing indices (MODE 2) -- Don't add Kafka (MODE 3 not implemented) +- Don't mix MODE 3 Kafka ingestion with MODE 1 FluentBit ingestion in the same deployment - Don't mix PostgreSQL and Elasticsearch in same deployment ### ❌ Dependencies @@ -1109,9 +1150,10 @@ curl http://localhost:9200/_transform/workflow-instances-transform/_stats ## Key Files Reference **Architecture & Documentation:** -- `data-index/docs/ARCHITECTURE-SUMMARY.md` - All deployment modes - `data-index/docs/deployment/MODE1_HANDOFF.md` - MODE 1 (PostgreSQL) details - `data-index/docs/deployment/MODE2_HANDOFF.md` - MODE 2 (Elasticsearch) details +- 
`data-index/data-index-ingestion/README.md` - MODE 3 (Kafka) overview +- `data-index/data-index-ingestion/data-index-ingestion-kafka-service/README.md` - MODE 3 (Kafka) service details - `data-index/docs/elasticsearch/TRANSFORM_OPTIMIZATION.md` - Transform optimization & metrics guide **Code (Common):** @@ -1130,6 +1172,10 @@ curl http://localhost:9200/_transform/workflow-instances-transform/_stats - `data-index-elasticsearch-schema/src/main/java/.../` - Schema initializer - `data-index-elasticsearch-schema/src/main/resources/schema/` - ILM, templates, transforms +**Code (MODE 3 - Kafka):** +- `data-index-ingestion/data-index-ingestion-kafka-processor/` - `EventProcessor`, `WorkflowEventProcessor`, `TaskExecutionProcessor`, `persistence/WorkflowPersistence`, `persistence/TaskPersistence`, `data/WorkflowInstanceEvent`, `data/TaskExecutionEvent`, `util/LifecycleEventUtils`, `ProcessEventFailedException` +- `data-index-ingestion/data-index-ingestion-kafka-service/` - `KafkaLifecycleConsumer`, `Mapper`, `LifecycleEvent`, `HealthChecks`, `RootResource` + **Configuration:** - `data-index-service/data-index-service-elasticsearch/src/main/resources/application.properties` - Elasticsearch config (metrics, ILM, smart filtering) - `data-index/scripts/fluentbit/elasticsearch/fluent-bit.conf` - MODE 2 FluentBit (Elasticsearch) @@ -1140,6 +1186,9 @@ curl http://localhost:9200/_transform/workflow-instances-transform/_stats - `data-index-storage/data-index-storage-elasticsearch/src/test/java/.../ElasticsearchWorkflowInstanceStorageIT.java` - Elasticsearch storage tests - `data-index-storage/data-index-storage-elasticsearch/src/test/java/.../ElasticsearchTransformMetricsIT.java` - Transform metrics tests - `data-index-storage/data-index-storage-elasticsearch/src/test/java/.../ElasticsearchTransformPerformanceBenchmarkIT.java` - Performance benchmarks +- `data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/.../KafkaIngestionITest.java` - Kafka ingestion integration 
tests +- `data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/.../BaseWorkflowLifecycleITest.java` - Shared base for lifecycle integration tests +- `data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/.../{Cancelled,Faulted,Suspended}WorkflowITest.java` - Lifecycle-specific integration tests **Build:** - `pom.xml` (root) - Generic dependencies, plugin versions @@ -1183,9 +1232,6 @@ curl http://localhost:9200/_transform/workflow-instances-transform/_stats 5. Add Elasticsearch aggregations API 6. Add full-text search capabilities -**Not Planned:** -- MODE 3 (Kafka) - design documented, not implemented - --- ## Questions? Check These First diff --git a/data-index/data-index-docs/README.md b/data-index/data-index-docs/README.md index 8da2f667f..32d2b6130 100644 --- a/data-index/data-index-docs/README.md +++ b/data-index/data-index-docs/README.md @@ -18,6 +18,13 @@ cd data-index-docs mvn clean package ``` +You can also build directly with npm: + +```bash +cd data-index/data-index-docs +npm run build +``` + ## Output **Generated HTML:** `target/generated-docs/` @@ -52,9 +59,8 @@ open target/generated-docs/index.html Or serve with a local web server: ```bash -cd target/generated-docs -python3 -m http.server 8000 -# Open http://localhost:8000 +npm run serve +# Open http://localhost:8080 ``` ## Documentation Structure @@ -91,8 +97,8 @@ modules/ROOT/ 1. Edit AsciiDoc files in `modules/ROOT/pages/` 2. Update navigation in `modules/ROOT/nav.adoc` -3. Rebuild: `mvn clean package` -4. View changes: Open `target/generated-docs/index.html` +3. Rebuild with `npm run build` or run `npm run dev` for auto-rebuild +4. 
View changes at `http://localhost:8080` (dev) or `target/generated-docs/index.html` ## Antora Configuration @@ -106,6 +112,7 @@ modules/ROOT/ - NPM packages: - `@antora/cli` - Antora command-line interface - `@antora/site-generator` - Site generator + - `nodemon` - Auto rebuild/restart during docs development These are automatically installed during the build. diff --git a/data-index/data-index-docs/modules/ROOT/nav.adoc b/data-index/data-index-docs/modules/ROOT/nav.adoc index 71932af24..12076ab8c 100644 --- a/data-index/data-index-docs/modules/ROOT/nav.adoc +++ b/data-index/data-index-docs/modules/ROOT/nav.adoc @@ -6,6 +6,7 @@ ** xref:deployment/kind-local.adoc[Local Development (KIND)] ** xref:deployment/postgresql.adoc[PostgreSQL Production] ** xref:deployment/elasticsearch.adoc[Elasticsearch Production] +** xref:deployment/kafka.adoc[Kafka Production] ** xref:deployment/fluentbit-config.adoc[FluentBit Configuration] * Operations @@ -24,3 +25,4 @@ ** xref:architecture/overview.adoc[How Data Index Works] ** xref:architecture/postgresql-mode.adoc[PostgreSQL Mode] ** xref:architecture/elasticsearch-mode.adoc[Elasticsearch Mode] +** xref:architecture/kafka-mode.adoc[Kafka Mode] diff --git a/data-index/data-index-docs/modules/ROOT/pages/architecture/kafka-mode.adoc b/data-index/data-index-docs/modules/ROOT/pages/architecture/kafka-mode.adoc new file mode 100644 index 000000000..355c78f32 --- /dev/null +++ b/data-index/data-index-docs/modules/ROOT/pages/architecture/kafka-mode.adoc @@ -0,0 +1,318 @@ += Kafka Mode Architecture (MODE 3) + +**Status:** Production Ready + +== Overview + +MODE 3 is a Kafka-based event ingestion service that provides an alternative to log-file ingestion. 
+ +Event pipeline: +[source] +---- +Quarkus Flow + ↓ (publishes CloudEvents to Kafka) +Kafka topic: flow-lifecycle-out + ↓ (SmallRye Reactive Messaging) +KafkaLifecycleConsumer + ↓ (event validation + routing) +WorkflowEventProcessor / TaskExecutionProcessor + ↓ (JDBC UPSERT with field-level idempotency) +PostgreSQL normalized tables + ↓ (JPA/Hibernate) +Data Index GraphQL API + +(failed records → data-index-events-dlq topic) +---- + +== Components + +=== KafkaLifecycleConsumer + +Listens to the `flow-lifecycle-out` topic and: + +* Validates incoming CloudEvents (specversion, type, time) +* Routes to processors based on event type prefix: + ** `io.serverlessworkflow.workflow.*` → WorkflowEventProcessor + ** `io.serverlessworkflow.task.*` → TaskExecutionProcessor +* Throws `ProcessEventFailedException` on errors (triggers DLQ) + +=== Event Processors + +==== WorkflowEventProcessor + +Normalizes workflow lifecycle events: + +* `workflow.started` → INSERT/UPDATE workflow_instances with status=RUNNING +* `workflow.completed` → UPDATE status=COMPLETED, set end time and output +* `workflow.faulted` → UPDATE status=FAULTED, set error fields +* `workflow.suspended` → UPDATE status=SUSPENDED +* `workflow.cancelled` → UPDATE status=CANCELLED + +Uses field-level idempotency (see Idempotency section below). + +==== TaskExecutionProcessor + +Normalizes task lifecycle events: + +* `task.started` → INSERT/UPDATE task_instances with status=RUNNING +* `task.completed` → UPDATE status=COMPLETED, set end time and output +* `task.faulted` → UPDATE status=FAULTED, set error fields +* `task.suspended` → UPDATE status=SUSPENDED +* `task.cancelled` → UPDATE status=CANCELLED + +Also handles out-of-order recovery (see Out-of-Order Handling below). 
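The prefix routing and the event-to-status mapping described above can be sketched in plain Java. This is only an illustration under stated assumptions: the event type is assumed to have the form `<prefix><action>.v1`, as in the sample `io.serverlessworkflow.workflow.started.v1`, and the names `LifecycleTypeSketch`, `route`, and `status` are hypothetical. They are not the service's `LifecycleEventUtils` or processor API.

```java
import java.util.Optional;

// Hypothetical sketch: not the actual KafkaLifecycleConsumer or LifecycleEventUtils code.
final class LifecycleTypeSketch {

    static final String WORKFLOW_PREFIX = "io.serverlessworkflow.workflow.";
    static final String TASK_PREFIX = "io.serverlessworkflow.task.";

    public enum Target { WORKFLOW, TASK }

    // Chooses a processor family from the CloudEvent type prefix.
    public static Optional<Target> route(String type) {
        if (type == null) {
            return Optional.empty();
        }
        if (type.startsWith(WORKFLOW_PREFIX)) {
            return Optional.of(Target.WORKFLOW);
        }
        if (type.startsWith(TASK_PREFIX)) {
            return Optional.of(Target.TASK);
        }
        return Optional.empty();
    }

    // Maps the lifecycle action (the segment right after the prefix) to the stored status.
    public static Optional<String> status(String type) {
        Optional<Target> target = route(type);
        if (!target.isPresent()) {
            return Optional.empty();
        }
        String prefix = target.get() == Target.WORKFLOW ? WORKFLOW_PREFIX : TASK_PREFIX;
        String rest = type.substring(prefix.length());
        int dot = rest.indexOf('.');
        String action = dot < 0 ? rest : rest.substring(0, dot);
        switch (action) {
            case "started":   return Optional.of("RUNNING");
            case "completed": return Optional.of("COMPLETED");
            case "faulted":   return Optional.of("FAULTED");
            case "suspended": return Optional.of("SUSPENDED");
            case "cancelled": return Optional.of("CANCELLED");
            default:          return Optional.empty(); // unknown action: caller decides how to fail
        }
    }

    public static void main(String[] args) {
        System.out.println(route("io.serverlessworkflow.task.faulted.v1"));
        System.out.println(status("io.serverlessworkflow.workflow.started.v1"));
    }
}
```

An empty result stands in for the error path: in the real service an unrecognized type would presumably surface as a `ProcessEventFailedException`, but that wiring is not shown here.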
+ +=== Persistence Layer + +==== WorkflowPersistence + +JDBC-based UPSERT for workflow_instances: + +[source,sql] +---- +INSERT INTO workflow_instances ( + id, namespace, name, version, status, start, "end", last_update, + input, output, error_type, error_title, error_detail, error_status, error_instance, + last_event_time, created_at, updated_at +) VALUES (...) +ON CONFLICT (id) DO UPDATE SET + status = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN EXCLUDED.status + ELSE workflow_instances.status + END, + ... +---- + +==== TaskPersistence + +Similar UPSERT for task_instances, but with FK recovery: + +[source,sql] +---- +INSERT INTO task_instances ( + task_execution_id, instance_id, task_name, task_position, status, + start, "end", input, output, error fields, last_event_time, ... +) VALUES (...) +ON CONFLICT (instance_id, task_position) DO UPDATE SET + ... +---- + +Key design decisions: + +* **Composite key**: `(instance_id, task_position)` uniquely identifies a task + ** Handles Quarkus Flow's changing `taskExecutionId` per event + ** `task_position` is stable across task lifecycle +* **FK Recovery**: Savepoint-based retry if parent workflow doesn't exist + ** Creates placeholder workflow on first attempt failure + ** Retries task insert with placeholder in place + ** Later workflow.started event updates the placeholder + +== Event Format + +=== CloudEvent (v1.0) + +[source,json] +---- +{ + "specversion": "1.0", + "type": "io.serverlessworkflow.workflow.started.v1", + "source": "/workflow/executions/01KSGKY66DMS0KPPMFMMR3BJZX", + "id": "event-123", + "time": "2026-05-25T22:40:10.676900Z", + "datacontenttype": "application/json", + "data": { + "instanceId": "01KSGKY66DMS0KPPMFMMR3BJZX", + "workflowName": "order-processing", + "workflowNamespace": "org.acme", + "workflowVersion": "1.0.0", + "status": "RUNNING", + "startTime": "2026-05-25T19:40:10.676802-03:00", + "lastUpdateTime": "2026-05-25T19:40:10.676802-03:00", + "input": { 
"orderId": "ORD-789" } + } +} +---- + +=== Timestamp Handling + +All timestamp fields are automatically converted to UTC OffsetDateTime (TIMESTAMP WITH TIME ZONE). + +Accepted formats: + +* ISO-8601 with offset: `2026-05-25T19:40:10.676802-03:00` (recommended) +* ISO-8601 UTC: `2026-05-25T22:40:10.676900Z` +* Unix epoch seconds: `1747486200` + +== Idempotency Guarantees + +MODE 3 implements field-level idempotency to handle out-of-order and duplicate events. + +=== Immutable Fields (First Value Wins) + +Once set, never updated: + +**Workflow:** + +* start, input, name, version, namespace + +**Task:** + +* start, input, task_name, task_position, task_execution_id + +Example: If `workflow.started` sets `start = 10:00`, later events cannot change it. + +=== Terminal Fields (Last Non-Null Wins) + +Updated only if incoming event is newer (based on `last_event_time`): + +**Workflow:** + +* end, output, last_update + +**Task:** + +* end, output + +**Both:** + +* error_type, error_title, error_detail, error_status, error_instance + +Example: If `workflow.completed` arrives at 10:05 with `end = 10:05`, later events at 10:01 don't override the end time. + +=== Status Field + +Updated based on timestamp and precedence: + +* Terminal states override less-terminal: COMPLETED/FAULTED/CANCELLED > RUNNING > CREATED +* If incoming event is newer: status is updated +* If incoming event is older: status is preserved + +Example: + +[source] +---- +t=10:00: workflow.started + → status = RUNNING, last_event_time = 10:00 + +t=10:05: workflow.completed + → status = COMPLETED, end = 10:05, last_event_time = 10:05 + +t=10:01: workflow.completed (OUT OF ORDER) + → 10:01 < 10:05, so status stays COMPLETED, end not overwritten +---- + +== Out-of-Order Event Handling + +MODE 3 guarantees task events aren't lost even if they arrive before the parent workflow. + +=== FK Recovery Flow + +. **Task event consumed**: Instance for workflow not yet in database +. 
**Initial INSERT fails**: Foreign key constraint violation (SQL state 23503) +. **Savepoint rolled back**: Transaction restored to a known point +. **Placeholder workflow created**: Minimal row inserted + ** `id = task.instanceId` + ** `created_at = NOW()` + ** `last_event_time = task.eventTimestamp` +. **Task INSERT retried**: Now succeeds (FK satisfied) +. **Workflow event arrives later**: Updates placeholder with full data + +=== Example + +[source] +---- +Queue: [task.started(wf-1), task.completed(wf-1), workflow.started(wf-1)] + +Processing: + +1. task.started(wf-1) + → INSERT INTO task_instances ... [FK fails, wf-1 doesn't exist] + → Insert placeholder: workflow_instances (id='wf-1', created_at=NOW(), ...) + → Retry task insert → OK + +2. task.completed(wf-1) + → UPDATE task_instances ... [OK, wf-1 exists] + +3. workflow.started(wf-1) + → UPDATE workflow_instances SET namespace='...', name='...', ... [OK, replaces placeholder] + +Result: + workflow_instances: 1 row (placeholder replaced with real data) + task_instances: 1 row (same task: started, then completed) +---- + +== Error Handling + +=== Failed Event Processing + +When an event cannot be processed (deserialization error, DB constraint violation, etc.): + +. Exception thrown: `ProcessEventFailedException` wraps the underlying error +. Dead-letter queue: Record is automatically sent to the `data-index-events-dlq` topic +. Consumer continues: Next message is processed immediately (fail-fast disabled) +. Monitoring: Check the DLQ topic to inspect and replay failed events + +Failure reasons: +* Malformed CloudEvent (missing type, time, or data) +* Network errors +* Serialization/deserialization errors + +=== Monitoring & Recovery + +Monitor the DLQ: +[source,bash] +---- +kafka-console-consumer.sh \ + --bootstrap-server kafka:9092 \ + --topic data-index-events-dlq \ + --from-beginning +---- + +Replay failed events: +. Fix the root cause (fix the event publisher, upgrade the service, or resolve DB issues) +. 
Copy the failed event from the DLQ back to the main topic +. The service reprocesses it automatically + +== Comparison: MODE 1 vs MODE 2 vs MODE 3 + +[cols="1,1,1,1"] +|=== +| Aspect | MODE 1 (PostgreSQL) | MODE 2 (Elasticsearch) | MODE 3 (Kafka) + +| Event source | Log files | Log files | Kafka topics +| Ingestion layer | FluentBit | FluentBit | SmallRye Reactive Messaging +| Normalization | SQL triggers | ES transforms | Java JDBC +| Raw storage | PostgreSQL tables | ES indices | None (direct) +| Normalized storage | PostgreSQL | Elasticsearch | PostgreSQL +| Query API | GraphQL on PostgreSQL | GraphQL on Elasticsearch | GraphQL on PostgreSQL +| Latency | ~10ms | ~1000ms | - +| Idempotency | SQL COALESCE | Painless script | SQL CASE/COALESCE +| DLQ support | N/A | N/A | Yes +| Security | File-based (log files) | File-based (log files) | Kafka (SSL/SASL) +| Query capabilities | Standard SQL | Full-text, aggregations | Standard SQL +|=== + +**Choose MODE 3 when:** + +* Kafka is already deployed +* Security concern: avoid log files +* Need encrypted Kafka transport +* Prefer stream-based ingestion + +**Choose MODE 1 when:** +* Simplest setup (triggers are atomic) +* Low latency critical (~10ms) +* No Kafka infrastructure +* Log-based ingestion acceptable + +**Choose MODE 2 when:** +* Need full-text search +* Complex aggregations required +* Large scale (1M+ workflows) +* Multi-tenancy needed + +== References + +* xref:deployment/kafka.adoc[Kafka Deployment Guide] +* xref:developers/configuration.adoc[Configuration Reference] diff --git a/data-index/data-index-docs/modules/ROOT/pages/deployment/kafka.adoc b/data-index/data-index-docs/modules/ROOT/pages/deployment/kafka.adoc new file mode 100644 index 000000000..0250076fc --- /dev/null +++ b/data-index/data-index-docs/modules/ROOT/pages/deployment/kafka.adoc @@ -0,0 +1,297 @@ += Kafka Deployment (MODE 3) + +**Status:** Preview + +== Overview + +MODE 3 is a Kafka-based event ingestion service that provides an alternative to 
xref:deployment/postgresql.adoc[MODE 1] and xref:deployment/elasticsearch.adoc[MODE 2]. + +Use MODE 3 when: + +* Kafka infrastructure already exists in your environment +* Security requirements demand events not be written to disk (credit cards, PII, etc.) +* You need encrypted transport (SSL/SASL_SSL) +* Direct stream processing is preferred over log-based ingestion +* You want to leverage Kafka's at-least-once delivery guarantees + +== Event Pipeline + +[source] +---- +Quarkus Flow + → Kafka (CloudEvents, topic: flow-lifecycle-out) + → Data Index Ingestion Service + → PostgreSQL (workflow_instances, task_instances) + → Data Index GraphQL API + +(failed records → data-index-events-dlq) +---- + +== Kafka Topic Configuration + +=== Required Topics + +* `flow-lifecycle-out` - Main event topic (published by Quarkus Flow applications) +* `data-index-events-dlq` - Dead-letter queue for failed records + +[NOTE] +==== +You can change topic names via configuration through environment variables: + +* `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_TOPIC` (default is `flow-lifecycle-out`) +* `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_DEAD_LETTER_QUEUE_TOPIC` (default is `data-index-events-dlq`) +==== + +=== Creating Topics + +In non-production environments, topics are typically auto-created. +In production, create them explicitly in accordance with your Kafka cluster management practices. 
+ +[source,bash] +---- +# Main topic (replicas=3, partitions=3) +kafka-topics.sh --create \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic flow-lifecycle-out \ + --replication-factor 3 \ + --partitions 3 \ + --config retention.ms=86400000 \ + --config min.insync.replicas=2 + +# DLQ topic +kafka-topics.sh --create \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic data-index-events-dlq \ + --replication-factor 3 \ + --partitions 1 \ + --config retention.ms=604800000 +---- + +== Kubernetes Deployment + +=== Basic Manifest + +[source,yaml] +---- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: data-index-ingestion-kafka + namespace: data-index +spec: + replicas: 1 + selector: + matchLabels: + app: data-index-ingestion-kafka + template: + metadata: + labels: + app: data-index-ingestion-kafka + spec: + containers: + - name: kafka-ingestion + image: kubesmarts/data-index-ingestion-kafka-service:999-SNAPSHOT + ports: + - containerPort: 8080 + name: http + env: + - name: KAFKA_BOOTSTRAP_SERVERS + value: "kafka.kafka.svc.cluster.local:9092" + - name: QUARKUS_DATASOURCE_JDBC_URL + value: "jdbc:postgresql://postgresql:5432/data-index" + - name: QUARKUS_DATASOURCE_USERNAME + valueFrom: + secretKeyRef: + name: database-credentials + key: username + - name: QUARKUS_DATASOURCE_PASSWORD + valueFrom: + secretKeyRef: + name: database-credentials + key: password + livenessProbe: + httpGet: + path: /q/health/live + port: 8080 + initialDelaySeconds: 30 + periodSeconds: 10 + readinessProbe: + httpGet: + path: /q/health/ready + port: 8080 + initialDelaySeconds: 10 + periodSeconds: 5 + resources: + requests: + cpu: 500m + memory: 512Mi + limits: + cpu: 2000m + memory: 2Gi +--- +apiVersion: v1 +kind: Service +metadata: + name: data-index-ingestion-kafka + namespace: data-index +spec: + selector: + app: data-index-ingestion-kafka + ports: + - port: 8080 + targetPort: 8080 + name: http +---- + +=== Configuration + +==== Required Environment 
Variables + +[cols="1,1,2"] +|=== +| Variable | Default | Description + +| `KAFKA_BOOTSTRAP_SERVERS` | localhost:29092 | Kafka broker URLs (comma-separated) +| `QUARKUS_DATASOURCE_JDBC_URL` | jdbc:h2:mem:test | PostgreSQL JDBC connection string +| `QUARKUS_DATASOURCE_USERNAME` | (dev services) | Database username +| `QUARKUS_DATASOURCE_PASSWORD` | (dev services) | Database password +|=== + +==== Optional Configuration + +[cols="1,1,2"] +|=== +| Variable | Default | Description + +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_TOPIC` | `flow-lifecycle-out` | Kafka topic name +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_GROUP_ID` | `data-index-ingestion` | Consumer group +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_HEALTH_ENABLED` | `true` | Enable channel health checks +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_HEALTH_READINESS_ENABLED` | `true` | Include channel in readiness checks +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_DEAD_LETTER_QUEUE_TOPIC` | `data-index-events-dlq` | DLQ topic name +|=== + +== Monitoring + +=== Health Checks + +[source,bash] +---- +# Liveness (service is running) +curl http://localhost:8080/q/health/live + +# Readiness (ready to consume events) +curl http://localhost:8080/q/health/ready + +# Full health +curl http://localhost:8080/q/health + +# Metrics (Prometheus format) +curl http://localhost:8080/q/metrics +---- + +=== Kubernetes Monitoring + +[source,bash] +---- +# Follow logs +kubectl logs -f deployment/data-index-ingestion-kafka -n data-index + +# Inspect DLQ messages +kafka-console-consumer.sh \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic data-index-events-dlq \ + --from-beginning \ + --max-messages 10 +---- + +== Event Processing + +=== Field-Level Idempotency + +MODE 3 guarantees idempotency for out-of-order and duplicate events: + +**Immutable fields** (first value wins):: +* start, input, name, version, namespace +* Never updated after initial insertion + +**Terminal fields** (last non-null wins):: +* end, 
output, error fields +* Updated only if incoming event timestamp is newer + +**Status precedence**:: +* COMPLETED, FAULTED, CANCELLED > RUNNING > CREATED +* Terminal states override less-terminal states + +=== Out-of-Order Recovery + +If a task event arrives before the parent workflow: + +. Task event consumed → INSERT fails (foreign key constraint) +. Savepoint rolled back +. Placeholder workflow created with minimal data +. Task event retried → INSERT succeeds +. Workflow event arrives later → updates placeholder with full data + +This ensures no task events are lost due to event ordering. + +== Troubleshooting + +=== Service won't start + +Check logs: +[source,bash] +---- +kubectl logs deployment/data-index-ingestion-kafka -n data-index +---- + +Common causes: +* PostgreSQL unreachable → verify `QUARKUS_DATASOURCE_JDBC_URL` +* Kafka unreachable → verify `KAFKA_BOOTSTRAP_SERVERS` +* Database schema missing → run Flyway migrations + +=== Events not consumed + +Check readiness: +[source,bash] +---- +kubectl get pods -n data-index | grep data-index-ingestion-kafka + +# Check logs +kubectl logs deployment/data-index-ingestion-kafka -n data-index | grep -i error +---- + +=== DLQ messages pile up + +Inspect failed events: +[source,bash] +---- +kafka-console-consumer.sh \ + --bootstrap-server kafka:9092 \ + --topic data-index-events-dlq \ + --max-messages 5 | jq +---- + +Common causes: +* Malformed CloudEvents → fix event publisher +* Database unavailable → events will retry once recovered +* Schema mismatch → upgrade service or downgrade event publisher + +== Comparison + +[cols="1,1,1,1"] +|=== +| Feature | MODE 1 (FluentBit + Triggers) | MODE 2 (FluentBit + ES) | MODE 3 (Kafka) + +| Event Source | Log files | Log files | Kafka topics +| Ingestion | FluentBit DaemonSet | FluentBit DaemonSet | SmallRye Reactive Messaging +| Normalization | PostgreSQL triggers | ES transforms | Java processors (JDBC) +| Raw Storage | `workflow_events_raw` | `workflow-events` index | 
None (direct to normalized) +| Performance | ~10ms latency | ~1s latency | ~100ms latency +| DLQ | N/A | N/A | Yes (`data-index-events-dlq`) +| Security | Disk files | Disk files | Kafka (SSL/SASL capable) +|=== + +== Next Steps + +* xref:architecture/kafka-mode.adoc[Learn about MODE 3 architecture] +* xref:developers/configuration.adoc[Full configuration reference] diff --git a/data-index/data-index-docs/modules/ROOT/pages/getting-started.adoc b/data-index/data-index-docs/modules/ROOT/pages/getting-started.adoc index 14985dcc4..2155d451e 100644 --- a/data-index/data-index-docs/modules/ROOT/pages/getting-started.adoc +++ b/data-index/data-index-docs/modules/ROOT/pages/getting-started.adoc @@ -46,7 +46,7 @@ mvn quarkus:dev * GraphQL UI available at http://localhost:8080/q/graphql-ui * Elasticsearch available at http://localhost:9200 -**Note:** Development mode doesn't include FluentBit. To test with real workflow events, use the KIND installation below. +NOTE: Development mode doesn't include FluentBit. To test with real workflow events, use the KIND installation below. == Quick Install (KIND) @@ -54,6 +54,7 @@ Data Index supports two storage backends. Choose one based on your needs: * **PostgreSQL (MODE 1)** - Recommended for most users, simpler deployment * **Elasticsearch (MODE 2)** - For high throughput or full-text search requirements +* **Kafka (MODE 3)** - For stream-based ingestion, requires Kafka infrastructure === Option 1: PostgreSQL Backend (Recommended) @@ -93,6 +94,27 @@ kubectl apply -f elasticsearch/kubernetes/configmap.yaml kubectl apply -f elasticsearch/kubernetes/daemonset.yaml ---- +=== Option 3: Kafka Ingestion + +[source,bash] +---- +# 1. Setup KIND cluster and Infrastructure dependencies (Kafka, PostgreSQL) +./setup-cluster.sh +MODE=kafka ./install-dependencies.sh + +# 2. Deploy the data index query service (PostgreSQL backend) +./deploy-data-index.sh kafka + +# 3. Initialize the database schema +./init-database-schema.sh + +# 4. 
Deploy the Kafka ingestion service +./deploy-kafka-ingestion.sh + +# 5. Deploy test workflow app with kafka profile +MODE=kafka ./deploy-workflow-app.sh +---- + == Verify Installation === Check Pods @@ -109,6 +131,9 @@ kubectl get pods -n logging # PostgreSQL kubectl get pods -n postgresql + +# Kafka +kubectl get pods -n kafka ---- Expected output: diff --git a/data-index/data-index-ingestion/README.md b/data-index/data-index-ingestion/README.md new file mode 100644 index 000000000..86ed66f05 --- /dev/null +++ b/data-index/data-index-ingestion/README.md @@ -0,0 +1,46 @@ +# Data Index Ingestion - Kafka Mode (MODE 3) + +Standalone Kafka-based event ingestion service for Data Index. Consumes CloudEvents from Kafka topics and writes directly to normalized PostgreSQL tables. + +## Architecture + +``` +Quarkus Flow --> Kafka (CloudEvents, topic: flow-lifecycle-out) + | + KafkaLifecycleConsumer (SmallRye Reactive Messaging) + | + WorkflowEventProcessor / TaskExecutionProcessor + | + WorkflowPersistence / TaskPersistence (JDBC UPSERT, field-level idempotency) + | + Database tables (workflow_instances, task_instances) + | + Data Index GraphQL API + + (failed records --> dead-letter topic: data-index-events-dlq) +``` + +## Modules + +- **data-index-ingestion-kafka-processor** - Event models (`WorkflowInstanceEvent`, `TaskExecutionEvent`), processors (`WorkflowEventProcessor`, `TaskExecutionProcessor`), and JDBC UPSERT persistence (`WorkflowPersistence`, `TaskPersistence`) +- **data-index-ingestion-kafka-service** - Quarkus service with the Kafka consumer (`KafkaLifecycleConsumer`), CloudEvent mapping (`Mapper`), health checks, and dead-letter queue handling + +## Development + +```bash +# Build all modules +cd data-index +mvn clean package -pl data-index-ingestion -am -DskipTests + +# Run in dev mode (auto-starts Kafka + PostgreSQL via Quarkus Dev Services) +cd data-index-ingestion/data-index-ingestion-kafka-service +mvn quarkus:dev + +# Run integration tests +mvn test -pl 
data-index-ingestion/data-index-ingestion-kafka-service -am -Dsurefire.failIfNoSpecifiedTests=false +``` + +## Documentation + +- Service configuration & deployment: `data-index-ingestion-kafka-service/README.md` +- Kafka cluster / KIND scripts: `data-index/scripts/kafka/README.md` diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/pom.xml b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/pom.xml new file mode 100644 index 000000000..968bbaa52 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/pom.xml @@ -0,0 +1,87 @@ + + + + 4.0.0 + + + org.kubesmarts.logic.apps + data-index-ingestion + 999-SNAPSHOT + ../pom.xml + + + data-index-ingestion-kafka-processor + KubeSmarts Logic Apps :: Data Index :: Ingestion :: Kafka Processor + Event processing and normalization logic for Kafka mode + + + org.kubesmarts.logic.dataindex.ingestion.kafka.processor + + + + + org.kubesmarts.logic.apps + data-index-storage-common + + + jakarta.inject + jakarta.inject-api + + + jakarta.enterprise + jakarta.enterprise.cdi-api + + + com.fasterxml.jackson.core + jackson-databind + + + org.slf4j + slf4j-api + + + io.cloudevents + cloudevents-core + + + + + + + + io.smallrye + jandex-maven-plugin + + + make-index + + jandex + + + + + + + + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/EventProcessor.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/EventProcessor.java new file mode 100644 index 000000000..b353cc06c --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/EventProcessor.java @@ -0,0 +1,6 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor; + +public interface EventProcessor<T> 
{ + + void process(T event); +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/ProcessEventFailedException.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/ProcessEventFailedException.java new file mode 100644 index 000000000..41e0fec8e --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/ProcessEventFailedException.java @@ -0,0 +1,8 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor; + +public class ProcessEventFailedException extends RuntimeException { + + public ProcessEventFailedException(String message, Throwable cause) { + super(message, cause); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/TaskExecutionProcessor.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/TaskExecutionProcessor.java new file mode 100644 index 000000000..54d73e472 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/TaskExecutionProcessor.java @@ -0,0 +1,62 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor; + +import io.quarkus.arc.Unremovable; +import jakarta.enterprise.context.ApplicationScoped; +import jakarta.inject.Inject; +import org.kubesmarts.logic.dataindex.model.TaskExecution; +import org.kubesmarts.logic.dataindex.ingestion.kafka.processor.persistence.TaskPersistence; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; +import java.util.Objects; + +@Unremovable +@ApplicationScoped +public class TaskExecutionProcessor implements EventProcessor<TaskExecution> { + + private static final Logger log = LoggerFactory.getLogger(TaskExecutionProcessor.class); + + final TaskPersistence taskPersistence; + + @Inject + public TaskExecutionProcessor(TaskPersistence taskPersistence) { + this.taskPersistence = taskPersistence; + } + + @Override + public void process(TaskExecution event) { + Objects.requireNonNull(event, "event cannot be null"); + log.debug("Processing task: {}", event); + event.setId(generateTaskExecutionId(event)); + try { + this.taskPersistence.persist(event); + log.debug("Successfully processed the task event with ID: {}", event.getInstanceId()); + } catch (SQLException e) { + log.error("Error while processing the task event: {}", event, e); + throw new ProcessEventFailedException("Failed to process the task event with instance ID: " + event.getInstanceId(), e); + } + } + + private String generateTaskExecutionId(TaskExecution taskExecutionEvent) { + // Generate deterministic ID based on instance's ID + task position + return taskExecutionEvent.getInstanceId() + + ":" + taskExecutionEvent.getTaskPosition(); + } +} + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/WorkflowEventProcessor.java 
b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/WorkflowEventProcessor.java new file mode 100644 index 000000000..7063c709b --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/WorkflowEventProcessor.java @@ -0,0 +1,52 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor; + +import io.quarkus.arc.Unremovable; +import jakarta.enterprise.context.ApplicationScoped; +import jakarta.inject.Inject; +import org.kubesmarts.logic.dataindex.model.WorkflowInstance; +import org.kubesmarts.logic.dataindex.ingestion.kafka.processor.persistence.WorkflowPersistence; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; +import java.util.Objects; + +@Unremovable +@ApplicationScoped +public class WorkflowEventProcessor implements EventProcessor<WorkflowInstance> { + + private static final Logger log = LoggerFactory.getLogger(WorkflowEventProcessor.class); + + final WorkflowPersistence workflowPersistence; + + @Inject + public WorkflowEventProcessor(WorkflowPersistence workflowPersistence) { + this.workflowPersistence = workflowPersistence; + } + + public void process(final WorkflowInstance event) { + try { + this.workflowPersistence.persist(Objects.requireNonNull(event, "event cannot be null")); + log.debug("Successfully processed the workflow event with ID: {}", event.getId()); + } catch (SQLException e) { + log.error("Error while processing the workflow event: {}", event, e); + throw new ProcessEventFailedException("Failed to process the workflow event with instance ID: " + event.getId(), e); + } + } +} + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/LoadSQL.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/LoadSQL.java new file mode 100644 index 000000000..ba22fcc9e --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/LoadSQL.java @@ -0,0 +1,23 @@ +package 
org.kubesmarts.logic.dataindex.ingestion.kafka.processor.persistence; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; + +import java.util.Objects; +import java.util.stream.Collectors; + +public class LoadSQL { + private LoadSQL() { + } + + public static String load(String path) { + try (InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream(path)) { + BufferedReader buff = new BufferedReader(new InputStreamReader(Objects.requireNonNull(stream, "stream from path '" + path + "' is null"))); + return buff.lines().collect(Collectors.joining("\n")); + } catch (IOException | NullPointerException e) { + throw new IllegalStateException("Failed to load SQL resource: " + path, e); + } + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/TaskPersistence.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/TaskPersistence.java new file mode 100644 index 000000000..a5bf7d6dc --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/TaskPersistence.java @@ -0,0 +1,130 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor.persistence; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import io.quarkus.arc.Unremovable; +import jakarta.annotation.PostConstruct; +import jakarta.enterprise.context.ApplicationScoped; +import jakarta.inject.Inject; +import org.kubesmarts.logic.dataindex.model.TaskExecution; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.sql.DataSource; +import 
java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.sql.Savepoint; +import java.time.ZonedDateTime; +import java.util.Optional; + +@Unremovable +@ApplicationScoped +public class TaskPersistence { + + private static final Logger log = LoggerFactory.getLogger(TaskPersistence.class); + private static final String INVALID_FOREIGN_KEY = "23503"; + + private String insertTaskUpsert; + private String insertPlaceholderWorkflow; + + final DataSource dataSource; + final ObjectMapper objectMapper; + + @Inject + public TaskPersistence(DataSource dataSource, ObjectMapper objectMapper) { + this.dataSource = dataSource; + this.objectMapper = objectMapper; + } + + @PostConstruct + void init() { + insertTaskUpsert = LoadSQL.load("/sql/task-instance-upsert.sql"); + insertPlaceholderWorkflow = LoadSQL.load("/sql/task-placeholder-workflow-insert.sql"); + } + + public void persist(TaskExecution event) throws SQLException { + try (Connection conn = this.dataSource.getConnection()) { + conn.setAutoCommit(false); + Savepoint sp = conn.setSavepoint("before_task_insert"); + try { + // Try to insert directly + tryInsertTask(event, conn); + conn.commit(); + } catch (SQLException e) { + if (INVALID_FOREIGN_KEY.equals(e.getSQLState())) { + conn.rollback(sp); + tryCreatePlaceholder(event, conn); + tryInsertTask(event, conn); + conn.commit(); + } else { + conn.rollback(); + throw e; + } + } + } + } + + private void tryInsertTask(TaskExecution event, Connection conn) throws SQLException { + try (PreparedStatement stmt = conn.prepareStatement(insertTaskUpsert)) { + setTaskParameters(stmt, event); + stmt.executeUpdate(); + } + } + + private void tryCreatePlaceholder(TaskExecution event, Connection conn) throws SQLException { + // Create placeholder workflow and retry + log.debug("Task arrived before workflow. 
Creating placeholder for instance: {}", + event.getInstanceId()); + try (PreparedStatement placeholderStmt = conn.prepareStatement(insertPlaceholderWorkflow)) { + placeholderStmt.setString(1, event.getInstanceId()); + placeholderStmt.setObject(2, Optional.ofNullable(event.getEventTimestamp()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + placeholderStmt.executeUpdate(); + } + } + + private void setTaskParameters(PreparedStatement stmt, TaskExecution event) throws SQLException { + + stmt.setString(1, event.getId()); + stmt.setString(2, event.getInstanceId()); + stmt.setString(3, event.getTaskName()); + stmt.setString(4, event.getTaskPosition()); + stmt.setString(5, event.getStatus()); + stmt.setObject(6, Optional.ofNullable(event.getStart()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + stmt.setObject(7, Optional.ofNullable(event.getEnd()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + stmt.setString(8, toJsonString(event.getInput())); + stmt.setString(9, toJsonString(event.getOutput())); + + + // Error fields + if (event.getError() != null) { + stmt.setString(10, event.getError().getType()); + stmt.setString(11, event.getError().getTitle()); + stmt.setString(12, event.getError().getDetail()); + stmt.setObject(13, event.getError().getStatus()); + stmt.setString(14, event.getError().getInstance()); + } else { + stmt.setNull(10, java.sql.Types.VARCHAR); + stmt.setNull(11, java.sql.Types.VARCHAR); + stmt.setNull(12, java.sql.Types.VARCHAR); + stmt.setNull(13, java.sql.Types.INTEGER); + stmt.setNull(14, java.sql.Types.VARCHAR); + } + + stmt.setObject(15, Optional.ofNullable(event.getEventTimestamp()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + } + + private String toJsonString(JsonNode node) { + if (node == null) { + return null; + } + try { + return objectMapper.writeValueAsString(node); + } catch (JsonProcessingException e) { + log.warn("Failed to serialize JSON node, returning null", e); + return null; + } + } + +} diff --git 
a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/WorkflowPersistence.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/WorkflowPersistence.java new file mode 100644 index 000000000..6a4646e46 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/processor/persistence/WorkflowPersistence.java @@ -0,0 +1,108 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.processor.persistence; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import io.quarkus.arc.Unremovable; +import jakarta.annotation.PostConstruct; +import jakarta.enterprise.context.ApplicationScoped; +import jakarta.inject.Inject; +import org.kubesmarts.logic.dataindex.model.WorkflowInstance; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.sql.DataSource; +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.sql.Types; +import java.time.ZonedDateTime; +import java.util.Optional; + +@Unremovable +@ApplicationScoped +public class WorkflowPersistence { + + private static final Logger log = LoggerFactory.getLogger(WorkflowPersistence.class); + + private String insertWorkflowUpsert; + + final DataSource dataSource; + final ObjectMapper objectMapper; + + @Inject + public WorkflowPersistence(DataSource dataSource, ObjectMapper objectMapper) { + this.dataSource = dataSource; + this.objectMapper = objectMapper; + } + + @PostConstruct + void init() { + insertWorkflowUpsert = LoadSQL.load("/sql/workflow-instance-upsert.sql"); + } + + public void persist(WorkflowInstance event) throws SQLException { + try 
(Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement(insertWorkflowUpsert)) { + + conn.setAutoCommit(false); + + try { + setWorkflowParameters(stmt, event); + stmt.executeUpdate(); + conn.commit(); + } catch (SQLException e) { + conn.rollback(); + throw e; + } + } + } + + private void setWorkflowParameters(PreparedStatement stmt, WorkflowInstance event) throws SQLException { + stmt.setString(1, event.getId()); + stmt.setString(2, event.getNamespace()); + stmt.setString(3, event.getName()); + stmt.setString(4, event.getVersion()); + if (event.getStatus() != null) { + stmt.setString(5, event.getStatus().name()); + } else { + stmt.setNull(5, Types.VARCHAR); + } + stmt.setObject(6, Optional.ofNullable(event.getStart()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + stmt.setObject(7, Optional.ofNullable(event.getEnd()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + stmt.setObject(8, Optional.ofNullable(event.getLastUpdate()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + + // JSON fields + stmt.setString(9, toJsonString(event.getInput())); + stmt.setString(10, toJsonString(event.getOutput())); + + // Error fields + if (event.getError() != null) { + stmt.setString(11, event.getError().getType()); + stmt.setString(12, event.getError().getTitle()); + stmt.setString(13, event.getError().getDetail()); + stmt.setObject(14, event.getError().getStatus()); + stmt.setString(15, event.getError().getInstance()); + } else { + stmt.setNull(11, Types.VARCHAR); + stmt.setNull(12, Types.VARCHAR); + stmt.setNull(13, Types.VARCHAR); + stmt.setNull(14, Types.INTEGER); + stmt.setNull(15, Types.VARCHAR); + } + stmt.setObject(16, Optional.ofNullable(event.getEventTimestamp()).map(ZonedDateTime::toOffsetDateTime).orElse(null)); + } + + private String toJsonString(JsonNode node) { + if (node == null) { + return null; + } + try { + return objectMapper.writeValueAsString(node); + } catch (JsonProcessingException e) { + 
log.warn("Failed to serialize JSON node, returning null", e); + return null; + } + } + +} \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/META-INF/beans.xml b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/META-INF/beans.xml new file mode 100644 index 000000000..ba33a335e --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/META-INF/beans.xml @@ -0,0 +1,9 @@ + + + + + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-instance-upsert.sql b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-instance-upsert.sql new file mode 100644 index 000000000..59828f062 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-instance-upsert.sql @@ -0,0 +1,54 @@ +INSERT INTO task_instances ( + task_execution_id, instance_id, task_name, task_position, status, + start, "end", input, output, + error_type, error_title, error_detail, error_status, error_instance, + last_event_time, created_at, updated_at +) VALUES (?, ?, ?, ?, ?, ?, ?, ?::jsonb, ?::jsonb, ?, ?, ?, ?, ?, ?, NOW(), NOW()) +ON CONFLICT (instance_id, task_position) DO UPDATE SET + instance_id = COALESCE(EXCLUDED.instance_id, task_instances.instance_id), + task_name = COALESCE(EXCLUDED.task_name, task_instances.task_name), + task_position = COALESCE(EXCLUDED.task_position, task_instances.task_position), + status = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN EXCLUDED.status + ELSE task_instances.status + END, + start = COALESCE(task_instances.start, EXCLUDED.start), + input = COALESCE(task_instances.input, EXCLUDED.input), + "end" = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED."end", 
task_instances."end") + ELSE task_instances."end" + END, + output = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.output, task_instances.output) + ELSE task_instances.output + END, + error_type = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.error_type, task_instances.error_type) + ELSE task_instances.error_type + END, + error_title = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.error_title, task_instances.error_title) + ELSE task_instances.error_title + END, + error_detail = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.error_detail, task_instances.error_detail) + ELSE task_instances.error_detail + END, + error_status = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.error_status, task_instances.error_status) + ELSE task_instances.error_status + END, + error_instance = CASE + WHEN EXCLUDED.last_event_time >= task_instances.last_event_time + THEN COALESCE(EXCLUDED.error_instance, task_instances.error_instance) + ELSE task_instances.error_instance + END, + last_event_time = GREATEST(EXCLUDED.last_event_time, task_instances.last_event_time), + updated_at = NOW() diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-placeholder-workflow-insert.sql b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-placeholder-workflow-insert.sql new file mode 100644 index 000000000..6f2051cbb --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/task-placeholder-workflow-insert.sql @@ -0,0 +1,3 @@ +INSERT INTO workflow_instances (id, created_at, updated_at, last_event_time) +VALUES (?, NOW(), NOW(), ?) 
+ON CONFLICT (id) DO NOTHING diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/workflow-instance-upsert.sql b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/workflow-instance-upsert.sql new file mode 100644 index 000000000..0b305ff4b --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-processor/src/main/resources/sql/workflow-instance-upsert.sql @@ -0,0 +1,58 @@ +INSERT INTO workflow_instances ( + id, namespace, name, version, status, start, "end", last_update, + input, output, error_type, error_title, error_detail, error_status, error_instance, + last_event_time, created_at, updated_at +) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?::jsonb, ?::jsonb, ?, ?, ?, ?, ?, ?, NOW(), NOW()) +ON CONFLICT (id) DO UPDATE SET + status = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN EXCLUDED.status + ELSE workflow_instances.status + END, + namespace = COALESCE(workflow_instances.namespace, EXCLUDED.namespace), + name = COALESCE(workflow_instances.name, EXCLUDED.name), + version = COALESCE(workflow_instances.version, EXCLUDED.version), + start = COALESCE(workflow_instances.start, EXCLUDED.start), + input = COALESCE(workflow_instances.input, EXCLUDED.input), + "end" = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED."end", workflow_instances."end") + ELSE workflow_instances."end" + END, + output = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.output, workflow_instances.output) + ELSE workflow_instances.output + END, + error_type = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.error_type, workflow_instances.error_type) + ELSE workflow_instances.error_type + END, + error_title = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN 
COALESCE(EXCLUDED.error_title, workflow_instances.error_title) + ELSE workflow_instances.error_title + END, + error_detail = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.error_detail, workflow_instances.error_detail) + ELSE workflow_instances.error_detail + END, + error_status = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.error_status, workflow_instances.error_status) + ELSE workflow_instances.error_status + END, + error_instance = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.error_instance, workflow_instances.error_instance) + ELSE workflow_instances.error_instance + END, + last_update = CASE + WHEN EXCLUDED.last_event_time >= workflow_instances.last_event_time + THEN COALESCE(EXCLUDED.last_update, workflow_instances.last_update) + ELSE workflow_instances.last_update + END, + last_event_time = GREATEST(EXCLUDED.last_event_time, workflow_instances.last_event_time), + updated_at = NOW() diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/README.md b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/README.md new file mode 100644 index 000000000..66a076376 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/README.md @@ -0,0 +1,336 @@ +# Data Index Kafka Ingestion Service + +Standalone Quarkus service that implements **MODE 3**: Kafka-based event ingestion for Data Index. + +Consumes CloudEvents from Kafka topics published by Quarkus Flow and writes normalized workflow/task data directly to PostgreSQL using JDBC UPSERT with **UTC offset timestamps**. 
+ +## MODE 3 vs MODE 1 & MODE 2 + +| Feature | MODE 1 (FluentBit + Triggers) | MODE 2 (FluentBit + ES Transforms) | MODE 3 (Kafka + JDBC) | +|---|---|---|---| +| **Event source** | Log files | Log files | Kafka topics | +| **Ingestion** | FluentBit DaemonSet | FluentBit DaemonSet | SmallRye Reactive Messaging | +| **Normalization** | PostgreSQL triggers | Elasticsearch transforms | Java JDBC processors | +| **Raw storage** | `workflow_events_raw` table | `workflow-events` index | None (direct to normalized) | +| **Transport security** | File system | File system | SSL/SASL_SSL supported | +| **Dead letter queue** | N/A | N/A | Yes (`data-index-events-dlq`) | +| **Timestamp format** | Converted by trigger | Converted by transform | **OffsetDateTime (UTC)** | +| **Idempotency** | SQL `COALESCE` | Painless script | SQL `COALESCE` + `last_event_time` | +| **FK recovery** | N/A (triggers atomic) | N/A (no FK) | **Savepoint + retry** | + +**Use MODE 3 when:** +- Kafka infrastructure already exists +- Security requirements (no log files on disk) +- Need dead letter queue for failed events +- Direct event stream processing required +- Encrypted transport (SSL/SASL_SSL) needed + +## Quick Start + +### Prerequisites +- Java 17+ +- Maven 3.8+ +- Docker (for dev services: Kafka + PostgreSQL) + +### Development + +```bash +# Run in development mode (auto-starts Kafka + PostgreSQL via Dev Services) +mvn quarkus:dev + +# Service runs at: http://localhost:8080 +# Health checks: http://localhost:8080/q/health +# Readiness: http://localhost:8080/q/health/ready +# Liveness: http://localhost:8080/q/health/live +``` + +**Dev Services automatically provisions:** +- Kafka broker (RedPanda testcontainer) +- PostgreSQL database +- Required topics: `flow-lifecycle-out`, `data-index-events-dlq` +- Database schema via Flyway migrations + +### KIND Cluster Setup + +For testing in a local Kubernetes cluster: + +```bash +# 1. 
Setup KIND cluster and dependencies (installs PostgreSQL + Kafka) +cd ../../scripts/kind +./setup-cluster.sh +MODE=kafka ./install-dependencies.sh + +# 2. Deploy the data index query service (PostgreSQL backend) +./deploy-data-index.sh kafka + +# 3. Initialize the database schema +./init-database-schema.sh + +# 4. Deploy the Kafka ingestion service +./deploy-kafka-ingestion.sh + +# 5. Deploy test workflow app +./deploy-workflow-app.sh + +# 6. Run the end-to-end test +./test-mode3-e2e.sh +``` + +Topics (`flow-lifecycle-out` and the `data-index-events-dlq` dead-letter topic) are +auto-created on first publish — the Kafka cluster runs with +`KAFKA_AUTO_CREATE_TOPICS_ENABLE=true`. See `data-index/scripts/kafka/README.md` for +cluster details. + +## Architecture + +``` +Quarkus Flow (workflow runtime) + | + CloudEvents to Kafka (topic: flow-lifecycle-out) + | + KafkaLifecycleConsumer (SmallRye Reactive Messaging) + | + CloudEvent validation + payload mapping (Mapper: CloudEvent + LifecycleEvent -> WorkflowInstanceEvent / TaskExecutionEvent) + | + WorkflowEventProcessor / TaskExecutionProcessor + | + WorkflowPersistence / TaskPersistence (JDBC UPSERT with OffsetDateTime in UTC) + | + workflow_instances / task_instances (normalized tables, TIMESTAMP WITH TIME ZONE) + + (records that fail processing -> dead-letter topic: data-index-events-dlq) +``` + +### Module Structure + +The Kafka ingestion service is composed of two Maven modules: + +1. 
**data-index-ingestion-kafka-processor**: Event models and processing logic + - `data/WorkflowInstanceEvent.java`, `data/TaskExecutionEvent.java` - Internal event payloads + - `EventProcessor` - Generic processor interface + - `WorkflowEventProcessor.java` / `TaskExecutionProcessor.java` - Processor implementations + - `persistence/WorkflowPersistence.java` - JDBC UPSERT for workflow events + - `persistence/TaskPersistence.java` - JDBC UPSERT for task events with FK recovery + - `util/LifecycleEventUtils.java` - Event-type routing + status mapping + - `ProcessEventFailedException.java` - Thrown on processing failure (triggers DLQ) + +2. **data-index-ingestion-kafka-service**: Quarkus application + - `KafkaLifecycleConsumer.java` - Reactive Messaging consumer (validates and routes by event type) + - `Mapper.java` - Maps `CloudEvent` + `LifecycleEvent` to processor event types + - `LifecycleEvent.java` - CloudEvent data payload model + - `HealthChecks.java` - Kubernetes health probes + - `application.properties` - Kafka, database, messaging, and dead-letter-queue configuration + +### Processing Guarantees + +- **At-least-once delivery**: Kafka offsets committed after successful DB writes +- **Out-of-order handling**: Timestamp-based idempotency (last_event_time) +- **Task before workflow**: If task arrives first, placeholder workflow created via savepoint/retry +- **Field-level idempotency**: Immutable, terminal, and status fields handled correctly +- **UTC timestamps**: All timestamps saved as OffsetDateTime in UTC (TIMESTAMP WITH TIME ZONE) +- **Dead letter queue**: Failed messages sent to `data-index-events-dlq` topic + +### Idempotency Rules + +**Immutable fields** (first value wins): +- `namespace`, `name`, `version` (workflow) +- `start`, `input` (workflow and task) +- `task_name`, `task_position` (task) + +**Terminal fields** (last non-null wins): +- `end`, `output` (workflow and task) +- `last_update` (workflow only) +- Error fields: `error_type`, 
`error_title`, `error_detail`, `error_status`, `error_instance` + +**Status precedence** (via last_event_time comparison): +- Events with the most recent timestamp win the status field + +## Configuration + +### Key Properties + +| Property | Default | Description | +|---|---|---| +| `kafka.bootstrap.servers` | `KAFKA_BOOTSTRAP_SERVERS` (prod) | Kafka broker URLs | +| `quarkus.datasource.jdbc.url` | (dev services) | PostgreSQL connection | +| `mp.messaging.incoming.data-index-events.topic` | `flow-lifecycle-out` | Kafka topic name | +| `mp.messaging.incoming.data-index-events.group.id` | `data-index-ingestion` | Kafka consumer group | +| `mp.messaging.incoming.data-index-events.dead-letter-queue.topic` | `data-index-events-dlq` | Dead letter queue topic | +| `mp.messaging.incoming.data-index-events.auto.offset.reset` | `earliest` | Offset reset strategy | + +See `src/main/resources/application.properties` for full configuration. + +## Event Format + +### CloudEvent (v1.0) + +```json +{ + "specversion": "1.0", + "type": "io.serverlessworkflow.workflow.started.v1", + "source": "/workflow/executions/01KSGKY66DMS0KPPMFMMR3BJZX", + "id": "event-123", + "time": "2026-05-25T22:40:10.676900Z", + "datacontenttype": "application/json", + "data": { + "instanceId": "01KSGKY66DMS0KPPMFMMR3BJZX", + "workflowName": "order-processing", + "workflowNamespace": "org.acme", + "workflowVersion": "1.0.0", + "status": "RUNNING", + "startTime": "2026-05-25T19:40:10.676802-03:00", + "lastUpdateTime": "2026-05-25T19:40:10.676802-03:00", + "input": { "orderId": "ORD-789" } + } +} +``` + +### Event Type Routing + +- `io.serverlessworkflow.workflow.*` events -> WorkflowEventProcessor (via `LifecycleEventUtils.isWorkflow()`) +- `io.serverlessworkflow.task.*` events -> TaskExecutionProcessor (via `LifecycleEventUtils.isTask()`) + +### Timestamp Formats + +All timestamp fields (`startTime`, `endTime`, `lastUpdateTime`) are automatically converted to **UTC OffsetDateTime** and stored as `TIMESTAMP 
WITH TIME ZONE` in PostgreSQL. + +Accepted input formats: +- **ISO-8601 with offset**: `2026-05-25T19:40:10.676802-03:00` (recommended) +- **ISO-8601 UTC**: `2026-05-25T22:40:10.676900Z` +- **Unix epoch seconds**: `1747486200` (converted to UTC) + +All timestamps are normalized to UTC before database insertion. + +## Testing + +```bash +# Run all integration tests (uses Quarkus Dev Services for Kafka + PostgreSQL) +mvn test + +# Run a single integration test +mvn test -Dtest=KafkaIngestionITest +``` + +Integration tests (extending `BaseWorkflowLifecycleITest`) verify: +- Workflow started/completed event normalization (`KafkaIngestionITest`) +- Faulted workflow + error field normalization (`FaultedWorkflowITest`) +- Cancelled workflow lifecycle (`CancelledWorkflowITest`) +- Suspended workflow lifecycle (`SuspendedWorkflowITest`) +- Task lifecycle (started -> completed) +- Out-of-order events (task before workflow with savepoint recovery) +- Field-level idempotency (immutable fields preserved on update) +- UTC timestamp conversion and storage + +## Health Checks + +The service provides three health check endpoints: + +```bash +# Readiness: Database connectivity (used by Kubernetes readiness probe) +curl http://localhost:8080/q/health/ready + +# Liveness: Service is running (used by Kubernetes liveness probe) +curl http://localhost:8080/q/health/live + +# Wellness: Overall health summary +curl http://localhost:8080/q/health +``` + +**Health Check Details:** +- **Liveness**: Verifies database connection with `SELECT 1` query +- **Readiness**: Same as liveness - ensures DB is available before consuming messages +- **Wellness**: Aggregates all health indicators + +Kafka consumer health is automatically monitored via SmallRye Reactive Messaging health integration. + +## Error Handling + +### Failed Event Processing + +When event processing fails (database errors, constraint violations, etc.): + +1. 
**Exception thrown**: `ProcessEventFailedException` wraps the SQL error +2. **Dead letter queue**: Failed message is sent to `data-index-events-dlq` topic +3. **Consumer continues**: Next messages are processed (fail-fast disabled) +4. **Monitoring**: Check DLQ topic for failed events + +### Task FK Violation Recovery + +If a task event arrives before its parent workflow: + +1. **Initial INSERT fails**: Foreign key constraint violation (SQL state `23503`) +2. **Savepoint rollback**: Transaction rolled back to savepoint +3. **Placeholder workflow created**: Minimal workflow row with only `id` and `last_event_time` +4. **Task INSERT retried**: Now succeeds with placeholder workflow in place +5. **Workflow event arrives later**: Updates placeholder with full workflow data + +This ensures tasks are never lost due to event ordering issues. + +## Deployment + +### Kubernetes + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: data-index-kafka-ingestion +spec: + replicas: 1 # Single instance recommended (Kafka consumer group handles scaling) + template: + spec: + containers: + - name: kafka-ingestion + image: kubesmarts/data-index-ingestion-kafka-service:999-SNAPSHOT + env: + - name: KAFKA_BOOTSTRAP_SERVERS + value: "kafka.kafka.svc.cluster.local:9092" + - name: QUARKUS_DATASOURCE_JDBC_URL + value: "jdbc:postgresql://postgresql:5432/data-index" + livenessProbe: + httpGet: + path: /q/health/live + port: 8080 + initialDelaySeconds: 30 + periodSeconds: 10 + readinessProbe: + httpGet: + path: /q/health/ready + port: 8080 + initialDelaySeconds: 10 + periodSeconds: 5 +``` + +### Environment Variables + +| Variable | Required | Default | Description | +|---|---|---|---| +| `KAFKA_BOOTSTRAP_SERVERS` | Yes (prod) | - | Kafka broker URLs (comma-separated) | +| `QUARKUS_DATASOURCE_JDBC_URL` | Yes (prod) | - | PostgreSQL JDBC connection string | +| `QUARKUS_DATASOURCE_USERNAME` | Yes (prod) | - | Database username | +| `QUARKUS_DATASOURCE_PASSWORD` | Yes (prod) | - | 
Database password | + +## Monitoring + +### Key Metrics + +Monitor the following for production deployments: + +- **Kafka consumer lag**: `kafka_consumer_lag` (messages behind) +- **Processing rate**: `kafka_messages_consumed_total` +- **DLQ messages**: Monitor `data-index-events-dlq` topic for failed events +- **Database connection pool**: `agroal_*` metrics +- **Health check failures**: Kubernetes probe failures indicate DB connectivity issues + +### Logs + +```bash +# Follow service logs +kubectl logs -f deployment/data-index-kafka-ingestion + +# Search for errors +kubectl logs deployment/data-index-kafka-ingestion | grep ERROR + +# Check DLQ processing +kubectl logs deployment/data-index-kafka-ingestion | grep "dead-letter" +``` diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/pom.xml b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/pom.xml new file mode 100644 index 000000000..86ae55b3d --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/pom.xml @@ -0,0 +1,151 @@ + + + + 4.0.0 + + + org.kubesmarts.logic.apps + data-index-ingestion + 999-SNAPSHOT + ../pom.xml + + + data-index-ingestion-kafka-service + KubeSmarts Logic Apps :: Data Index :: Ingestion :: Kafka Service + Quarkus service with Kafka consumer and event listeners + + + org.kubesmarts.logic.dataindex.ingestion.kafka.service + + + + + org.kubesmarts.logic.apps + data-index-ingestion-kafka-processor + ${project.version} + + + io.quarkus + quarkus-qute + + + io.quarkus + quarkus-messaging-kafka + + + io.quarkus + quarkus-jdbc-postgresql + + + io.quarkus + quarkus-jackson + + + io.quarkus + quarkus-smallrye-health + + + io.quarkus + quarkus-container-image-jib + + + io.cloudevents + cloudevents-json-jackson + + + com.fasterxml.jackson.datatype + jackson-datatype-jsr310 + + + io.quarkus + quarkus-micrometer-registry-prometheus + + + + org.kubesmarts.logic.apps + data-index-storage-migrations + ${project.version} + test + + 
+ io.quarkus + quarkus-flyway + test + + + io.quarkus + quarkus-junit5 + test + + + org.assertj + assertj-core + test + + + io.rest-assured + rest-assured + test + + + org.awaitility + awaitility + test + + + + + + + io.quarkus + quarkus-maven-plugin + true + + + + build + + + + + + org.apache.maven.plugins + maven-failsafe-plugin + + + **/*IT.java + + + + + + integration-test + verify + + + + + + + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/HealthChecks.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/HealthChecks.java new file mode 100644 index 000000000..c1b6d05dd --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/HealthChecks.java @@ -0,0 +1,85 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.kubesmarts.logic.dataindex.ingestion.kafka.service; + +import io.quarkus.arc.Unremovable; +import io.smallrye.health.api.Wellness; +import jakarta.enterprise.context.ApplicationScoped; +import org.eclipse.microprofile.health.HealthCheck; +import org.eclipse.microprofile.health.HealthCheckResponse; +import org.eclipse.microprofile.health.HealthCheckResponseBuilder; +import org.eclipse.microprofile.health.Liveness; +import org.eclipse.microprofile.health.Readiness; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.sql.DataSource; +import java.sql.Connection; +import java.sql.ResultSet; +import java.sql.Statement; + +@Unremovable +@ApplicationScoped +public class HealthChecks { + + private static final Logger log = LoggerFactory.getLogger(HealthChecks.class); + final DataSource dataSource; + + public HealthChecks(DataSource dataSource) { + this.dataSource = dataSource; + } + + @Liveness + public HealthCheck livenessCheck() { + return () -> getDatabaseHealthCheck(HealthCheckResponse.named("Kafka Ingestion Service - liveness check")); + } + + @Readiness + public HealthCheck readinessCheck() { + HealthCheckResponseBuilder builder = HealthCheckResponse.named("Kafka Ingestion Service - readiness check"); + return () -> getDatabaseHealthCheck(builder); + } + + private HealthCheckResponse getDatabaseHealthCheck(HealthCheckResponseBuilder builder) { + try { + try (Connection conn = dataSource.getConnection(); + Statement stmt = conn.createStatement(); + ResultSet rs = stmt.executeQuery("SELECT 1")) { + if (rs.next()) { + return builder.up() + .withData("database", "connected") + .build(); + } else { + return builder.down() + .withData("database", "disconnected") + .build(); + } + } + } catch (Exception e) { + log.error("Error while connecting to database", e); + return builder + .down() + .withData("database", "error: " + e.getMessage()) + .build(); + } + } + + @Wellness + public HealthCheck wellnessCheck() { + return 
readinessCheck(); + } +} + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaIngestionObjectMapperCustomizer.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaIngestionObjectMapperCustomizer.java new file mode 100644 index 000000000..cf6383aef --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaIngestionObjectMapperCustomizer.java @@ -0,0 +1,16 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.service; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule; +import io.cloudevents.jackson.JsonFormat; +import io.quarkus.jackson.ObjectMapperCustomizer; +import jakarta.enterprise.context.ApplicationScoped; + +@ApplicationScoped +public class KafkaIngestionObjectMapperCustomizer implements ObjectMapperCustomizer { + @Override + public void customize(ObjectMapper objectMapper) { + objectMapper.registerModule(JsonFormat.getCloudEventJacksonModule()); + objectMapper.registerModule(new JavaTimeModule()); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaLifecycleConsumer.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaLifecycleConsumer.java new file mode 100644 index 000000000..5ad0d66e5 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/KafkaLifecycleConsumer.java @@ -0,0 +1,117 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the 
"License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.kubesmarts.logic.dataindex.ingestion.kafka.service; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import io.cloudevents.CloudEvent; +import io.cloudevents.jackson.JsonCloudEventData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCEData; +import jakarta.enterprise.context.ApplicationScoped; +import jakarta.inject.Inject; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.eclipse.microprofile.reactive.messaging.Incoming; +import org.kubesmarts.logic.dataindex.ingestion.kafka.processor.EventProcessor; +import org.kubesmarts.logic.dataindex.ingestion.kafka.processor.ProcessEventFailedException; +import org.kubesmarts.logic.dataindex.model.LifecycleEventUtils; +import org.kubesmarts.logic.dataindex.model.TaskExecution; +import org.kubesmarts.logic.dataindex.model.WorkflowInstance; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@ApplicationScoped +public class KafkaLifecycleConsumer { + + private static final Logger log = LoggerFactory.getLogger(KafkaLifecycleConsumer.class); + + @Inject + ObjectMapper jackson; + + @Inject + EventProcessor workflowEventProcessor; + + @Inject + EventProcessor taskExecutionProcessor; + + @Incoming("data-index-events") + public void consumeLifecycleEvent(ConsumerRecord record) { + + try { + CloudEvent cloudEvent = validateCloudEvent(record); + + 
JsonCloudEventData cloudEventData = (JsonCloudEventData) cloudEvent.getData();
+            if (cloudEventData == null || cloudEventData.getNode() == null) {
+                throw new IllegalArgumentException("The CloudEvent data node consumed at offset %s from partition %s is null or empty."
+                        .formatted(record.offset(), record.partition()));
+            }
+
+            Class<?> eventClass = LifecycleEventUtils.getEventClass(cloudEvent.getType());
+            Object data = jackson.convertValue(cloudEventData.getNode(), eventClass);
+
+            if (data instanceof TaskCEData taskData) {
+                handleTaskEvent(cloudEvent, taskData);
+            } else if (data instanceof WorkflowCEData workflowData) {
+                handleWorkflowEvent(cloudEvent, workflowData);
+            } else {
+                throw new IllegalArgumentException("Unsupported event type '%s' consumed at offset %s from partition %s."
+                        .formatted(cloudEvent.getType(), record.offset(), record.partition()));
+            }
+        } catch (Exception e) {
+            log.error("Failed to consume the record from Kafka at offset '{}' from partition '{}'.", record.offset(), record.partition(), e);
+            throw new ProcessEventFailedException("Failed to consume Kafka record at offset %s from partition %s".formatted(
+                    record.offset(), record.partition()), e);
+        }
+    }
+
+    private CloudEvent validateCloudEvent(ConsumerRecord<String, String> record) throws JsonProcessingException {
+        if (record.value() == null || record.value().isEmpty()) {
+            throw new IllegalArgumentException("Event record consumed at offset %s, from partition %s is null or empty."
+                    .formatted(record.offset(), record.partition()));
+        }
+        CloudEvent cloudEvent = jackson.readValue(record.value(), CloudEvent.class);
+        if (cloudEvent == null || cloudEvent.getType() == null) {
+            log.error("The CloudEvent consumed at offset '{}', from partition '{}' is null or has a null type.", record.offset(), record.partition());
+            throw new IllegalArgumentException("CloudEvent type is null or empty.");
+        }
+
+        if (cloudEvent.getTime() == null) {
+            log.error("The CloudEvent's time at offset '{}', from partition '{}' is null.", record.offset(), record.partition());
+            throw new IllegalArgumentException("CloudEvent time is null.");
+        }
+        return cloudEvent;
+    }
+
+    private void handleWorkflowEvent(CloudEvent cloudEvent, WorkflowCEData data) {
+        try {
+            WorkflowInstance workflow = Mapper.mapWorkflowInstanceEvent(cloudEvent, data, jackson);
+            workflowEventProcessor.process(workflow);
+        } catch (Exception e) {
+            log.error("Error while processing CloudEvent (workflow) with ID: {}", data.getName(), e);
+            throw new ProcessEventFailedException("Failed to process CloudEvent with ID: " + data.getName(), e);
+        }
+    }
+
+    private void handleTaskEvent(CloudEvent cloudEvent, TaskCEData data) {
+        try {
+            TaskExecution taskExecution = Mapper.mapTaskExecutionEvent(cloudEvent, data, jackson);
+            taskExecutionProcessor.process(taskExecution);
+        } catch (Exception e) {
+            log.error("Error while processing CloudEvent (task) with ID: {}", cloudEvent.getId(), e);
+            throw new ProcessEventFailedException("Failed to process CloudEvent with ID: " + cloudEvent.getId(), e);
+        }
+    }
+}
diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/Mapper.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/Mapper.java
new file mode 100644
index 000000000..dca727aaa
--- /dev/null
+++
b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/Mapper.java @@ -0,0 +1,149 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka.service; + +import com.fasterxml.jackson.databind.ObjectMapper; +import io.cloudevents.CloudEvent; +import io.serverlessworkflow.impl.LifecycleEvents; +import io.serverlessworkflow.impl.WorkflowError; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCancelledCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCompletedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCompletedCEDataWithOutput; +import io.serverlessworkflow.impl.lifecycle.ce.TaskFailedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskStartedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskStartedCEDataWithInput; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCancelledCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCompletedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCompletedCEDataWithOutput; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowFailedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowStartedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowStartedCEDataWithInput; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowStatusCEDataEvent; +import org.kubesmarts.logic.dataindex.model.Error; +import org.kubesmarts.logic.dataindex.model.LifecycleEventUtils; +import org.kubesmarts.logic.dataindex.model.TaskExecution; +import org.kubesmarts.logic.dataindex.model.WorkflowInstance; +import org.kubesmarts.logic.dataindex.model.WorkflowInstanceStatus; +import java.time.OffsetDateTime; +import java.util.UUID; + +public final class Mapper { + + private Mapper() {} + + public static WorkflowInstance mapWorkflowInstanceEvent(CloudEvent 
cloudEvent, WorkflowCEData data, ObjectMapper jackson) { + + String id = data.getName(); + if (id == null || id.isBlank()) { + throw new IllegalArgumentException("WorkflowCEData name field is null or empty."); + } + + String status = defineStatus(cloudEvent, data); + + WorkflowInstance workflow = new WorkflowInstance(); + workflow.setId(id); + workflow.setNamespace(data.getDefinition().namespace()); + workflow.setName(data.getDefinition().name()); + workflow.setVersion(data.getDefinition().version()); + workflow.setStatus(WorkflowInstanceStatus.valueOf(status)); + workflow.setEventTimestamp(cloudEvent.getTime().toZonedDateTime()); + + if (data instanceof WorkflowStartedCEData started) { + workflow.setStart(started.getStartedAt().toZonedDateTime()); + } else if (data instanceof WorkflowCompletedCEData completed) { + workflow.setEnd(completed.getCompletedAt().toZonedDateTime()); + } else if (data instanceof WorkflowFailedCEData failed) { + workflow.setEnd(failed.getFaultedAt().toZonedDateTime()); + if (failed.getError() != null) { + workflow.setError(mapError(failed.getError())); + } + } else if (data instanceof WorkflowCancelledCEData cancelled) { + workflow.setEnd(cancelled.getCancelledAt().toZonedDateTime()); + } else if (data instanceof WorkflowStatusCEDataEvent statusChanged) { + workflow.setLastUpdate(statusChanged.getUpdatedAt().toZonedDateTime()); + } + + if (data instanceof WorkflowStartedCEDataWithInput withInput && withInput.getInput() != null) { + workflow.setInput(jackson.valueToTree(withInput.getInput())); + } + if (data instanceof WorkflowCompletedCEDataWithOutput withOutput && withOutput.getOutput() != null) { + workflow.setOutput(jackson.valueToTree(withOutput.getOutput())); + } + + return workflow; + } + + public static TaskExecution mapTaskExecutionEvent(CloudEvent cloudEvent, TaskCEData data, ObjectMapper jackson) { + + String instanceId = data.getWorkflow(); + if (instanceId == null || instanceId.isBlank()) { + throw new 
IllegalArgumentException("The workflow's instance id field ('workflow') is null or empty."); + } + + String taskPosition = data.getTask(); + if (taskPosition == null || taskPosition.isBlank()) { + throw new IllegalArgumentException("The task position field ('task') is null or empty."); + } + + String status = defineStatus(cloudEvent, data); + + TaskExecution taskExecution = new TaskExecution(); + taskExecution.setInstanceId(instanceId); + taskExecution.setEventTimestamp(cloudEvent.getTime().toZonedDateTime()); + taskExecution.setStatus(status); + taskExecution.setId(generateTaskExecutionId(instanceId, taskPosition, cloudEvent.getTime())); + taskExecution.setTaskPosition(taskPosition); + taskExecution.setTaskName(taskPosition.substring(taskPosition.lastIndexOf("/") + 1)); + + if (data instanceof TaskStartedCEData started) { + taskExecution.setStart(started.getStartedAt().toZonedDateTime()); + } + + if (data instanceof TaskCompletedCEData completed) { + taskExecution.setEnd(completed.getCompletedAt().toZonedDateTime()); + } else if (data instanceof TaskFailedCEData failed) { + taskExecution.setEnd(failed.getFaultedAt().toZonedDateTime()); + if (failed.getError() != null) { + taskExecution.setError(mapError(failed.getError())); + } + } else if (data instanceof TaskCancelledCEData cancelled) { + taskExecution.setEnd(cancelled.getCancelledAt().toZonedDateTime()); + } + + if (data instanceof TaskStartedCEDataWithInput withInput && withInput.getInput() != null) { + taskExecution.setInput(jackson.valueToTree(withInput.getInput())); + } + if (data instanceof TaskCompletedCEDataWithOutput withOutput && withOutput.getOutput() != null) { + taskExecution.setOutput(jackson.valueToTree(withOutput.getOutput())); + } + + return taskExecution; + } + + private static String defineStatus(CloudEvent cloudEvent, Object data) { + if (cloudEvent.getType().equals(LifecycleEvents.WORKFLOW_STATUS_CHANGED) + && data instanceof WorkflowStatusCEDataEvent statusChanged) { + return 
statusChanged.getStatus(); + } + String status = LifecycleEventUtils.defineStatusLooking(cloudEvent.getType()); + if (status == null) { + throw new IllegalArgumentException("It was not possible to define status looking for 'status' event."); + } + return status; + } + + private static Error mapError(WorkflowError workflowError) { + Error error = new Error(); + error.setType(workflowError.type()); + error.setTitle(workflowError.title()); + error.setDetail(workflowError.detail()); + error.setStatus(workflowError.status()); + error.setInstance(workflowError.instance()); + return error; + } + + private static String generateTaskExecutionId(String instanceId, String taskPosition, OffsetDateTime time) { + String composite = instanceId + ":" + taskPosition + ":" + time.toInstant().toEpochMilli(); + return UUID.nameUUIDFromBytes(composite.getBytes()).toString(); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/RootResource.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/RootResource.java new file mode 100644 index 000000000..5659ec6ef --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/java/org/kubesmarts/logic/dataindex/ingestion/kafka/service/RootResource.java @@ -0,0 +1,72 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.kubesmarts.logic.dataindex.ingestion.kafka.service; + +import io.quarkus.qute.Template; +import io.smallrye.health.SmallRyeHealthReporter; +import jakarta.inject.Inject; +import jakarta.json.JsonObject; +import jakarta.ws.rs.GET; +import jakarta.ws.rs.Path; +import jakarta.ws.rs.Produces; +import jakarta.ws.rs.core.MediaType; + +import java.io.InputStream; +import java.nio.charset.StandardCharsets; +import java.util.Properties; + +/** + * Serves the landing page with dynamic version injection. + */ +@Path("/") +public class RootResource { + + @Inject + SmallRyeHealthReporter reporter; + + @Inject + Template index; + + private final String version; + private String gitCommit; + + public RootResource() { + Package pkg = getClass().getPackage(); + version = pkg != null && pkg.getImplementationVersion() != null + ? pkg.getImplementationVersion() + : "999-SNAPSHOT"; + + try (InputStream is = getClass().getClassLoader().getResourceAsStream("git.properties")) { + if (is != null) { + Properties props = new Properties(); + props.load(is); + gitCommit = props.getProperty("git.commit.id.abbrev", "unknown"); + } else { + gitCommit = "dev"; + } + } catch (Exception e) { + gitCommit = "unknown"; + } + } + + @GET + @Produces(MediaType.TEXT_HTML) + public String root() { + JsonObject payload = reporter.getHealth().getPayload(); + return index.data("version", version, "payload", payload).render(); + } +} + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/application.properties b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/application.properties new file mode 100644 index 000000000..87e21b118 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/application.properties @@ -0,0 +1,35 @@ 
+quarkus.application.name=data-index-ingestion-kafka-service + +# http +quarkus.http.port=8080 +quarkus.http.enable-compression=true + +# health +quarkus.smallrye-health.root-path=/q/health + +# jib +quarkus.container-image.build=true +quarkus.container-image.group=kubesmarts +quarkus.container-image.name=data-index-ingestion-kafka-service +quarkus.container-image.tag=999-SNAPSHOT + +# kafka +%prod.kafka.bootstrap.servers=${KAFKA_BOOTSTRAP_SERVERS} + +# mp messaging +mp.messaging.incoming.data-index-events.connector=smallrye-kafka +mp.messaging.incoming.data-index-events.topic=flow-lifecycle-out +mp.messaging.incoming.data-index-events.group.id=data-index-ingestion +mp.messaging.incoming.data-index-events.auto.offset.reset=earliest +mp.messaging.incoming.data-index-events.retry-attempts=2 +mp.messaging.incoming.data-index-events.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer +mp.messaging.incoming.data-index-events.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer +mp.messaging.incoming.data-index-events.health-enabled=true +mp.messaging.incoming.data-index-events.health-readiness-enabled=true + +mp.messaging.incoming.data-index-events.failure-strategy=dead-letter-queue +mp.messaging.incoming.data-index-events.dead-letter-queue.topic=data-index-events-dlq +mp.messaging.incoming.data-index-events.dead-letter-queue.key.serializer=org.apache.kafka.common.serialization.StringSerializer +mp.messaging.incoming.data-index-events.dead-letter-queue.value.serializer=org.apache.kafka.common.serialization.StringSerializer + +quarkus.native.resources.includes=task-instance-upsert.sql,task-placeholder-workflow-insert.sql,workflow-instance-upsert.sql \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/templates/index.html b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/templates/index.html new file mode 100644 
index 000000000..ddf149fb2 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/main/resources/templates/index.html @@ -0,0 +1,131 @@ +{@java.lang.String version} + + + + + + Data Index Kafka Ingestion Service + + + +

+Data Index Kafka Ingestion Service
+Version: {version}
+MODE 3: Direct Kafka → PostgreSQL event normalization
+
+Health Status
+Loading health information...
+
+Available Endpoints
+Health: GET /q/health
+Kubernetes liveness and readiness probes
+
+Processing
+This service consumes CloudEvents from the Kafka topic flow-lifecycle-out
+and normalizes workflow and task execution data to PostgreSQL.
+
+Dead Letter Queue
+DLQ topic: data-index-events-dlq
+Records that fail processing (deserialization errors, persistence failures)
+are routed to this topic instead of blocking the consumer. Monitor it to
+inspect and reprocess failed events.
+
+Configuration
+Configure via environment variables or application.properties:
+  • KAFKA_BOOTSTRAP_SERVERS - Kafka brokers
+  • QUARKUS_DATASOURCE_JDBC_URL - PostgreSQL connection
+  • QUARKUS_DATASOURCE_USERNAME - Database user
+  • QUARKUS_DATASOURCE_PASSWORD - Database password
+ + + diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/BaseWorkflowLifecycleIT.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/BaseWorkflowLifecycleIT.java new file mode 100644 index 000000000..dd308de62 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/BaseWorkflowLifecycleIT.java @@ -0,0 +1,346 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import jakarta.inject.Inject; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.clients.producer.KafkaProducer; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.awaitility.Awaitility; +import org.eclipse.microprofile.config.inject.ConfigProperty; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.sql.DataSource; +import java.io.IOException; +import java.net.URL; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.time.Duration; +import java.time.Instant; +import java.time.OffsetDateTime; +import java.time.ZonedDateTime; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Properties; +import java.util.UUID; + +import static org.assertj.core.api.Assertions.assertThat; + +public abstract class BaseWorkflowLifecycleIT { + + final Logger log = 
LoggerFactory.getLogger(BaseWorkflowLifecycleIT.class); + + @Inject + protected DataSource dataSource; + + @Inject + protected ObjectMapper mapper; + + @ConfigProperty(name = "kafka.bootstrap.servers") + protected String kafkaBootstrapServers; + + protected KafkaProducer producer; + protected KafkaConsumer dlqConsumer; + + @BeforeEach + void setUp() { + var producerProps = new Properties(); + producerProps.put("bootstrap.servers", kafkaBootstrapServers); + producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); + producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); + producer = new KafkaProducer<>(producerProps); + } + + @AfterEach + void tearDown() throws SQLException { + if (producer != null) { + producer.close(); + } + try (Connection conn = dataSource.getConnection()) { + conn.prepareStatement("DELETE FROM task_instances;").executeUpdate(); + conn.prepareStatement("DELETE FROM workflow_instances;").executeUpdate(); + } + } + + protected void publishEventsToKafka(String jsonFileName) throws Exception { + String array = readCloudEvents(jsonFileName); + JsonNode events = mapper.readTree(array); + + for (JsonNode event : events) { + String eventJson = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", null, eventJson)).get(); + } + producer.flush(); + } + + protected void awaitByWorkflowStatus(String instanceId, String status) { + Awaitility.await().atMost(Duration.ofSeconds(5)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo(status); + } + } + }); + } + + protected void awaitWorkflowWithName(String instanceId) { 
+ Awaitility.await().atMost(Duration.ofSeconds(15)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT name FROM workflow_instances WHERE id = ? AND name IS NOT NULL")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + } + } + }); + } + + protected void awaitByWorkflow(String instanceId) { + Awaitility.await().atMost(Duration.ofSeconds(15)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT 1 FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + } + } + }); + } + + protected void awaitByTaskNameAndInstanceId(String taskName, String instanceId) { + Awaitility.await().atMost(Duration.ofSeconds(15)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT 1 FROM task_instances WHERE task_name = ? AND instance_id = ?")) { + stmt.setString(1, taskName); + stmt.setString(2, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + } + } + }); + } + + protected void awaitByTaskPositionAndInstanceId(String taskPosition, String instanceId) { + Awaitility.await().atMost(Duration.ofSeconds(15)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT 1 FROM task_instances WHERE task_position = ? 
AND instance_id = ?")) { + stmt.setString(1, taskPosition); + stmt.setString(2, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + } + } + }); + } + + protected void awaitTaskStatus(String taskName, String status) { + Awaitility.await().atMost(Duration.ofSeconds(15)).pollInterval(Duration.ofMillis(500)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status FROM task_instances WHERE task_name = ? AND status = ?")) { + stmt.setString(1, taskName); + stmt.setString(2, status); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo(status); + } + } + }); + } + + protected List<ConsumerRecord<String, String>> pollDLQ(Duration timeout) { + List<ConsumerRecord<String, String>> records = new ArrayList<>(); + long endTime = System.currentTimeMillis() + timeout.toMillis(); + int consecutiveEmptyPolls = 0; + + while (System.currentTimeMillis() < endTime) { + var polled = dlqConsumer.poll(Duration.ofMillis(500)); + if (polled.isEmpty()) { + consecutiveEmptyPolls++; + if (consecutiveEmptyPolls >= 3 && !records.isEmpty()) { + break; + } + } else { + consecutiveEmptyPolls = 0; + polled.forEach(records::add); + } + } + + return records; + } + + protected void drainDLQ() { + long endTime = System.currentTimeMillis() + Duration.ofSeconds(2).toMillis(); + int drained = 0; + + while (System.currentTimeMillis() < endTime) { + var polled = dlqConsumer.poll(Duration.ofMillis(500)); + drained += polled.count(); + if (polled.isEmpty()) { + break; + } + } + + if (drained > 0) { + log.debug("Drained {} old messages from DLQ", drained); + } + } + + protected Map createWorkflowDefinition() { + return Map.of( + "namespace", "default", + "name", "test-workflow", + "version", "1.0.0" + ); + } + + protected String getEventTypeSuffix(String status) { + return switch (status) { + case "RUNNING", "STARTED" -> "started.v1"; + case "COMPLETED" -> "completed.v1"; 
+ case "FAULTED" -> "faulted.v1"; + case "CANCELLED" -> "cancelled.v1"; + default -> status.toLowerCase() + ".v1"; + }; + } + + protected void publishStatusChanged(String instanceId, String status) throws Exception { + var data = new HashMap(); + data.put("name", instanceId); + data.put("definition", createWorkflowDefinition()); + data.put("status", status); + data.put("updatedAt", OffsetDateTime.now().toString()); + + var event = Map.of( + "specversion", "1.0", + "type", "io.serverlessworkflow.workflow.status-changed.v1", + "source", "test", + "id", UUID.randomUUID().toString(), + "time", Instant.now().toString(), + "datacontenttype", "application/json", + "data", data); + + String json = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", null, json)).get(); + producer.flush(); + } + + protected void publishWorkflowEvent(String instanceId, String status, + ZonedDateTime startTime, ZonedDateTime endTime, String inputJson, String outputJson, + Map error) throws Exception { + publishWorkflowEvent(instanceId, status, startTime, endTime, inputJson, outputJson, error, null); + } + + protected void publishWorkflowEvent(String instanceId, String status, + ZonedDateTime startTime, ZonedDateTime endTime, String inputJson, String outputJson, + Map error, OffsetDateTime cloudEventTime) throws Exception { + var data = new HashMap(); + data.put("name", instanceId); + data.put("definition", createWorkflowDefinition()); + data.put("status", status); + + if ("RUNNING".equals(status) || "STARTED".equals(status)) { + if (startTime != null) data.put("startedAt", startTime); + if (inputJson != null) data.put("input", mapper.readTree(inputJson)); + } else if ("COMPLETED".equals(status)) { + if (endTime != null) data.put("completedAt", endTime); + if (outputJson != null) data.put("output", mapper.readTree(outputJson)); + } else if ("FAULTED".equals(status)) { + if (endTime != null) data.put("faultedAt", endTime); + if (error != null) 
data.put("error", error); + } else if ("CANCELLED".equals(status)) { + if (endTime != null) data.put("cancelledAt", endTime); + } + + if (error != null && !"FAULTED".equals(status)) data.put("error", error); + + var event = Map.of( + "specversion", "1.0", + "type", "io.serverlessworkflow.workflow." + getEventTypeSuffix(status), + "source", "test", + "id", UUID.randomUUID().toString(), + "time", cloudEventTime != null ? cloudEventTime.toInstant().toString() : Instant.now().toString(), + "datacontenttype", "application/json", + "data", data); + + String json = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", instanceId, json)).get(); + producer.flush(); + } + + protected void publishTaskEvent(String task, String workflow, String status, + ZonedDateTime startTime, ZonedDateTime endTime) throws Exception { + publishTaskEvent(task, workflow, status, startTime, endTime, null); + } + + protected void publishTaskEvent(String task, String workflow, String status, + ZonedDateTime startTime, ZonedDateTime endTime, Instant cloudEventTime) throws Exception { + publishTaskEvent(task, workflow, status, startTime, endTime, cloudEventTime, null, null); + + } + + protected void publishTaskEvent(String task, String workflow, String status, + ZonedDateTime startTime, ZonedDateTime endTime, Instant cloudEventTime, String input, String output) throws Exception { + var data = new HashMap(); + data.put("workflow", workflow); + data.put("task", task); + data.put("definition", createWorkflowDefinition()); + data.put("status", status); + + if ("RUNNING".equals(status) || "STARTED".equals(status)) { + if (startTime != null) data.put("startedAt", startTime); + if (input != null) data.put("input", input); + } else if ("COMPLETED".equals(status)) { + if (endTime != null) data.put("completedAt", endTime); + if (output != null) data.put("output", output); + } else if ("FAULTED".equals(status)) { + if (endTime != null) data.put("faultedAt", endTime); + } 
else if ("CANCELLED".equals(status)) { + if (endTime != null) data.put("cancelledAt", endTime); + } + + var event = Map.of( + "specversion", "1.0", + "type", "io.serverlessworkflow.task." + getEventTypeSuffix(status), + "source", "test", + "id", UUID.randomUUID().toString(), + "time", cloudEventTime != null ? cloudEventTime : Instant.now(), + "datacontenttype", "application/json", + "data", data); + + String json = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", null, json)).get(); + producer.flush(); + } + + protected static String readCloudEvents(String filename) throws IOException { + URL resource = Thread.currentThread().getContextClassLoader() + .getResource(filename); + return Files.readString(Paths.get(resource.getPath()), StandardCharsets.UTF_8); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/CancelledWorkflowIT.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/CancelledWorkflowIT.java new file mode 100644 index 000000000..8916e4eab --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/CancelledWorkflowIT.java @@ -0,0 +1,49 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka; + +import io.quarkus.test.junit.QuarkusTest; +import org.junit.jupiter.api.Test; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; + +import static org.assertj.core.api.Assertions.assertThat; + +@QuarkusTest +public class CancelledWorkflowIT extends BaseWorkflowLifecycleIT { + + @Test + void shouldSaveCancelledWorkflowCloudEvents() throws Exception { + publishEventsToKafka("cancelled-workflow.json"); + + String workflowId = "01KSR5FER167JC2SN81K0K2N0S"; + awaitByWorkflowStatus(workflowId, "CANCELLED"); + + try 
(Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT id, status, name, namespace, version " + + "FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, workflowId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("CANCELLED"); + assertThat(rs.getString("name")).isEqualTo("SwitchLoopWait"); + assertThat(rs.getString("namespace")).isEqualTo("example"); + assertThat(rs.getString("version")).isEqualTo("0.1.0"); + } + } + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT task_name, task_position, status " + + "FROM task_instances WHERE instance_id = ? AND status = 'CANCELLED'")) { + stmt.setString(1, workflowId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("task_name")).isEqualTo("waitABit"); + assertThat(rs.getString("task_position")).isEqualTo("do/2/waitABit"); + assertThat(rs.getString("status")).isEqualTo("CANCELLED"); + } + } + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/FaultedWorkflowIT.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/FaultedWorkflowIT.java new file mode 100644 index 000000000..0488a554a --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/FaultedWorkflowIT.java @@ -0,0 +1,70 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka; + +import io.quarkus.test.junit.QuarkusTest; +import io.serverlessworkflow.impl.WorkflowStatus; +import org.awaitility.Awaitility; +import org.junit.jupiter.api.Test; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import 
java.util.concurrent.TimeUnit; + +import static org.assertj.core.api.Assertions.assertThat; + +@QuarkusTest +public class FaultedWorkflowIT extends BaseWorkflowLifecycleIT { + + @Test + void shouldSaveFaultedWorkflowCloudEvents() throws Exception { + publishEventsToKafka("faulted-workflow.json"); + + String workflowId = "01KSR2FQCGEFV9B5V6QQ6PTJDK"; + awaitByWorkflowStatus(workflowId, WorkflowStatus.FAULTED.name()); + + + Awaitility.await().atMost(5, TimeUnit.SECONDS).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT id, status, name, namespace, version, error_type, error_status, error_detail " + + "FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, workflowId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("FAULTED"); + assertThat(rs.getString("name")).isEqualTo("faulted-workflow"); + assertThat(rs.getString("namespace")).isEqualTo("quarkus.flow"); + assertThat(rs.getString("version")).isEqualTo("0.0.1"); + + // Verify error fields + assertThat(rs.getString("error_type")).isEqualTo("https://serverlessworkflow.io/spec/1.0.0/errors/data"); + assertThat(rs.getInt("error_status")).isEqualTo(422); + assertThat(rs.getString("error_detail")).contains("Connection refused"); + } + } + }); + + Awaitility.await().atMost(5, TimeUnit.SECONDS).untilAsserted(() -> { + // Verify task is also FAULTED with error + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT task_name, task_position, status, error_type, error_status, error_detail " + + "FROM task_instances WHERE instance_id = ? AND task_position = ? 
AND status = ?")) { + stmt.setString(1, workflowId); + stmt.setString(2 , "do/0/http-0"); + stmt.setString(3 , "FAILED"); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("task_name")).isEqualTo("http-0"); + assertThat(rs.getString("task_position")).isEqualTo("do/0/http-0"); + assertThat(rs.getString("status")).isEqualTo("FAILED"); + + // Verify task error fields + assertThat(rs.getString("error_type")).isEqualTo("https://serverlessworkflow.io/spec/1.0.0/errors/data"); + assertThat(rs.getInt("error_status")).isEqualTo(422); + assertThat(rs.getString("error_detail")).contains("Connection refused"); + } + } + }); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/KafkaIngestionIT.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/KafkaIngestionIT.java new file mode 100644 index 000000000..9d5260c89 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/KafkaIngestionIT.java @@ -0,0 +1,603 @@ +/* + * Copyright 2024 KubeSmarts Authors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.kubesmarts.logic.dataindex.ingestion.kafka; + +import io.quarkus.test.junit.QuarkusTest; +import io.serverlessworkflow.impl.WorkflowStatus; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.clients.producer.KafkaProducer; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.awaitility.Awaitility; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.time.Duration; +import java.time.Instant; +import java.time.OffsetDateTime; +import java.time.ZoneOffset; +import java.time.ZonedDateTime; +import java.time.temporal.ChronoUnit; +import java.util.Map; +import java.util.Properties; +import java.util.UUID; + +import static java.util.Collections.singletonList; +import static org.assertj.core.api.Assertions.assertThat; +import static org.awaitility.Awaitility.await; + +@QuarkusTest +public class KafkaIngestionIT extends BaseWorkflowLifecycleIT { + + @BeforeEach + void setUp() { + var producerProps = new Properties(); + producerProps.put("bootstrap.servers", kafkaBootstrapServers); + producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); + producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); + producer = new KafkaProducer<>(producerProps); + + var consumerProps = new Properties(); + consumerProps.put("bootstrap.servers", kafkaBootstrapServers); + consumerProps.put("group.id", "dlq-test-consumer-" + UUID.randomUUID()); + consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); + consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); + consumerProps.put("auto.offset.reset", "earliest"); + consumerProps.put("enable.auto.commit", "true"); + 
dlqConsumer = new KafkaConsumer<>(consumerProps); + dlqConsumer.subscribe(singletonList("data-index-events-dlq")); + + // Drain any existing DLQ messages from previous tests + drainDLQ(); + } + + @AfterEach + void tearDown() throws SQLException { + if (producer != null) { + producer.close(); + } + if (dlqConsumer != null) { + dlqConsumer.close(); + } + try (Connection conn = dataSource.getConnection()) { + conn.prepareStatement("DELETE FROM task_instances;").executeUpdate(); + conn.prepareStatement("DELETE FROM workflow_instances;").executeUpdate(); + } + } + + @Test + void shouldNormalizeWorkflowStartedEvent() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + ZonedDateTime startTime = ZonedDateTime.now(ZoneOffset.UTC); + + publishWorkflowEvent(instanceId, "RUNNING", startTime, null, null, null, null); + + awaitByWorkflow(instanceId); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT id, status, name, namespace, version FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("RUNNING"); + assertThat(rs.getString("name")).isEqualTo("test-workflow"); + assertThat(rs.getString("namespace")).isEqualTo("default"); + } + } + } + + @Test + void shouldNormalizeWorkflowCompletedEvent() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + ZonedDateTime startTime = ZonedDateTime.now(ZoneOffset.UTC); + ZonedDateTime endTime = startTime.plusSeconds(10); + + publishWorkflowEvent(instanceId, "RUNNING", startTime, null, null, null, null); + awaitByWorkflow(instanceId); + + publishWorkflowEvent(instanceId, "COMPLETED", null, endTime, + null, "{\"result\":\"ok\"}", null); + awaitByWorkflowStatus(instanceId, "COMPLETED"); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT 
status, \"end\", output FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("COMPLETED"); + assertThat(rs.getTimestamp("end")).isNotNull(); + assertThat(rs.getString("output")).contains("result"); + } + } + } + + @Test + void shouldPreserveImmutableFieldsOnUpdate() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, + "{\"original\":true}", null, null); + + awaitByWorkflow(instanceId); + + var completedTime = startTime.plusSeconds(5); + publishWorkflowEvent(instanceId, WorkflowStatus.COMPLETED.name(), completedTime, completedTime, + "{\"overwrite\":true}", null, null); + + publishStatusChanged(instanceId, WorkflowStatus.COMPLETED.name()); + + awaitByWorkflowStatus(instanceId, "COMPLETED"); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT input, start FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Immutable: first value wins + assertThat(rs.getString("input")).contains("original"); + } + } + } + + @Test + void shouldHandleTaskBeforeWorkflow() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + String taskName = "callHttp"; + String task = "do/0/" + taskName; + ZonedDateTime startTime = ZonedDateTime.now(ZoneOffset.UTC); + + // Publish task event BEFORE workflow event + publishTaskEvent(task, instanceId, "RUNNING", startTime, null); + awaitByTaskNameAndInstanceId(taskName, instanceId); + + // Placeholder workflow should have been created + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT id FROM workflow_instances 
WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).as("Placeholder workflow should exist").isTrue(); + } + } + + // Now send the actual workflow event + publishWorkflowEvent(instanceId, "RUNNING", startTime, null, null, null, null); + awaitWorkflowWithName(instanceId); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT name FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("name")).isEqualTo("test-workflow"); + } + } + } + + @Test + void shouldNormalizeTaskLifecycle() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + String taskName = "doSomething"; + String task = "do/0/" + taskName; + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + publishWorkflowEvent(instanceId, "RUNNING", startTime, null, null, null, null); + awaitByWorkflow(instanceId); + + publishTaskEvent(task, instanceId, "RUNNING", startTime, null); + awaitByTaskNameAndInstanceId(taskName, instanceId); + + var endTime = startTime.plusSeconds(5); + publishTaskEvent(task, instanceId, "COMPLETED", null, endTime); + awaitTaskStatus(taskName, "COMPLETED"); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status, task_name, \"end\" FROM task_instances WHERE task_name = ? 
AND status = ?")) { + stmt.setString(1, taskName); + stmt.setString(2, "COMPLETED"); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("COMPLETED"); + assertThat(rs.getString("task_name")).isEqualTo(taskName); + assertThat(rs.getTimestamp("end")).isNotNull(); + } + } + } + + @Test + void shouldHandleWorkflowWithError() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + Map error = Map.of( + "type", "RuntimeException", + "title", "Workflow failed", + "detail", "NullPointerException at line 42", + "status", 500); + + publishWorkflowEvent(instanceId, "FAULTED", startTime, startTime.plusSeconds(1), + null, null, error); + awaitByWorkflow(instanceId); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status, error_type, error_title, error_detail, error_status FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("FAULTED"); + assertThat(rs.getString("error_type")).isEqualTo("RuntimeException"); + assertThat(rs.getString("error_title")).isEqualTo("Workflow failed"); + assertThat(rs.getString("error_detail")).contains("NullPointerException"); + assertThat(rs.getInt("error_status")).isEqualTo(500); + } + } + } + + @Test + void shouldHandleOutOfOrderWorkflowEvents() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + var endTime = startTime.plusSeconds(10); + + // Send COMPLETED event BEFORE RUNNING event + publishWorkflowEvent(instanceId, "COMPLETED", null, endTime, + null, "{\"result\":\"ok\"}", null, endTime.toOffsetDateTime()); + awaitByWorkflow(instanceId); + + // Now send the RUNNING event with earlier timestamp + 
publishWorkflowEvent(instanceId, "RUNNING", startTime, null, + "{\"input\":\"data\"}", null, null, startTime.toOffsetDateTime()); + + await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status, input, output, start FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Status should remain COMPLETED (latest timestamp wins for status) + assertThat(rs.getString("status")).isEqualTo(WorkflowStatus.COMPLETED.name()); + // Immutable field: input should be set from the first event that provides a non-null input + assertThat(rs.getString("input")).contains("data"); + // Terminal field: output should be preserved + assertThat(rs.getString("output")).contains("result"); + } + } + }); + + } + + @Test + void shouldUseTimestampToDetermineStatusWinner() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var t1 = ZonedDateTime.now(ZoneOffset.UTC); + var t2 = t1.plusSeconds(5); + var t3 = t2.plusSeconds(5); + + // Send events in order: RUNNING -> COMPLETED -> late RUNNING + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), t1, null, null, null, null, t1.toOffsetDateTime()); + awaitByWorkflow(instanceId); + + publishWorkflowEvent(instanceId, WorkflowStatus.COMPLETED.name(), null, t3, null, "{\"result\":\"ok\"}", null, t3.toOffsetDateTime()); + awaitByWorkflowStatus(instanceId, WorkflowStatus.COMPLETED.name()); + + // Send late RUNNING event with timestamp between t1 and t3 + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), null, t2, null, null, null, t2.toOffsetDateTime()); + + await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status FROM workflow_instances WHERE id = ?")) { + 
stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Should remain COMPLETED (latest timestamp wins) + assertThat(rs.getString("status")).isEqualTo(WorkflowStatus.COMPLETED.name()); + } + } + }); + } + + @Test + void shouldHandleIdempotentWorkflowEventReplay() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + // Send the same event twice + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, + "{\"input\":\"original\"}", null, null, startTime.toOffsetDateTime()); + awaitByWorkflow(instanceId); + + // Replay the exact same event + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, + "{\"input\":\"original\"}", null, null, startTime.toOffsetDateTime()); + + await() + .atMost(Duration.ofSeconds(5)) + .untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) as cnt, input FROM workflow_instances WHERE id = ? 
GROUP BY input")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Should only have one row + assertThat(rs.getInt("cnt")).isEqualTo(1); + assertThat(rs.getString("input")).contains("original"); + } + } + }); + } + + @Test + void shouldHandleIdempotentTaskEventReplay() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + String taskName = "set-0"; + String task = "do/0/" + taskName; + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, null, null, null); + awaitByWorkflow(instanceId); + + Instant cloudEventTime = OffsetDateTime.now().toInstant(); + + // Send TASK_STARTED event + publishTaskEvent(task, instanceId, WorkflowStatus.RUNNING.name(), startTime.plus(4, ChronoUnit.MILLIS), null, cloudEventTime); + awaitByTaskNameAndInstanceId(taskName, instanceId); + + // Replay the exact same event + publishTaskEvent(task, instanceId, WorkflowStatus.RUNNING.name(), startTime.plus(4, ChronoUnit.MILLIS), null, cloudEventTime); + + await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) FROM task_instances WHERE task_name = ? 
AND instance_id = ?")) { + stmt.setString(1, taskName); + stmt.setString(2, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Should only have one row + assertThat(rs.getInt(1)).isEqualTo(1); + } + } + }); + } + + @Test + void shouldNotOverwriteImmutableFieldsOnReplay() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + // Send initial event with input + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, + "{\"original\":\"value\"}", null, null); + awaitByWorkflow(instanceId); + + // Try to replay with different input (simulating corrupted replay) + publishWorkflowEvent(instanceId, WorkflowStatus.RUNNING.name(), startTime, null, + "{\"modified\":\"value\"}", null, null); + + await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT input FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Input should remain original (immutable field) + assertThat(rs.getString("input")).contains("original"); + assertThat(rs.getString("input")).doesNotContain("modified"); + } + } + }); + } + + @Test + void shouldAcceptTerminalFieldUpdatesOnReplay() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + var endTime = startTime.plusSeconds(5); + + // Send COMPLETED event without output + publishWorkflowEvent(instanceId, "COMPLETED", startTime, endTime, + null, null, null); + awaitByWorkflow(instanceId); + + // Replay with output (late-arriving data) + publishWorkflowEvent(instanceId, "COMPLETED", startTime, endTime, + null, "{\"result\":\"ok\"}", null); + + await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = 
dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT output FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Terminal field (output) should accept non-null value + assertThat(rs.getString("output")).contains("result"); + } + } + }); + } + + @Test + void shouldHandleRepeatedPlaceholderWorkflowCreation() throws Exception { + String instanceId = "wf-" + UUID.randomUUID(); + String taskName = "set-0"; + String task1 = "do/0/" + taskName; + String task2 = "do/1/" + taskName; + var startTime = ZonedDateTime.now(ZoneOffset.UTC); + + // Send two different tasks for same non-existent workflow + publishTaskEvent(task1, instanceId, "RUNNING", startTime, null); + publishTaskEvent(task2, instanceId, "RUNNING", startTime, null); + awaitByTaskPositionAndInstanceId(task1, instanceId); + awaitByTaskPositionAndInstanceId(task2, instanceId); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Should only have one placeholder workflow (idempotent) + assertThat(rs.getInt(1)).isEqualTo(1); + } + } + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) FROM task_instances WHERE instance_id = ?")) { + stmt.setString(1, instanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + // Should have both tasks + assertThat(rs.getInt(1)).isEqualTo(2); + } + } + } + + @Test + void shouldSendInvalidJsonEventToDLQ() throws Exception { + String invalidJson = "{invalid-json-not-properly-formatted"; + + producer.send(new ProducerRecord<>("flow-lifecycle-out", "invalid-key", invalidJson)).get(); + producer.flush(); + + var dlqRecords 
= pollDLQ(Duration.ofSeconds(10)); + + assertThat(dlqRecords).isNotEmpty(); + assertThat(dlqRecords.get(0).value()).contains("invalid-json"); + } + + @Test + void shouldIgnoreEventWithMissingRequiredFields() throws Exception { + var event = Map.of( + "specversion", "1.0", + "type", "io.serverlessworkflow.workflow.running", + "source", "test", + "id", UUID.randomUUID().toString(), + "time", Instant.now().toString(), + "datacontenttype", "application/json", + "data", Map.of()); + + String json = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", "missing-fields", json)).get(); + producer.flush(); + + Awaitility.await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> { + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) FROM workflow_instances")) { + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getInt(1)).isZero(); + } + } + }); + } + + @Test + void shouldIgnoreEventWithUnknownType() throws Exception { + var data = Map.of( + "instanceId", "unknown-123", + "status", "RUNNING"); + + var event = Map.of( + "specversion", "1.0", + "type", "unknown.event.type", + "source", "test", + "id", UUID.randomUUID().toString(), + "time", Instant.now().toString(), + "datacontenttype", "application/json", + "data", data); + + String json = mapper.writeValueAsString(event); + producer.send(new ProducerRecord<>("flow-lifecycle-out", "unknown-type", json)).get(); + producer.flush(); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, "unknown-123"); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getInt(1)).isZero(); + } + } + + var dlqRecords = pollDLQ(Duration.ofSeconds(3)); + assertThat(dlqRecords).as("Unknown event type should not be 
ignored.").isNotEmpty(); + } + + @Test + void shouldNotIgnoreEmptyMessages() throws Exception { + producer.send(new ProducerRecord<>("flow-lifecycle-out", "empty-key", "")).get(); + producer.flush(); + var dlqRecords = pollDLQ(Duration.ofSeconds(3)); + assertThat(dlqRecords).as("Empty messages should not be ignored.").isNotEmpty(); + } + + @Test + void shouldProcessValidEventAfterDLQEvent() throws Exception { + String invalidJson = "{invalid-json"; + producer.send(new ProducerRecord<>("flow-lifecycle-out", "invalid", invalidJson)).get(); + + String validInstanceId = "wf-" + UUID.randomUUID(); + publishWorkflowEvent(validInstanceId, "RUNNING", ZonedDateTime.now(ZoneOffset.UTC), null, null, null, null); + producer.flush(); + + awaitByWorkflow(validInstanceId); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT status FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, validInstanceId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("RUNNING"); + } + } + + var dlqRecords = pollDLQ(Duration.ofSeconds(5)); + assertThat(dlqRecords).isNotEmpty(); + } + + @Test + void shouldReceiveMessagesInDlqWhenSendingMalformedEvents() throws Exception { + for (int i = 0; i < 5; i++) { + String invalidJson = "{\"invalid-event-" + i + "\":"; + producer.send(new ProducerRecord<>("flow-lifecycle-out", "invalid-" + i, invalidJson)).get(); + } + producer.flush(); + + var dlqRecords = pollDLQ(Duration.ofSeconds(15)); + + assertThat(dlqRecords.size()).isGreaterThanOrEqualTo(5); + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/SuspendedWorkflowIT.java b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/SuspendedWorkflowIT.java new file mode 100644 index 
000000000..c6dea8f4a --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/java/org/kubesmarts/logic/dataindex/ingestion/kafka/SuspendedWorkflowIT.java @@ -0,0 +1,46 @@ +package org.kubesmarts.logic.dataindex.ingestion.kafka; + +import io.quarkus.test.junit.QuarkusTest; +import org.junit.jupiter.api.Test; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; + +import static org.assertj.core.api.Assertions.assertThat; + +@QuarkusTest +public class SuspendedWorkflowIT extends BaseWorkflowLifecycleIT { + + @Test + void shouldSaveSuspendedWorkflowCloudEvents() throws Exception { + publishEventsToKafka("suspended-workflow.json"); + + String workflowId = "01KSR74SRH4D8A4ZGQ4Q2A56VE"; + awaitByWorkflowStatus(workflowId, "SUSPENDED"); + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT id, status, name, namespace, version " + + "FROM workflow_instances WHERE id = ?")) { + stmt.setString(1, workflowId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getString("status")).isEqualTo("SUSPENDED"); + assertThat(rs.getString("name")).isEqualTo("SwitchLoopWait"); + assertThat(rs.getString("namespace")).isEqualTo("example"); + assertThat(rs.getString("version")).isEqualTo("0.1.0"); + } + } + + try (Connection conn = dataSource.getConnection(); + PreparedStatement stmt = conn.prepareStatement( + "SELECT COUNT(*) as task_count FROM task_instances WHERE instance_id = ?")) { + stmt.setString(1, workflowId); + try (ResultSet rs = stmt.executeQuery()) { + assertThat(rs.next()).isTrue(); + assertThat(rs.getInt("task_count")).isGreaterThan(0); + } + } + } +} diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/application.properties b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/application.properties new file 
mode 100644 index 000000000..7d2104650 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/application.properties @@ -0,0 +1,16 @@ +mp.messaging.incoming.data-index-events.connector=smallrye-kafka +mp.messaging.incoming.data-index-events.topic=flow-lifecycle-out +mp.messaging.incoming.data-index-events.group.id=data-index-ingestion-test +mp.messaging.incoming.data-index-events.auto.offset.reset=earliest +mp.messaging.incoming.data-index-events.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer +mp.messaging.incoming.data-index-events.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer + +# DLQ configuration for testing +mp.messaging.incoming.data-index-events.failure-strategy=dead-letter-queue +mp.messaging.incoming.data-index-events.dead-letter-queue.topic=data-index-events-dlq +mp.messaging.incoming.data-index-events.dead-letter-queue.key.serializer=org.apache.kafka.common.serialization.StringSerializer +mp.messaging.incoming.data-index-events.dead-letter-queue.value.serializer=org.apache.kafka.common.serialization.StringSerializer +mp.messaging.incoming.data-index-events.retry-attempts=0 + +quarkus.flyway.migrate-at-start=true +quarkus.flyway.locations=classpath:db/migration \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/cancelled-workflow.json b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/cancelled-workflow.json new file mode 100644 index 000000000..621113d25 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/cancelled-workflow.json @@ -0,0 +1,217 @@ +[ + { + "specversion": "1.0", + "id": "f16c9945-8c51-41e1-91a2-687bc885a3f6", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:04:32.948881-03:00", + "data": { + "workflow": 
"01KSR5FER167JC2SN81K0K2N0S", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:04:32.948864-03:00", + "output": { + "count": 100 + } + } + }, + { + "specversion": "1.0", + "id": "e9f07372-9162-4579-aaa3-06ff2f821e97", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:04:33.951031-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:04:33.950923-03:00", + "output": { + "count": 100 + } + } + }, + { + "specversion": "1.0", + "id": "7bcc7e21-c04b-4bb4-9ebe-2fd4ada5b69b", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:04:33.952315-03:00", + "data": { + "name": "01KSR5FER167JC2SN81K0K2N0S", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:04:33.952194-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "6fbf824b-7997-4f9e-b4dc-d4b08ded0ccd", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:04:33.952509-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:04:33.952501-03:00" + } + }, + { + "specversion": "1.0", + "id": "f9ac0e0d-e0a0-428b-84ee-e02e3b4c7239", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:04:33.952988-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + 
"completedAt": "2026-05-28T18:04:33.95298-03:00", + "output": { + "count": 101 + } + } + }, + { + "specversion": "1.0", + "id": "d5e64349-b58d-4826-bac9-aac4f93cabf3", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:04:33.955016-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:04:33.955-03:00" + } + }, + { + "specversion": "1.0", + "id": "14c47a09-1d3e-4848-a8ff-e8d8149d9b01", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:04:33.955577-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:04:33.955571-03:00", + "output": { + "count": 101 + } + } + }, + { + "specversion": "1.0", + "id": "e2734831-cee2-40c7-ade7-a089f7debe8d", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:04:33.955822-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:04:33.955815-03:00" + } + }, + { + "specversion": "1.0", + "id": "a3fe0325-7816-4e4b-9299-d71ff4cfc3f4", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:04:33.955909-03:00", + "data": { + "name": "01KSR5FER167JC2SN81K0K2N0S", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:04:33.955904-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "39694535-f640-4d74-853e-c885dbf7fed2", + "source": 
"reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:04:34.154668-03:00", + "data": { + "name": "01KSR5FER167JC2SN81K0K2N0S", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:04:34.154592-03:00", + "status": "CANCELLED" + } + }, + { + "specversion": "1.0", + "id": "50ea8445-5aee-4f13-89a4-a93e250e5227", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.cancelled.v1", + "time": "2026-05-28T18:04:34.155472-03:00", + "data": { + "name": "01KSR5FER167JC2SN81K0K2N0S", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "cancelledAt": "2026-05-28T18:04:34.155454-03:00" + } + }, + { + "specversion": "1.0", + "id": "8d29a8a1-42f3-485c-b4af-ee807ce1186e", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.cancelled.v1", + "time": "2026-05-28T18:04:34.956893-03:00", + "data": { + "workflow": "01KSR5FER167JC2SN81K0K2N0S", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "cancelledAt": "2026-05-28T18:04:34.956803-03:00" + } + } +] \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/faulted-workflow.json b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/faulted-workflow.json new file mode 100644 index 000000000..90496da97 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/faulted-workflow.json @@ -0,0 +1,92 @@ +[ + { + "specversion": "1.0", + "id": "7b356b1a-e54b-4bc3-82cd-27ef7cf3eb03", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T16:55:43.953261-03:00", + "data": { + "name": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "definition": { "namespace": "quarkus.flow", 
"name": "faulted-workflow", "version": "0.0.1" }, + "updatedAt": "2026-05-28T16:55:43.953184-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "1dced28c-7733-45b2-b391-a0c976b1217e", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.started.v1", + "time": "2026-05-28T16:55:43.954898-03:00", + "data": { + "name": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "definition": { "namespace": "quarkus.flow", "name": "faulted-workflow", "version": "0.0.1" }, + "startedAt": "2026-05-28T16:55:43.95464-03:00" + } + }, + { + "specversion": "1.0", + "id": "8c4cf798-10bf-411c-98b5-e919bd773ce7", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T16:55:43.955806-03:00", + "data": { + "workflow": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "task": "do/0/http-0", + "definition": { "namespace": "quarkus.flow", "name": "faulted-workflow", "version": "0.0.1" }, + "startedAt": "2026-05-28T16:55:43.955762-03:00" + } + }, + { + "specversion": "1.0", + "id": "385bae84-b4da-4d8d-b5d9-c1db4f4144db", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T16:55:44.148226-03:00", + "data": { + "name": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "definition": { "namespace": "quarkus.flow", "name": "faulted-workflow", "version": "0.0.1" }, + "updatedAt": "2026-05-28T16:55:44.148203-03:00", + "status": "FAULTED" + } + }, + { + "specversion": "1.0", + "id": "9cac3c62-e1ce-44b0-a454-1b79b25d8480", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.faulted.v1", + "time": "2026-05-28T16:55:44.145617-03:00", + "data": { + "workflow": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "task": "do/0/http-0", + "definition": { "namespace": "quarkus.flow", "name": "faulted-workflow", "version": "0.0.1" }, + "faultedAt": "2026-05-28T16:55:44.145462-03:00", + "error": { + "type": "https://serverlessworkflow.io/spec/1.0.0/errors/data", + "status": 422, + "instance": "do/0/http-0", + "title": 
null, + "detail": "io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:9899" + } + } + }, + { + "specversion": "1.0", + "id": "4c30f7ca-052c-43d8-a736-7de21b733b45", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.faulted.v1", + "time": "2026-05-28T16:55:44.148727-03:00", + "data": { + "name": "01KSR2FQCGEFV9B5V6QQ6PTJDK", + "definition": { "namespace": "quarkus.flow", "name": "faulted-workflow", "version": "0.0.1" }, + "faultedAt": "2026-05-28T16:55:44.148639-03:00", + "error": { + "type": "https://serverlessworkflow.io/spec/1.0.0/errors/data", + "status": 422, + "instance": "do/0/http-0", + "title": null, + "detail": "io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:9899" + } + } + } +] \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/retryable-workflow.json b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/retryable-workflow.json new file mode 100644 index 000000000..75d266af1 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/retryable-workflow.json @@ -0,0 +1,139 @@ +[ + { + "specversion": "1.0", + "id": "b8533b5f-3767-41f6-89bc-c25f1eecf1ed", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T17:32:47.950823-03:00", + "data": { + "name": "01KSR2GDEGPM83CANG5QRCQ2TE", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T17:32:47.950724-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "9ab8a81c-b334-44a8-b39e-4905efb8a7fc", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.started.v1", + "time": "2026-05-28T17:32:47.952436-03:00", + "data": { + "name": "01KSR2GDEGPM83CANG5QRCQ2TE", + 
"definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T17:32:47.952403-03:00" + } + }, + { + "specversion": "1.0", + "id": "99952c4b-e236-4656-bb9a-64224e7f31d4", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T17:32:47.95312-03:00", + "data": { + "workflow": "01KSR2GDEGPM83CANG5QRCQ2TE", + "task": "do/0/tryGetPet/do", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T17:32:47.953074-03:00" + } + }, + { + "specversion": "1.0", + "id": "601c303a-4a99-44c4-8d0c-78ef16d07ddf", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T17:32:47.954286-03:00", + "data": { + "workflow": "01KSR2GDEGPM83CANG5QRCQ2TE", + "task": "do/0/tryGetPet/do/0/getPet", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T17:32:47.954257-03:00" + } + }, + { + "specversion": "1.0", + "id": "0c160a9c-0a55-4898-8d1e-7d68b3cccb94", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T17:32:48.186207-03:00", + "data": { + "workflow": "01KSR2GDEGPM83CANG5QRCQ2TE", + "task": "do/0/tryGetPet/do/0/getPet", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T17:32:48.186152-03:00", + "output": null + } + }, + { + "specversion": "1.0", + "id": "0a293c82-d2b7-4104-b514-c368ab5e59ff", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T17:32:48.186749-03:00", + "data": { + "workflow": "01KSR2GDEGPM83CANG5QRCQ2TE", + "task": "do/0/tryGetPet/do", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T17:32:48.186728-03:00", + "output": null + } 
+ }, + { + "specversion": "1.0", + "id": "bb951ecb-4fee-4b02-b2f1-44a1e62ba51a", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T17:32:48.18711-03:00", + "data": { + "name": "01KSR2GDEGPM83CANG5QRCQ2TE", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T17:32:48.18709-03:00", + "status": "COMPLETED" + } + }, + { + "specversion": "1.0", + "id": "cb32e4cf-744d-40f2-ba7c-43f5fc3c5293", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.completed.v1", + "time": "2026-05-28T17:32:48.187395-03:00", + "data": { + "name": "01KSR2GDEGPM83CANG5QRCQ2TE", + "definition": { + "namespace": "test", + "name": "retryable-example", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T17:32:48.18738-03:00", + "output": null + } + } +] \ No newline at end of file diff --git a/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/suspended-workflow.json b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/suspended-workflow.json new file mode 100644 index 000000000..d8dde9300 --- /dev/null +++ b/data-index/data-index-ingestion/data-index-ingestion-kafka-service/src/test/resources/suspended-workflow.json @@ -0,0 +1,1936 @@ +[ + { + "specversion": "1.0", + "id": "f47a1e06-2d54-4c73-93e1-a5d3cc5cd19d", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:46.978529-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:46.978515-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "5e76d785-24a8-4e26-97a3-25f62e13af2d", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.started.v1", + "time": 
"2026-05-28T18:16:46.978744-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:46.978739-03:00" + } + }, + { + "specversion": "1.0", + "id": "aebaf370-9cde-4174-ac60-137ddbde28bb", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:46.97902-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:46.979011-03:00" + } + }, + { + "specversion": "1.0", + "id": "df24082a-d06a-4a97-9b4c-7726a0c2e7b6", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:46.979588-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:46.979574-03:00", + "output": { + "count": 1 + } + } + }, + { + "specversion": "1.0", + "id": "7b194b35-4c98-4655-9ef4-d4f99cb793db", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:46.981057-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:46.981049-03:00" + } + }, + { + "specversion": "1.0", + "id": "936e1416-b25c-4177-b6db-321346675d04", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:46.981384-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": 
"2026-05-28T18:16:46.981378-03:00", + "output": { + "count": 1 + } + } + }, + { + "specversion": "1.0", + "id": "dde7476d-500c-4fcf-91a6-9218d801fbff", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:46.981681-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:46.981675-03:00" + } + }, + { + "specversion": "1.0", + "id": "65f4813f-412f-431d-94bf-9c7eeb1fea28", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:46.981813-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:46.98179-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "8308a21e-3849-4d4f-9bcb-ed7575d16526", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:47.982231-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:47.98211-03:00", + "output": { + "count": 1 + } + } + }, + { + "specversion": "1.0", + "id": "e5ed991d-2223-49fc-ba6f-6eb9f0edbcad", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:47.983463-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:47.983422-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "3f0aaf4f-2bbd-4a4c-9ee3-8a603df65468", + "source": "reference-impl", + 
"type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:47.98376-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:47.983744-03:00" + } + }, + { + "specversion": "1.0", + "id": "1e8df227-e844-439f-87f3-da23d35a49ef", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:47.984555-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:47.984537-03:00", + "output": { + "count": 2 + } + } + }, + { + "specversion": "1.0", + "id": "23a9b62a-8ce2-4ed9-baa4-4413e2b3aa59", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:47.98663-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:47.986611-03:00" + } + }, + { + "specversion": "1.0", + "id": "65de1ba6-8f7c-40de-9e60-3d1737435bb2", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:47.987514-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:47.987496-03:00", + "output": { + "count": 2 + } + } + }, + { + "specversion": "1.0", + "id": "d3379527-f3d8-4bba-8346-b6d2198df125", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:47.987935-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": 
"do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:47.987926-03:00" + } + }, + { + "specversion": "1.0", + "id": "08c55a6a-f756-481c-bb03-32224e97fafc", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:47.988042-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:47.988036-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "3044eda5-bd00-47ea-bc1e-f3e80005a910", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:48.9887-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:48.988599-03:00", + "output": { + "count": 2 + } + } + }, + { + "specversion": "1.0", + "id": "5c9f3374-af37-4672-8442-8c3db1574772", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:48.989872-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:48.989844-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "31119284-b270-4c8f-8988-6fb1db6a16a2", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:48.990114-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:48.9901-03:00" + } + }, + { + "specversion": 
"1.0", + "id": "18f2bad9-d3bd-4c77-8d6b-6ca2f4efeffc", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:48.990806-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:48.990799-03:00", + "output": { + "count": 3 + } + } + }, + { + "specversion": "1.0", + "id": "09712118-b8e7-459c-afd3-ee1b42ad4885", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:48.992073-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:48.992064-03:00" + } + }, + { + "specversion": "1.0", + "id": "85c727e8-be49-4ed2-8b5d-97fde58a78b0", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:48.992739-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:48.99273-03:00", + "output": { + "count": 3 + } + } + }, + { + "specversion": "1.0", + "id": "b3e942f7-a1cd-4297-a74b-02ff7cc80610", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:48.992955-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:48.992948-03:00" + } + }, + { + "specversion": "1.0", + "id": "ff5a8eb2-96a9-4caa-95d2-d359cc21eb87", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": 
"2026-05-28T18:16:48.993137-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:48.993127-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "3506f565-0876-4ec4-bef6-47995e555dfa", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:49.994476-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:49.994376-03:00", + "output": { + "count": 3 + } + } + }, + { + "specversion": "1.0", + "id": "7d774e9f-e199-4ef2-9e9b-666fa5cfe5ca", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:49.995567-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:49.995534-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "9f69f341-42fd-49b6-b588-369bd1ab6db2", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:49.995826-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:49.995811-03:00" + } + }, + { + "specversion": "1.0", + "id": "f89b114f-b993-44b6-b7f9-19f80f69a1bd", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:49.996567-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": 
"0.1.0" + }, + "completedAt": "2026-05-28T18:16:49.996559-03:00", + "output": { + "count": 4 + } + } + }, + { + "specversion": "1.0", + "id": "7d436e47-9c7e-415a-afff-0a8cefdd94ef", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:49.998347-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:49.998319-03:00" + } + }, + { + "specversion": "1.0", + "id": "c2e49289-3b09-46ee-bdcb-716f1781ae6d", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:49.999162-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:49.999151-03:00", + "output": { + "count": 4 + } + } + }, + { + "specversion": "1.0", + "id": "6585ad86-1e2d-4ef3-950b-59fb2a9038f1", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:49.99964-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:49.999626-03:00" + } + }, + { + "specversion": "1.0", + "id": "e5a31418-e41c-460e-88e6-87ca85943e32", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:49.999857-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:49.999848-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "36c55d7e-8ebd-4e54-9440-16fe31bb70e2", + 
"source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:51.001204-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:51.001096-03:00", + "output": { + "count": 4 + } + } + }, + { + "specversion": "1.0", + "id": "3420e989-a4ba-489c-bed2-e15380b61580", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:51.002368-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:51.00234-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "8e86bf6e-8664-401c-8124-52417d74c389", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:51.00255-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:51.002543-03:00" + } + }, + { + "specversion": "1.0", + "id": "f436baa5-744c-4c74-a336-e34dc184c4cc", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:51.003025-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:51.003019-03:00", + "output": { + "count": 5 + } + } + }, + { + "specversion": "1.0", + "id": "8b9f17dc-4750-4d86-9edc-801be0ca52ff", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:51.004821-03:00", + "data": { + "workflow": 
"01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:51.0048-03:00" + } + }, + { + "specversion": "1.0", + "id": "0d3a1e48-6ffd-45d0-9c71-897b86af5ff8", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:51.005601-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:51.005587-03:00", + "output": { + "count": 5 + } + } + }, + { + "specversion": "1.0", + "id": "42ca88b1-74de-4294-a9bd-43eb49af2615", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:51.005991-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:51.00598-03:00" + } + }, + { + "specversion": "1.0", + "id": "6d8a2440-6828-4046-9975-77e65d4627e8", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:51.006141-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:51.006132-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "a9857116-9a60-4e59-8402-0b9209092473", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:52.007382-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": 
"2026-05-28T18:16:52.007335-03:00", + "output": { + "count": 5 + } + } + }, + { + "specversion": "1.0", + "id": "cc088ba3-4228-4ecc-ac5d-01ed716c0470", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:52.007783-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:52.007773-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "7847fec6-c169-49d7-b631-b551b7818d71", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:52.007878-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:52.007873-03:00" + } + }, + { + "specversion": "1.0", + "id": "f1c573d6-a90d-4cba-aaf6-671ad5f01f77", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:52.008135-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:52.008131-03:00", + "output": { + "count": 6 + } + } + }, + { + "specversion": "1.0", + "id": "ac2b5703-749c-405b-a149-68ace0c5e6cc", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:52.010956-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:52.010936-03:00" + } + }, + { + "specversion": "1.0", + "id": "5051c122-2b64-452d-b146-409728610132", + "source": "reference-impl", + "type": 
"io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:52.011637-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:52.011627-03:00", + "output": { + "count": 6 + } + } + }, + { + "specversion": "1.0", + "id": "cf0866a0-8127-45ba-ac3d-7e7ac3305f9f", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:52.011957-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:52.011938-03:00" + } + }, + { + "specversion": "1.0", + "id": "8b81158b-e4af-40f8-b8ea-e93c1cce71c1", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:52.012081-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:52.012072-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "e41c052d-b1dd-47be-9371-0823a7c8288d", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:53.013392-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:53.013312-03:00", + "output": { + "count": 6 + } + } + }, + { + "specversion": "1.0", + "id": "ea86acfc-8f64-4b7d-b3d2-2f6f67ffd5b0", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:53.014388-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + 
"definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:53.014346-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "6a169fc0-066e-4840-8334-d9586886bb97", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:53.014639-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:53.014627-03:00" + } + }, + { + "specversion": "1.0", + "id": "0e863859-c79d-4d1d-b248-2246563df79f", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:53.015462-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:53.015447-03:00", + "output": { + "count": 7 + } + } + }, + { + "specversion": "1.0", + "id": "a6852fe3-9807-47c1-9129-f212bf8bfca6", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:53.017183-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:53.017161-03:00" + } + }, + { + "specversion": "1.0", + "id": "2102eaba-685b-47e8-9ba2-d3f1e1ef2fc1", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:53.017851-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:53.017847-03:00", + "output": { + "count": 7 + 
} + } + }, + { + "specversion": "1.0", + "id": "d845704f-28d9-43c5-81c7-9ca5ef7b2d7c", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:53.017996-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:53.017991-03:00" + } + }, + { + "specversion": "1.0", + "id": "cd7af0b1-9a39-4775-9d3f-c7ce22f4cf91", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:53.018052-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:53.018046-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "a732d6fc-d9cb-4765-897b-0313b7526d83", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:54.019396-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:54.019293-03:00", + "output": { + "count": 7 + } + } + }, + { + "specversion": "1.0", + "id": "b945f051-51bb-4531-b4d7-aee56419822d", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:54.020926-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:54.020898-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "bba99429-9a4f-4d28-bb40-018ff2299faf", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": 
"2026-05-28T18:16:54.02114-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:54.021132-03:00" + } + }, + { + "specversion": "1.0", + "id": "74b4d707-f638-435f-a8f9-f9442a2eefb4", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:54.021702-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:54.021693-03:00", + "output": { + "count": 8 + } + } + }, + { + "specversion": "1.0", + "id": "45441a90-1a4b-4fb8-a779-248ebdda890d", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:54.023462-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:54.02345-03:00" + } + }, + { + "specversion": "1.0", + "id": "94d59abe-8f03-4694-8b9c-418f48b5324f", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:54.024513-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:54.024486-03:00", + "output": { + "count": 8 + } + } + }, + { + "specversion": "1.0", + "id": "2a6c6078-a5fb-4363-b151-d6cc099bc587", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:54.025154-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + 
"name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:54.025137-03:00" + } + }, + { + "specversion": "1.0", + "id": "8c4a3e60-7e2b-4746-b6ac-9c377442b623", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:54.025547-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:54.025499-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "ac0bd2b1-0948-4975-8d52-1fb86215095b", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:55.026963-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:55.026866-03:00", + "output": { + "count": 8 + } + } + }, + { + "specversion": "1.0", + "id": "ecb7acc4-1410-4246-96c8-ab6f43022b97", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:55.028066-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:55.028033-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "85689a1a-d20f-4d1d-a894-b669067b81d3", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:55.028356-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:55.028342-03:00" + } + }, + { + "specversion": "1.0", + "id": "63464322-c9c0-4c9e-aaed-7a79eaff0aa7", + 
"source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:55.029047-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:55.029039-03:00", + "output": { + "count": 9 + } + } + }, + { + "specversion": "1.0", + "id": "9e8cf750-0b13-41eb-adfa-333eba418cc4", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:55.029994-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:55.029982-03:00" + } + }, + { + "specversion": "1.0", + "id": "e832ff39-def8-4fa2-96fb-fb3c8a13e103", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:55.030749-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:55.030736-03:00", + "output": { + "count": 9 + } + } + }, + { + "specversion": "1.0", + "id": "76fde24e-71ee-4770-b4aa-623594b1c5d8", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:55.031167-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:55.031161-03:00" + } + }, + { + "specversion": "1.0", + "id": "76046963-2010-4441-9c5c-9ebb68131076", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:55.031247-03:00", + "data": { + "name": 
"01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:55.03124-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "3b7b45d9-3d46-4ebb-a484-67e3f147d01f", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:56.032015-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:56.031914-03:00", + "output": { + "count": 9 + } + } + }, + { + "specversion": "1.0", + "id": "44bb58e3-0e2b-490d-9775-3cb809d95dd0", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:56.033125-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:56.033093-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "2f9b48d3-f20f-48b3-a7f4-19f0a767ae0c", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:56.03339-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:56.033378-03:00" + } + }, + { + "specversion": "1.0", + "id": "cfccd65f-c28f-45c0-bb2d-af9ae0276b28", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:56.034153-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": 
"2026-05-28T18:16:56.034139-03:00", + "output": { + "count": 10 + } + } + }, + { + "specversion": "1.0", + "id": "10daba02-4835-43be-8ba6-d8134c17425a", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:56.035776-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:56.035759-03:00" + } + }, + { + "specversion": "1.0", + "id": "c34a6844-36e3-448e-950d-46f599012b9a", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:56.036427-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:56.036418-03:00", + "output": { + "count": 10 + } + } + }, + { + "specversion": "1.0", + "id": "fb4eac88-5084-4bb4-9ff5-69b6f717340e", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:56.036675-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:56.036668-03:00" + } + }, + { + "specversion": "1.0", + "id": "852a0bdf-f7e4-42b1-a81d-715b1c4d3412", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:56.036757-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:56.036751-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "f8edefc7-d565-41bf-b51b-cc75d0b74af2", + "source": "reference-impl", + 
"type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:57.038081-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:57.038006-03:00", + "output": { + "count": 10 + } + } + }, + { + "specversion": "1.0", + "id": "95304fda-396a-4d8d-82f2-cb9292d3f8d6", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:57.039144-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:57.039066-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "8207d9bc-04b7-4549-8893-f7ed7ab25225", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:57.039451-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:57.039433-03:00" + } + }, + { + "specversion": "1.0", + "id": "38cb5039-6b97-45d5-9d1a-cbe6f2875f07", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:57.040117-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:57.040104-03:00", + "output": { + "count": 11 + } + } + }, + { + "specversion": "1.0", + "id": "f32afb49-6780-47cd-b581-fedbdc82fdd7", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:57.041683-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": 
"do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:57.041655-03:00" + } + }, + { + "specversion": "1.0", + "id": "f7320dd0-38d0-4013-9ec3-f44bfb5dfe3b", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:57.043055-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:57.043038-03:00", + "output": { + "count": 11 + } + } + }, + { + "specversion": "1.0", + "id": "3bc177a3-dfe6-45e9-b5b9-bfd221221b75", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:57.043561-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:57.043551-03:00" + } + }, + { + "specversion": "1.0", + "id": "453049e5-c27b-4774-b9b1-a7d018a0a425", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:57.043737-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:57.043724-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "89feba59-46eb-48e9-8559-566ef4054681", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:58.044413-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:58.044332-03:00", + "output": { + 
"count": 11 + } + } + }, + { + "specversion": "1.0", + "id": "02b61f6e-83f8-46e5-870d-dbc7e0468e73", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:58.045524-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:58.045486-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "00273a62-ca1c-4113-95a3-68c2351977ea", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:58.045794-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:58.045781-03:00" + } + }, + { + "specversion": "1.0", + "id": "decd9256-9b2c-486f-b470-44cb54148962", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:58.046471-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:58.046458-03:00", + "output": { + "count": 12 + } + } + }, + { + "specversion": "1.0", + "id": "4a452a14-9476-4f6a-816e-9f8636b36418", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:58.048176-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:58.048155-03:00" + } + }, + { + "specversion": "1.0", + "id": "ee5f15a1-cddb-4ae0-a8b3-9670f43e116b", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": 
"2026-05-28T18:16:58.049158-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:58.049146-03:00", + "output": { + "count": 12 + } + } + }, + { + "specversion": "1.0", + "id": "1bc88449-473b-492b-9405-710cf7098322", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:58.04959-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:58.049574-03:00" + } + }, + { + "specversion": "1.0", + "id": "2ec5e064-55b6-408d-82d0-9e6649de3acc", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:58.049753-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:58.049738-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "6f296875-d03f-473e-a37b-3a8db2f8d9f6", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:59.051405-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:59.051222-03:00", + "output": { + "count": 12 + } + } + }, + { + "specversion": "1.0", + "id": "82362473-43e9-4d4f-ac66-425b1c3b69ec", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:59.0523-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": 
"SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:59.052274-03:00", + "status": "RUNNING" + } + }, + { + "specversion": "1.0", + "id": "1a99e2d6-89a8-4801-b211-a664c93d75b4", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:59.052453-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:59.052445-03:00" + } + }, + { + "specversion": "1.0", + "id": "3b044eeb-78ff-49dd-97de-ab8f5f31339a", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:59.052945-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/0/inc", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:59.052935-03:00", + "output": { + "count": 13 + } + } + }, + { + "specversion": "1.0", + "id": "a70203ec-20ea-45ea-9683-28b080466887", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:59.054701-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:59.054688-03:00" + } + }, + { + "specversion": "1.0", + "id": "9fc14ec4-ff6d-4cb5-b626-66327a83c8ed", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:16:59.055304-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/1/looping", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:16:59.055294-03:00", + "output": { + "count": 13 + } + } + }, + { + "specversion": "1.0", + "id": 
"fe1b8a0a-80e8-4d07-9bb7-40d25b3fe3c9", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.started.v1", + "time": "2026-05-28T18:16:59.055577-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": "do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "startedAt": "2026-05-28T18:16:59.055568-03:00" + } + }, + { + "specversion": "1.0", + "id": "dbe942f6-3bd8-4bb0-af5a-a6f753708458", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:59.055683-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:59.055676-03:00", + "status": "WAITING" + } + }, + { + "specversion": "1.0", + "id": "c30cad19-36d5-4f29-903d-f656e3119a3e", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.status-changed.v1", + "time": "2026-05-28T18:16:59.784919-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "updatedAt": "2026-05-28T18:16:59.784872-03:00", + "status": "SUSPENDED" + } + }, + { + "specversion": "1.0", + "id": "0481fc10-45f1-4dd2-a419-905500d248e0", + "source": "reference-impl", + "type": "io.serverlessworkflow.workflow.suspended.v1", + "time": "2026-05-28T18:16:59.785221-03:00", + "data": { + "name": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "suspendedAt": "2026-05-28T18:16:59.78521-03:00" + } + }, + { + "specversion": "1.0", + "id": "65d83146-479d-442c-98b4-44b47fd9bfbd", + "source": "reference-impl", + "type": "io.serverlessworkflow.task.completed.v1", + "time": "2026-05-28T18:17:00.05704-03:00", + "data": { + "workflow": "01KSR74SRH4D8A4ZGQ4Q2A56VE", + "task": 
"do/2/waitABit", + "definition": { + "namespace": "example", + "name": "SwitchLoopWait", + "version": "0.1.0" + }, + "completedAt": "2026-05-28T18:17:00.056953-03:00", + "output": { + "count": 13 + } + } + } +] diff --git a/data-index/data-index-ingestion/pom.xml b/data-index/data-index-ingestion/pom.xml new file mode 100644 index 000000000..020a9bf48 --- /dev/null +++ b/data-index/data-index-ingestion/pom.xml @@ -0,0 +1,41 @@ + + + + + org.kubesmarts.logic.apps + data-index + 999-SNAPSHOT + ../pom.xml + + 4.0.0 + + data-index-ingestion + pom + KubeSmarts Logic Apps :: Data Index :: Ingestion + Kafka-based event ingestion for Data Index (MODE 3) + + + data-index-ingestion-kafka-processor + data-index-ingestion-kafka-service + + + diff --git a/data-index/data-index-model/pom.xml b/data-index/data-index-model/pom.xml index f765af2f8..a9dd66ab4 100644 --- a/data-index/data-index-model/pom.xml +++ b/data-index/data-index-model/pom.xml @@ -60,6 +60,11 @@ persistence-commons-api + + io.serverlessworkflow + serverlessworkflow-impl-core + + org.junit.jupiter diff --git a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/LifecycleEventUtils.java b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/LifecycleEventUtils.java new file mode 100644 index 000000000..6285c087d --- /dev/null +++ b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/LifecycleEventUtils.java @@ -0,0 +1,80 @@ +package org.kubesmarts.logic.dataindex.model; + +import io.cloudevents.CloudEvent; +import io.serverlessworkflow.impl.LifecycleEvents; +import io.serverlessworkflow.impl.WorkflowStatus; +import io.serverlessworkflow.impl.lifecycle.ce.TaskCancelledCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskFailedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskResumedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.TaskRetriedCEData; +import 
io.serverlessworkflow.impl.lifecycle.ce.TaskCompletedCEDataWithOutput; +import io.serverlessworkflow.impl.lifecycle.ce.TaskStartedCEDataWithInput; +import io.serverlessworkflow.impl.lifecycle.ce.TaskSuspendedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCancelledCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowCompletedCEDataWithOutput; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowFailedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowResumedCEData; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowStartedCEDataWithInput; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowStatusCEDataEvent; +import io.serverlessworkflow.impl.lifecycle.ce.WorkflowSuspendedCEData; + +import java.util.HashMap; +import java.util.Map; + +public final class LifecycleEventUtils { + + private LifecycleEventUtils() { + } + + private static final Map EVENTS = new HashMap<>(); + + static { + EVENTS.put(LifecycleEvents.WORKFLOW_STARTED, WorkflowStartedCEDataWithInput.class); + EVENTS.put(LifecycleEvents.WORKFLOW_RESUMED, WorkflowResumedCEData.class); + EVENTS.put(LifecycleEvents.WORKFLOW_SUSPENDED, WorkflowSuspendedCEData.class); + EVENTS.put(LifecycleEvents.WORKFLOW_CANCELLED, WorkflowCancelledCEData.class); + EVENTS.put(LifecycleEvents.WORKFLOW_COMPLETED, WorkflowCompletedCEDataWithOutput.class); + EVENTS.put(LifecycleEvents.WORKFLOW_FAULTED, WorkflowFailedCEData.class); + EVENTS.put(LifecycleEvents.WORKFLOW_STATUS_CHANGED, WorkflowStatusCEDataEvent.class); + EVENTS.put(LifecycleEvents.TASK_STARTED, TaskStartedCEDataWithInput.class); + EVENTS.put(LifecycleEvents.TASK_CANCELLED, TaskCancelledCEData.class); + EVENTS.put(LifecycleEvents.TASK_COMPLETED, TaskCompletedCEDataWithOutput.class); + EVENTS.put(LifecycleEvents.TASK_RESUMED, TaskResumedCEData.class); + EVENTS.put(LifecycleEvents.TASK_SUSPENDED, TaskSuspendedCEData.class); + EVENTS.put(LifecycleEvents.TASK_FAULTED, TaskFailedCEData.class); + 
EVENTS.put(LifecycleEvents.TASK_RETRIED, TaskRetriedCEData.class); + } + + + /** + * Define event or workflow status based on {@link CloudEvent#getType()}. + *

+ * For {@link LifecycleEvents#WORKFLOW_STATUS_CHANGED} this method returns null, because the new status is carried in the event payload. + */ + public static String defineStatusLooking(String eventType) { + return switch (eventType) { + case LifecycleEvents.TASK_RESUMED, + LifecycleEvents.TASK_STARTED, + // workflow + LifecycleEvents.WORKFLOW_RESUMED, + LifecycleEvents.WORKFLOW_STARTED -> WorkflowStatus.RUNNING.name(); + case LifecycleEvents.TASK_SUSPENDED, + LifecycleEvents.WORKFLOW_SUSPENDED -> WorkflowStatus.SUSPENDED.name(); + case LifecycleEvents.TASK_CANCELLED, + LifecycleEvents.WORKFLOW_CANCELLED -> WorkflowStatus.CANCELLED.name(); + case LifecycleEvents.TASK_FAULTED -> "FAILED"; // a faulted task is reported as FAILED, not FAULTED + case LifecycleEvents.WORKFLOW_FAULTED -> WorkflowStatus.FAULTED.name(); + case LifecycleEvents.TASK_COMPLETED, + LifecycleEvents.WORKFLOW_COMPLETED -> WorkflowStatus.COMPLETED.name(); + // "status-changed" is not mapped here because the new status is carried in the event payload + default -> null; + }; + } + + public static Class getEventClass(String type) { + if (EVENTS.get(type) != null) { + return EVENTS.get(type); + } + throw new IllegalArgumentException(type + " is not a valid lifecycle event type"); + } +} diff --git a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/TaskExecution.java b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/TaskExecution.java index 42958adcc..3b535c147 100644 --- a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/TaskExecution.java +++ b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/TaskExecution.java @@ -86,6 +86,18 @@ public class TaskExecution { @Ignore private JsonNode output; + /** + * Workflow instance ID (for event context). + *

The workflow instance this task belongs to. + */ + private String instanceId; + + /** + * Event timestamp (when the event occurred). + *

Distinct from task execution times (start/end) + */ + private ZonedDateTime eventTimestamp; + public String getId() { return id; } @@ -158,6 +170,22 @@ public void setOutput(JsonNode output) { this.output = output; } + public String getInstanceId() { + return instanceId; + } + + public void setInstanceId(String instanceId) { + this.instanceId = instanceId; + } + + public ZonedDateTime getEventTimestamp() { + return eventTimestamp; + } + + public void setEventTimestamp(ZonedDateTime eventTimestamp) { + this.eventTimestamp = eventTimestamp; + } + /** * Get input data as JSON string for GraphQL. * @return JSON string or null if no input @@ -204,6 +232,8 @@ public String toString() { ", start=" + start + ", end=" + end + ", error=" + error + + ", instanceId='" + instanceId + '\'' + + ", eventTimestamp=" + eventTimestamp + '}'; } } diff --git a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstance.java b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstance.java index bcf363e96..fefdadb57 100644 --- a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstance.java +++ b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstance.java @@ -141,6 +141,12 @@ public class WorkflowInstance { @Ignore private Workflow workflow; + /** + * Event timestamp (when the event occurred). + *

Distinct from instance start/end times + */ + private ZonedDateTime eventTimestamp; + public String getId() { return id; } @@ -247,6 +253,14 @@ public void setWorkflow(Workflow workflow) { this.workflow = workflow; } + public ZonedDateTime getEventTimestamp() { + return eventTimestamp; + } + + public void setEventTimestamp(ZonedDateTime eventTimestamp) { + this.eventTimestamp = eventTimestamp; + } + /** * Get input data as JSON string for GraphQL. * @return JSON string or null if no input @@ -299,6 +313,7 @@ public String toString() { ", taskExecutions=" + taskExecutions + ", error=" + error + ", workflow=" + workflow + + ", eventTimestamp=" + eventTimestamp + '}'; } } diff --git a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstanceStatus.java b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstanceStatus.java index d10f07f0f..774747f6c 100644 --- a/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstanceStatus.java +++ b/data-index/data-index-model/src/main/java/org/kubesmarts/logic/dataindex/model/WorkflowInstanceStatus.java @@ -31,7 +31,9 @@ * v1.0.0 uses string status values aligned with SW 1.0.0 spec. */ public enum WorkflowInstanceStatus { + PENDING, RUNNING, + WAITING, COMPLETED, FAULTED, CANCELLED, diff --git a/data-index/docs/deployment/MODE3_KAFKA_INGESTION.md b/data-index/docs/deployment/MODE3_KAFKA_INGESTION.md new file mode 100644 index 000000000..8519ee90d --- /dev/null +++ b/data-index/docs/deployment/MODE3_KAFKA_INGESTION.md @@ -0,0 +1,602 @@ +# MODE 3 (Kafka) Deployment Guide + +**Status:** Production Ready +**Last Updated:** 2026-05-29 + +--- + +## Overview + +MODE 3 is a Kafka-based event ingestion service that provides an alternative to the FluentBit + PostgreSQL triggers approach (MODE 1). This guide covers deployment, configuration, and troubleshooting for production environments. 
+ +**Event Pipeline:** +``` +Quarkus Flow → Kafka (CloudEvents) → Kafka Ingestion Service → PostgreSQL → GraphQL API +``` + +**Use MODE 3 when:** +- Kafka infrastructure already exists in your environment +- Security requirements demand events not be written to disk (credit cards, PII, etc.) +- You need encrypted transport (SSL/SASL_SSL) +- Direct stream processing is preferred over log-based ingestion +- You want to leverage Kafka's at-least-once delivery guarantees + +--- + +## Architecture + +### Components + +1. **Quarkus Flow** - Publishes workflow/task lifecycle events as CloudEvents to Kafka +2. **Kafka Broker** - Topic: `flow-lifecycle-out` (raw CloudEvents) +3. **Data Index Ingestion Service** - Consumes CloudEvents and normalizes to PostgreSQL +4. **PostgreSQL** - Normalized tables: `workflow_instances`, `task_instances` +5. **Data Index GraphQL API** - Query service (same as MODE 1) + +### Processing Flow + +``` +CloudEvent Validation + ↓ +Event Type Routing (workflow vs task) + ↓ +Mapper (CloudEvent → WorkflowInstanceEvent / TaskExecutionEvent) + ↓ +WorkflowEventProcessor / TaskExecutionProcessor + ↓ +JDBC UPSERT with Field-Level Idempotency + ↓ +PostgreSQL normalized tables + ↓ +(failed records → data-index-events-dlq topic) +``` + +### Key Design Decisions + +| Aspect | Decision | Rationale | +|--------|----------|-----------| +| **Event Format** | CloudEvents (v1.0) | Standard, platform-independent, includes metadata | +| **Database Access** | JDBC (not JPA) | Direct SQL for performance, UPSERT idempotency | +| **Normalization** | Java processors (not SQL triggers) | Same idempotency logic as MODE 1, but in-service | +| **Error Handling** | Dead-letter queue | Failed events captured for replay/investigation | +| **Task Identity** | Composite key `(instance_id, task_position)` | Handles Quarkus Flow's changing taskExecutionId per event | +| **FK Recovery** | SavePoint + placeholder workflow | Handles out-of-order task events (before workflow) | + +--- 
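The routing step and the composite task key listed in the design table can be sketched in plain Java. This is a minimal illustration only, not the service's actual code: `LifecycleRouting`, `Target`, and `TaskKey` are hypothetical names, and the two type prefixes are an assumption based on the event types seen on `flow-lifecycle-out` (for example `io.serverlessworkflow.task.started.v1`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: LifecycleRouting, Target and TaskKey are hypothetical names, not classes from this PR. */
final class LifecycleRouting {

    enum Target { WORKFLOW, TASK, UNKNOWN }

    // Assumed prefixes, inferred from the CloudEvent types published to flow-lifecycle-out.
    static final String WORKFLOW_PREFIX = "io.serverlessworkflow.workflow.";
    static final String TASK_PREFIX = "io.serverlessworkflow.task.";

    /** Composite task identity: stays stable even if the taskExecutionId differs per event. */
    record TaskKey(String instanceId, String taskPosition) { }

    /** Decides which processor family an event type belongs to. */
    static Target route(String eventType) {
        if (eventType == null) {
            return Target.UNKNOWN;
        }
        if (eventType.startsWith(WORKFLOW_PREFIX)) {
            return Target.WORKFLOW;
        }
        if (eventType.startsWith(TASK_PREFIX)) {
            return Target.TASK;
        }
        return Target.UNKNOWN;
    }

    public static void main(String[] args) {
        // Two events for the same task position collapse onto one key (one row).
        Map<TaskKey, String> latestStatus = new LinkedHashMap<>();
        latestStatus.put(new TaskKey("01KSR74SRH4D8A4ZGQ4Q2A56VE", "do/2/waitABit"), "RUNNING");
        latestStatus.put(new TaskKey("01KSR74SRH4D8A4ZGQ4Q2A56VE", "do/2/waitABit"), "COMPLETED");

        System.out.println(route("io.serverlessworkflow.task.started.v1")); // prints TASK
        System.out.println(latestStatus.size());                            // prints 1
    }
}
```

Because a record compares by value, both `put` calls address the same entry, which mirrors how a `(instance_id, task_position)` key lets a started and a completed event update one task row.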
+ +## Deployment + +### Kubernetes Manifest + +```yaml +apiVersion: v1 +kind: Namespace +metadata: + name: data-index +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: data-index-ingestion-kafka + namespace: data-index +spec: + replicas: 1 # Single instance recommended; Kafka consumer group handles scaling + selector: + matchLabels: + app: data-index-ingestion-kafka + template: + metadata: + labels: + app: data-index-ingestion-kafka + spec: + containers: + - name: kafka-ingestion + image: kubesmarts/data-index-ingestion-kafka-service:999-SNAPSHOT + imagePullPolicy: IfNotPresent + ports: + - containerPort: 8080 + name: http + env: + # Kafka Configuration + - name: KAFKA_BOOTSTRAP_SERVERS + valueFrom: + configMapKeyRef: + name: kafka-config + key: bootstrap.servers + # Database Configuration + - name: QUARKUS_DATASOURCE_JDBC_URL + valueFrom: + secretKeyRef: + name: database-credentials + key: jdbc-url + - name: QUARKUS_DATASOURCE_USERNAME + valueFrom: + secretKeyRef: + name: database-credentials + key: username + - name: QUARKUS_DATASOURCE_PASSWORD + valueFrom: + secretKeyRef: + name: database-credentials + key: password + # Optional: Application Configuration + - name: QUARKUS_LOG_LEVEL + value: "INFO" + livenessProbe: + httpGet: + path: /q/health/live + port: 8080 + initialDelaySeconds: 30 + periodSeconds: 10 + timeoutSeconds: 5 + readinessProbe: + httpGet: + path: /q/health/ready + port: 8080 + initialDelaySeconds: 10 + periodSeconds: 5 + timeoutSeconds: 5 + resources: + requests: + cpu: 500m + memory: 512Mi + limits: + cpu: 2000m + memory: 2Gi +--- +apiVersion: v1 +kind: Service +metadata: + name: data-index-ingestion-kafka + namespace: data-index +spec: + selector: + app: data-index-ingestion-kafka + ports: + - port: 8080 + targetPort: 8080 + name: http + type: ClusterIP +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: kafka-config + namespace: data-index +data: + bootstrap.servers: "kafka.kafka.svc.cluster.local:9092" +--- +apiVersion: v1 
+kind: Secret +metadata: + name: database-credentials + namespace: data-index +type: Opaque +stringData: + jdbc-url: "jdbc:postgresql://postgresql.data-index.svc.cluster.local:5432/data-index" + username: "data-index" + password: "CHANGE_ME" +``` + +### Configuration + +#### Required Environment Variables + +| Variable | Required | Default | Notes | +|----------|----------|---------|-------| +| `KAFKA_BOOTSTRAP_SERVERS` | Yes (prod) | localhost:29092 (dev) | Comma-separated broker URLs | +| `QUARKUS_DATASOURCE_JDBC_URL` | Yes (prod) | jdbc:h2:mem:test (dev) | PostgreSQL JDBC connection string | +| `QUARKUS_DATASOURCE_USERNAME` | Yes (prod) | (dev services) | Database username | +| `QUARKUS_DATASOURCE_PASSWORD` | Yes (prod) | (dev services) | Database password | + +#### Optional Configuration + +| Property | Default | Description | +|-------------------------------------------------------------------|---------|-------------| +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_TOPIC` | `flow-lifecycle-out` | Kafka topic name | +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_GROUP_ID` | `data-index-ingestion` | Kafka consumer group | +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_AUTO_OFFSET_RESET` | `earliest` | Offset reset strategy | +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_RETRY_ATTEMPTS` | `2` | Retries before DLQ | +| `MP_MESSAGING_INCOMING_DATA_INDEX_EVENTS_DEAD_LETTER_QUEUE_TOPIC` | `data-index-events-dlq` | Dead letter queue topic | +| `quarkus.log.level` | `INFO` | Logging level | + +#### Kafka Security (Optional) + +For SASL/SSL authentication, add to application.properties or environment: + +```properties +# SASL Configuration +mp.messaging.incoming.data-index-events.security.protocol=SASL_SSL +mp.messaging.incoming.data-index-events.sasl.mechanism=PLAIN +mp.messaging.incoming.data-index-events.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; + +# SSL Configuration 
+mp.messaging.incoming.data-index-events.ssl.truststore.location=/etc/ssl/certs/kafka-truststore.jks +mp.messaging.incoming.data-index-events.ssl.truststore.password= +``` + +--- + +## Kafka Topic Setup + +### Required Topics + +1. **`flow-lifecycle-out`** - Main event topic (published by Quarkus Flow) +2. **`data-index-events-dlq`** - Dead-letter queue for failed records + +### Auto-Creation + +In development, topics are auto-created when the cluster runs with: +```properties +KAFKA_AUTO_CREATE_TOPICS_ENABLE=true +``` + +In production, pre-create topics: + +```bash +# Create main topic (replicas=3 for HA, partitions=3 for parallelism, retention=24 hours) +kafka-topics.sh --create \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic flow-lifecycle-out \ + --replication-factor 3 \ + --partitions 3 \ + --config retention.ms=86400000 \ + --config min.insync.replicas=2 + +# Create DLQ topic +kafka-topics.sh --create \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic data-index-events-dlq \ + --replication-factor 3 \ + --partitions 1 \ + --config retention.ms=604800000 # 7 days (for investigation) +``` + +--- + +## Event Format + +### CloudEvent (v1.0) + +```json +{ + "specversion": "1.0", + "type": "io.serverlessworkflow.workflow.started.v1", + "source": "/workflow/executions/01KSGKY66DMS0KPPMFMMR3BJZX", + "id": "event-123", + "time": "2026-05-25T22:40:10.676900Z", + "datacontenttype": "application/json", + "data": { + "instanceId": "01KSGKY66DMS0KPPMFMMR3BJZX", + "workflowName": "order-processing", + "workflowNamespace": "org.acme", + "workflowVersion": "1.0.0", + "status": "RUNNING", + "startTime": "2026-05-25T19:40:10.676802-03:00", + "lastUpdateTime": "2026-05-25T19:40:10.676802-03:00", + "input": { "orderId": "ORD-789" } + } +} +``` + +### Supported Event Types + +| Event Type | Processor | Field Updates | +|------------|-----------|----------------| +| `io.serverlessworkflow.workflow.started` | WorkflowEventProcessor | start, 
status→RUNNING | +| `io.serverlessworkflow.workflow.completed` | WorkflowEventProcessor | end, status→COMPLETED, output | +| `io.serverlessworkflow.workflow.faulted` | WorkflowEventProcessor | end, status→FAULTED, error | +| `io.serverlessworkflow.workflow.suspended` | WorkflowEventProcessor | status→SUSPENDED | +| `io.serverlessworkflow.workflow.cancelled` | WorkflowEventProcessor | status→CANCELLED | +| `io.serverlessworkflow.task.started` | TaskExecutionProcessor | start, status→RUNNING | +| `io.serverlessworkflow.task.completed` | TaskExecutionProcessor | end, status→COMPLETED, output | +| `io.serverlessworkflow.task.faulted` | TaskExecutionProcessor | end, status→FAILED, error | +| `io.serverlessworkflow.task.suspended` | TaskExecutionProcessor | status→SUSPENDED | +| `io.serverlessworkflow.task.cancelled` | TaskExecutionProcessor | status→CANCELLED | + +### Timestamp Handling + +All timestamp fields are automatically converted to **UTC OffsetDateTime** and stored as `TIMESTAMP WITH TIME ZONE` in PostgreSQL. 
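The conversion to UTC can be illustrated with plain `java.time`. This is a sketch under stated assumptions, not the service's actual mapper: `TimestampNormalizer` and `toUtc` are hypothetical names, and the sketch assumes a digits-only value is Unix epoch seconds (never milliseconds) while anything else is an ISO-8601 timestamp with an offset or `Z`.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/** Sketch only: TimestampNormalizer is a hypothetical helper, not a class from this PR. */
final class TimestampNormalizer {

    /** Normalizes an ISO-8601 timestamp (offset or Z) or epoch seconds to a UTC OffsetDateTime. */
    static OffsetDateTime toUtc(String raw) {
        String value = raw.trim();
        // Assumption: a non-empty, digits-only value is Unix epoch seconds.
        if (!value.isEmpty() && value.chars().allMatch(Character::isDigit)) {
            return Instant.ofEpochSecond(Long.parseLong(value)).atOffset(ZoneOffset.UTC);
        }
        // Otherwise parse as ISO-8601 and shift to UTC while keeping the same instant.
        return OffsetDateTime.parse(value).withOffsetSameInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // A -03:00 timestamp moves three hours forward when expressed in UTC.
        System.out.println(toUtc("2026-05-25T19:40:10.676802-03:00")); // prints 2026-05-25T22:40:10.676802Z
    }
}
```

Keeping the instant fixed and changing only the offset means the value written to a `TIMESTAMP WITH TIME ZONE` column is the same regardless of the producer's local zone.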
+ +Accepted formats: +- **ISO-8601 with offset** (recommended): `2026-05-25T19:40:10.676802-03:00` +- **ISO-8601 UTC**: `2026-05-25T22:40:10.676900Z` +- **Unix epoch seconds**: `1747486200` + +--- + +## Idempotency Guarantees + +MODE 3 implements **field-level idempotency** to handle out-of-order and duplicate events: + +### Immutable Fields (First Value Wins) + +Once set, never updated: +- `workflow.start`, `workflow.input`, `workflow.name`, `workflow.version`, `workflow.namespace` +- `task.start`, `task.input`, `task.taskName`, `task.taskPosition` + +### Terminal Fields (Last Non-Null Wins) + +Updated only if incoming event is newer (based on `last_event_time`): +- `workflow.end`, `workflow.output`, `workflow.lastUpdate` +- `task.end`, `task.output` +- Error fields: `errorType`, `errorTitle`, `errorDetail`, `errorStatus`, `errorInstance` + +### Status Field + +Updated based on timestamp and precedence: +- Terminal states win: `COMPLETED`, `FAULTED`, `CANCELLED` > `RUNNING` > `CREATED` +- If incoming event is newer, status is updated +- If incoming event is older, status is preserved + +### Example: Out-of-Order Events + +``` +Event 1 (t=10:00): workflow.started + → INSERT: id=wf-1, status=RUNNING, start=10:00, last_event_time=10:00 + +Event 2 (t=10:05): workflow.completed, output={result: "success"} + → UPDATE: status=COMPLETED, end=10:05, output={result: "success"}, last_event_time=10:05 + +Event 3 (t=10:01): workflow.completed, output={result: "failure"} [OUT OF ORDER] + → SKIP: 10:01 < 10:05, so old status ignored, output not overwritten +``` + +--- + +## Error Handling + +### Failed Event Processing + +When an event cannot be processed (deserialization error, database constraint violation, etc.): + +1. **Exception thrown**: `ProcessEventFailedException` wraps the error +2. **Dead-letter queue**: Record automatically sent to `data-index-events-dlq` topic +3. **Consumer continues**: Service immediately processes next message (fail-fast disabled) +4. 
**Monitoring**: Check DLQ topic to inspect and replay failed events + +### Task Before Workflow (Foreign Key Recovery) + +If a task event arrives before its parent workflow: + +``` +1. Task event consumed → INSERT fails (FK constraint violation) +2. Savepoint rolled back +3. Placeholder workflow created: INSERT INTO workflow_instances (id, created_at, updated_at, last_event_time) +4. Task event retried → INSERT succeeds (FK satisfied) +5. Later: workflow.started event arrives → UPDATE placeholder with full data +``` + +This ensures no task events are lost due to event ordering issues. + +--- + +## Monitoring + +### Health Checks + +```bash +# Liveness (service is running) +curl http://localhost:8080/q/health/live + +# Readiness (ready to consume events) +curl http://localhost:8080/q/health/ready + +# Full health summary +curl http://localhost:8080/q/health +``` + +### Prometheus Metrics + +```bash +# View all metrics +curl http://localhost:8080/q/metrics + +# Key metrics to monitor +kafka_messages_consumed_total # Events processed +kafka_consumer_lag # Messages behind +agroal_pool_size_current # Active DB connections +``` + +### Kubernetes Monitoring + +```bash +# Follow service logs +kubectl logs -f deployment/data-index-ingestion-kafka -n data-index + +# Search for errors +kubectl logs deployment/data-index-ingestion-kafka -n data-index | grep ERROR + +# Check DLQ processing +kubectl logs deployment/data-index-ingestion-kafka -n data-index | grep "dead-letter" + +# Watch pod status +kubectl get pods -n data-index -w +``` + +### Dead-Letter Queue Inspection + +```bash +# Check DLQ topic for failed events +kafka-console-consumer.sh \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic data-index-events-dlq \ + --from-beginning \ + --property print.key=true \ + --max-messages 10 + +# Extract a failed event for investigation +kafka-console-consumer.sh \ + --bootstrap-server kafka.kafka.svc.cluster.local:9092 \ + --topic data-index-events-dlq \ + 
--from-beginning \ + --property print.key=true \ + --max-messages 1 | jq '.data' +``` + +--- + +## Troubleshooting + +### Service won't start + +**Symptom:** Pod in CrashLoopBackOff +**Check:** +```bash +kubectl logs deployment/data-index-ingestion-kafka -n data-index +``` + +**Common causes:** +- PostgreSQL unreachable → Check `QUARKUS_DATASOURCE_JDBC_URL` and network connectivity +- Kafka unreachable → Check `KAFKA_BOOTSTRAP_SERVERS` and Kafka broker health +- Database schema missing → Run Flyway migrations before starting the service + +### Events not being consumed + +**Symptom:** No events in PostgreSQL, Kafka topic has messages +**Check:** +```bash +# Verify readiness +kubectl get pods -n data-index | grep data-index-ingestion-kafka + +# Check logs for errors +kubectl logs deployment/data-index-ingestion-kafka -n data-index | grep -i "error\|exception" + +# Verify Kafka broker connectivity +kubectl exec -it pod/data-index-ingestion-kafka -n data-index -- \ + kafka-broker-api-versions.sh --bootstrap-server KAFKA_BOOTSTRAP_SERVERS +``` + +**Common causes:** +- Consumer group has lag → Check `kafka_consumer_lag` metric +- Topic name mismatch → Verify `mp.messaging.incoming.data-index-events.topic` +- Kafka authentication failures → Check SASL/SSL configuration + +### Data not in PostgreSQL + +**Symptom:** Kafka has events, but workflow_instances table is empty +**Check:** +```bash +# Verify table exists +kubectl exec -it pod/postgresql -- psql -U data-index -d data-index -c "\dt workflow_instances" + +# Check for data +kubectl exec -it pod/postgresql -- psql -U data-index -d data-index -c "SELECT COUNT(*) FROM workflow_instances" + +# Check DLQ for failed events (read from the start; otherwise the consumer waits for new messages) +kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic data-index-events-dlq --from-beginning --max-messages 5 +``` + +**Common causes:** +- Database connection pool exhausted → Increase `quarkus.datasource.jdbc.max-size` +- Unique constraint violations → Check DLQ for details +- FK constraint violations on first 
attempt → Expected; the savepoint/retry logic should handle these + +### DLQ messages pile up + +**Symptom:** data-index-events-dlq topic growing +**Check:** +```bash +# Count DLQ messages +kafka-run-class.sh kafka.tools.JmxTool \ + --object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=data-index-events-dlq + +# Inspect latest DLQ messages +kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic data-index-events-dlq --max-messages 10 +``` + +**Common causes:** +- Malformed CloudEvents → Fix event publisher (Quarkus Flow) +- Schema mismatch → Upgrade service or downgrade event publisher +- Database temporarily unavailable → Events are retried and should eventually succeed once the DB recovers + +--- + +## Comparison: MODE 1 vs MODE 2 vs MODE 3 + +| Feature | MODE 1 (FluentBit + Triggers) | MODE 2 (FluentBit + ES Transforms) | MODE 3 (Kafka) | +|---------|-------------------------------|------------------------------------|----| +| **Event Source** | Log files | Log files | Kafka topics | +| **Ingestion** | FluentBit DaemonSet | FluentBit DaemonSet | SmallRye Reactive Messaging | +| **Normalization** | PostgreSQL triggers | Elasticsearch transforms | Java processors (JDBC) | +| **Raw Storage** | `workflow_events_raw` table | `workflow-events` index | None (direct to normalized) | +| **Normalized Storage** | PostgreSQL tables | Elasticsearch indices | PostgreSQL tables | +| **GraphQL API** | QueryService (PostgreSQL) | QueryService (Elasticsearch) | QueryService (PostgreSQL) | +| **Security** | Files on disk | Files on disk | Kafka (SSL/SASL capable) | +| **Performance** | Trigger latency (~10ms) | Transform latency (~1s) | Message latency (~100ms) | +| **Scaling** | Limited by DB triggers | Unlimited (ES scales) | Kafka parallelism | +| **Query Capabilities** | Standard SQL | Full-text search, aggregations | Standard SQL | +| **DLQ** | N/A (triggers atomic) | N/A (no failures) | Yes (`data-index-events-dlq`) | + +**Choose MODE 3 if:** +- ✅ Kafka already 
deployed in your infrastructure +- ✅ Security concern: avoid writing sensitive data to disk +- ✅ Need encrypted Kafka transport (SSL/SASL) +- ✅ Prefer stream-based ingestion +- ✅ Want to leverage Kafka's at-least-once guarantees + +**Choose MODE 1 if:** +- ✅ Simplest setup (triggers are atomic, no DLQ needed) +- ✅ Low latency critical (~10ms) +- ✅ No Kafka infrastructure available +- ✅ Log-based ingestion acceptable for your use case + +**Choose MODE 2 if:** +- ✅ Need full-text search capabilities +- ✅ Complex aggregations required +- ✅ Large scale (1M+ workflows) +- ✅ Multi-tenancy needs (index-per-tenant) + +--- + +## Local Development + +### Quick Start + +```bash +cd data-index/data-index-ingestion/data-index-ingestion-kafka-service + +# Start in dev mode (auto-starts Kafka + PostgreSQL via Dev Services) +mvn quarkus:dev + +# Service runs at: http://localhost:8080 +# Health: http://localhost:8080/q/health +``` + +### Running Integration Tests + +```bash +# Run all tests +mvn test + +# Run specific test +mvn test -Dtest=KafkaIngestionITest + +# Run with logging +mvn test -Dquarkus.log.level=DEBUG +``` + +### KIND Testing + +```bash +cd data-index/scripts/kind + +# Setup cluster + dependencies +./setup-cluster.sh +MODE=kafka ./install-dependencies.sh + +# Deploy and test +./init-database-schema.sh +./deploy-kafka-ingestion.sh +./test-mode3-e2e.sh +``` + +--- + +## References + +- **Service README**: `data-index/data-index-ingestion/data-index-ingestion-kafka-service/README.md` +- **Parent Module**: `data-index/data-index-ingestion/README.md` +- **CLAUDE.md**: Full project guidelines and architecture +- **Kind Scripts**: `data-index/scripts/kind/` +- **Issue #23**: GitHub issue for MODE 3 implementation diff --git a/data-index/pom.xml b/data-index/pom.xml index 463fd2e60..ffc39edc2 100644 --- a/data-index/pom.xml +++ b/data-index/pom.xml @@ -109,7 +109,21 @@ workflow-test-app ${project.version} - + + org.kubesmarts.logic.apps + data-index-ingestion-kafka-common + 
${project.version} + + + org.kubesmarts.logic.apps + data-index-ingestion-kafka-processor + ${project.version} + + + org.kubesmarts.logic.apps + data-index-ingestion-kafka-service + ${project.version} + org.json @@ -146,6 +160,18 @@ elasticsearch-rest-client-sniffer 8.11.1 + + + + io.cloudevents + cloudevents-core + 4.0.2 + + + io.cloudevents + cloudevents-json-jackson + 4.0.2 + @@ -211,6 +237,7 @@ data-index-integration-tests data-index-docs workflow-test-app + data-index-ingestion diff --git a/data-index/scripts/kafka/README.md b/data-index/scripts/kafka/README.md new file mode 100644 index 000000000..217f7875f --- /dev/null +++ b/data-index/scripts/kafka/README.md @@ -0,0 +1,168 @@ +# Kafka Scripts — MODE 3 (Kafka Ingestion) + +Kubernetes manifests for the Kafka broker used in MODE 3 event ingestion. + +## Overview + +MODE 3 replaces FluentBit log collection with direct Kafka event streaming: + +``` +workflow-test-app (kafka profile) + ↓ CloudEvents (topic: flow-lifecycle-out) +Kafka broker (this directory) + ↓ SmallRye Reactive Messaging +data-index-ingestion-kafka-service + ↓ JDBC UPSERT +PostgreSQL (workflow_instances, task_instances) + ↓ JPA / Hibernate +Data Index GraphQL API +``` + +**No FluentBit, no log files, no raw event tables.** Events flow directly from Quarkus Flow +to the ingestion service via Kafka CloudEvents. 
+ +## Directory Structure + +``` +kafka/ +├── README.md # This file +└── kubernetes/ + └── kafka.yaml # Kafka StatefulSet + Services (KRaft, single-node) +``` + +## Kafka Deployment + +### What is deployed + +| Resource | Name | Namespace | Purpose | +|-----------------------|--------------------|-----------|---------------------------------------------| +| StatefulSet | `kafka` | `kafka` | Single-node Kafka broker + controller (KRaft) | +| Service (Headless) | `kafka-headless` | `kafka` | StatefulSet DNS for pod-to-pod communication | +| Service (ClusterIP) | `kafka` | `kafka` | Stable bootstrap address for clients | +| Service (NodePort) | `kafka-nodeport` | `kafka` | External access for debugging (port 30900) | + +### Bootstrap addresses + +| Context | Address | +|--------------------|----------------------------------------------| +| In-cluster clients | `kafka.kafka.svc.cluster.local:9092` | +| KIND host (debug) | `localhost:30900` | + +### Topics + +| Topic | Created by | Description | +|------------------------|------------------------|-------------------------------------| +| `flow-lifecycle-out` | Auto-created on publish | Workflow + task lifecycle CloudEvents | + +Auto-creation is enabled (`KAFKA_AUTO_CREATE_TOPICS_ENABLE=true`). The topic is +created the first time `workflow-test-app` publishes an event. + +## Quick Start (KIND) + +```bash +cd data-index/scripts/kind + +# 1. Create cluster (adds NodePort 30900 for Kafka) +./setup-cluster.sh + +# 2. Install dependencies (PostgreSQL + Kafka) +MODE=kafka ./install-dependencies.sh + +# 3. Initialize database schema (create workflow_instances, task_instances tables) +./init-database-schema.sh + +# 4. Create Kafka topic (flow-lifecycle-out) +./create-kafka-topic.sh + +# 5. Deploy query service + Kafka ingestion service +./deploy-data-index.sh kafka + +# 6. Deploy workflow-test-app with Kafka profile +MODE=kafka ./deploy-workflow-app.sh + +# 7. 
Run end-to-end test +./test-mode3-e2e.sh +``` + +## Manual Kafka Deployment + +```bash +# Apply the manifest (creates kafka namespace + all resources) +kubectl apply -f kubernetes/kafka.yaml + +# Wait for Kafka to be ready +kubectl wait --namespace kafka \ + --for=condition=ready pod/kafka-0 \ + --timeout=120s + +# Verify broker is listening +kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-topics.sh \ + --bootstrap-server localhost:9092 \ + --list +``` + +## Debugging + +### List topics + +```bash +kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-topics.sh \ + --bootstrap-server localhost:9092 \ + --list +``` + +### Describe the flow-lifecycle-out topic + +```bash +kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-topics.sh \ + --bootstrap-server localhost:9092 \ + --describe \ + --topic flow-lifecycle-out +``` + +### Consume messages (watch live events) + +```bash +kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-console-consumer.sh \ + --bootstrap-server localhost:9092 \ + --topic flow-lifecycle-out \ + --from-beginning +``` + +### Check consumer group lag + +```bash +kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-consumer-groups.sh \ + --bootstrap-server localhost:9092 \ + --describe \ + --group data-index-ingestion +``` + +### Broker logs + +```bash +kubectl logs -n kafka kafka-0 -f +``` + +## Configuration Notes + +**KRaft mode** (no ZooKeeper): the broker and controller roles run in the same process +(`KAFKA_PROCESS_ROLES=broker,controller`). This simplifies the deployment to a single +StatefulSet pod with no external coordination service. + +**Single-node settings**: all replication factors are set to `1` (see `kafka.yaml`). This +is intentional — this deployment is for integration testing, not production. + +**Storage**: data is persisted in a `1Gi` PersistentVolumeClaim. KIND's default storage +class provisions `hostPath` volumes, so data survives pod restarts but is lost when the +cluster is deleted. 
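The storage note can be checked on a running cluster. The sketch below assumes the claim template in `kafka.yaml` is named `data` and the StatefulSet is named `kafka`; Kubernetes names StatefulSet claims `<template>-<statefulset>-<ordinal>`, so the claim for the single broker pod would be `data-kafka-0`. The function names `pvc_name` and `show_kafka_storage` are illustrative helpers, not part of the scripts in this directory.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of the repo scripts.

# pvc_name CLAIM_TEMPLATE STATEFULSET ORDINAL
# Prints the PersistentVolumeClaim name Kubernetes derives for a StatefulSet
# pod: <claim-template>-<statefulset>-<ordinal>.
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# show_kafka_storage
# Prints the requested size and binding phase of the broker's claim.
# Requires kubectl and a cluster with the kafka namespace deployed.
show_kafka_storage() {
  kubectl get pvc -n kafka "$(pvc_name data kafka 0)" \
    -o jsonpath='{.spec.resources.requests.storage}{" "}{.status.phase}{"\n"}'
}
```

Calling `show_kafka_storage` after deployment should report the requested size alongside the claim phase. If the claim is missing after a cluster rebuild, that is consistent with the note above that data does not outlive the KIND cluster.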
+ +## Related Documentation + +- [Kafka Ingestion Deploy Script](../kind/deploy-kafka-ingestion.sh) +- [MODE 3 E2E Test](../kind/test-mode3-e2e.sh) diff --git a/data-index/scripts/kafka/kubernetes/kafka.yaml b/data-index/scripts/kafka/kubernetes/kafka.yaml new file mode 100644 index 000000000..2a8d747a0 --- /dev/null +++ b/data-index/scripts/kafka/kubernetes/kafka.yaml @@ -0,0 +1,195 @@ +# +# Copyright 2024 KubeSmarts Authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Kafka single-node deployment for MODE 3 integration testing (KRaft mode, no ZooKeeper). +# +# Topology: +# - StatefulSet: single Kafka broker + controller in one process (KRaft) +# - kafka-headless: ClusterIP/None — used by StatefulSet DNS (kafka-0.kafka-headless.kafka) +# - kafka: ClusterIP — stable service name for in-cluster clients +# - kafka-nodeport: NodePort 30900 — external access for local debugging +# +# Internal bootstrap address (for pods in the cluster): +# kafka.kafka.svc.cluster.local:9092 +# +# External bootstrap address (from the host when using KIND with port mapping): +# localhost:30900 +# NOTE: advertised listeners use the internal DNS name; external consumers must run +# inside the cluster or use a tool that tolerates metadata redirect (e.g. kcat). 
+# +--- +apiVersion: v1 +kind: Namespace +metadata: + name: kafka +--- +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: kafka + namespace: kafka + labels: + app: kafka +spec: + serviceName: kafka-headless + replicas: 1 + selector: + matchLabels: + app: kafka + template: + metadata: + labels: + app: kafka + spec: + containers: + - name: kafka + image: apache/kafka:3.7.0 + ports: + - containerPort: 9092 + name: plaintext + protocol: TCP + - containerPort: 9093 + name: controller + protocol: TCP + env: + # KRaft: this node acts as both broker and controller. + # apache/kafka maps KAFKA_ → the corresponding server property + # (e.g. KAFKA_NODE_ID → node.id). No KAFKA_CFG_ prefix needed. + - name: KAFKA_NODE_ID + value: "0" + - name: KAFKA_PROCESS_ROLES + value: broker,controller + # Controller quorum: only this node (node 0) + - name: KAFKA_CONTROLLER_QUORUM_VOTERS + value: "0@kafka-0.kafka-headless.kafka.svc.cluster.local:9093" + # Listeners: PLAINTEXT for clients, CONTROLLER for KRaft internal use + - name: KAFKA_LISTENERS + value: PLAINTEXT://:9092,CONTROLLER://:9093 + # Advertised to producers/consumers — must be reachable by in-cluster clients + - name: KAFKA_ADVERTISED_LISTENERS + value: PLAINTEXT://kafka.kafka.svc.cluster.local:9092 + - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP + value: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT + - name: KAFKA_CONTROLLER_LISTENER_NAMES + value: CONTROLLER + - name: KAFKA_INTER_BROKER_LISTENER_NAME + value: PLAINTEXT + # Auto-create topics so flow-lifecycle-out is created on first publish + - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE + value: "true" + # Default replication factor 1 (single-node cluster) + - name: KAFKA_DEFAULT_REPLICATION_FACTOR + value: "1" + - name: KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR + value: "1" + - name: KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR + value: "1" + - name: KAFKA_TRANSACTION_STATE_LOG_MIN_ISR + value: "1" + # Log directory for the official image + - name: KAFKA_LOG_DIRS + value: 
/var/lib/kafka/data + # Fixed cluster ID triggers automatic KRaft storage format on first boot + - name: CLUSTER_ID + value: "MkU3OEVBNTcwNTJENDM2Qk" + resources: + requests: + memory: "512Mi" + cpu: "250m" + limits: + memory: "1Gi" + cpu: "1000m" + readinessProbe: + tcpSocket: + port: 9092 + initialDelaySeconds: 20 + periodSeconds: 10 + failureThreshold: 6 + livenessProbe: + tcpSocket: + port: 9092 + initialDelaySeconds: 30 + periodSeconds: 15 + volumeMounts: + - name: data + mountPath: /var/lib/kafka/data + volumeClaimTemplates: + - metadata: + name: data + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 1Gi +--- +# Headless service — required for StatefulSet pod DNS resolution +# kafka-0.kafka-headless.kafka.svc.cluster.local +apiVersion: v1 +kind: Service +metadata: + name: kafka-headless + namespace: kafka + labels: + app: kafka +spec: + type: ClusterIP + clusterIP: None + selector: + app: kafka + ports: + - port: 9092 + targetPort: 9092 + name: plaintext + - port: 9093 + targetPort: 9093 + name: controller +--- +# Stable ClusterIP service — use this address in application configuration +# bootstrap.servers=kafka.kafka.svc.cluster.local:9092 +apiVersion: v1 +kind: Service +metadata: + name: kafka + namespace: kafka + labels: + app: kafka +spec: + type: ClusterIP + selector: + app: kafka + ports: + - port: 9092 + targetPort: 9092 + name: plaintext + protocol: TCP +--- +# NodePort service — external debugging access from the KIND host at localhost:30900 +apiVersion: v1 +kind: Service +metadata: + name: kafka-nodeport + namespace: kafka + labels: + app: kafka +spec: + type: NodePort + selector: + app: kafka + ports: + - port: 9092 + targetPort: 9092 + nodePort: 30900 + name: plaintext + protocol: TCP diff --git a/data-index/scripts/kind/deploy-data-index.sh b/data-index/scripts/kind/deploy-data-index.sh index 3a652552f..3cf5f41a8 100755 --- a/data-index/scripts/kind/deploy-data-index.sh +++ b/data-index/scripts/kind/deploy-data-index.sh 
@@ -57,6 +57,7 @@ usage() { echo "Modes:" echo " postgresql - Mode 1: FluentBit → PostgreSQL (triggers) → Query tables" echo " elasticsearch - Mode 2: FluentBit → Elasticsearch → Transform → Query indices" + echo " kafka - Mode 3: Kafka → Kafka Ingestion Service → PostgreSQL → Query tables" echo "" echo "Legacy mode names (deprecated but still supported):" echo " postgresql-polling - Alias for 'postgresql'" @@ -82,7 +83,7 @@ validate_mode() { esac case "$MODE" in - postgresql|elasticsearch) + postgresql|elasticsearch|kafka) log_info "Deployment mode: $MODE" ;; *) @@ -130,6 +131,16 @@ check_prerequisites() { exit 1 fi ;; + kafka) + if ! kubectl get namespace postgresql &> /dev/null; then + log_error "PostgreSQL not installed. Run: MODE=kafka ./install-dependencies.sh" + exit 1 + fi + if ! kubectl get namespace kafka &> /dev/null; then + log_error "Kafka not installed. Run: MODE=kafka ./install-dependencies.sh" + exit 1 + fi + ;; esac log_info "✓ Dependencies verified" @@ -141,18 +152,24 @@ build_image() { cd "${PROJECT_ROOT}" - # Build with Maven using profile-based approach - log_info "Building data-index-service-${MODE} module..." - mvn clean package -pl data-index/data-index-service/data-index-service-${MODE} -am \ + # MODE 3 (kafka) uses the postgresql image for query tables + local BUILD_MODE="${MODE}" + if [[ "${MODE}" == "kafka" ]]; then + BUILD_MODE="postgresql" + log_info "MODE 3: building postgresql query service (data-index-service-postgresql)" + fi + + log_info "Building data-index-service-${BUILD_MODE} module..." 
+ mvn clean package -pl data-index/data-index-service/data-index-service-${BUILD_MODE} -am \ -Dquarkus.container-image.build=true \ -DskipFlyway=true \ -DskipTests -q - log_info "✓ Container image built (without Flyway): kubesmarts/data-index-service-${MODE}:${IMAGE_TAG}" + log_info "✓ Container image built (without Flyway): kubesmarts/data-index-service-${BUILD_MODE}:${IMAGE_TAG}" # Load image into KIND cluster log_info "Loading image into KIND cluster..." - kind load docker-image kubesmarts/data-index-service-${MODE}:${IMAGE_TAG} \ + kind load docker-image kubesmarts/data-index-service-${BUILD_MODE}:${IMAGE_TAG} \ --name ${CLUSTER_NAME} log_info "✓ Image loaded to KIND cluster" @@ -212,7 +229,7 @@ create_configmap() { create_secret() { log_step "Creating data-index Secret..." - if [[ "$MODE" == "postgresql" ]]; then + if [[ "$MODE" == "postgresql" || "$MODE" == "kafka" ]]; then kubectl create secret generic data-index-secret \ --namespace data-index \ --from-literal=QUARKUS_DATASOURCE_PASSWORD=dataindex123 \ @@ -228,6 +245,12 @@ create_secret() { deploy_service() { log_step "Deploying data-index-service..." + # MODE 3 (kafka) uses the postgresql image for the query side + local DEPLOY_IMAGE_MODE="${MODE}" + if [[ "${MODE}" == "kafka" ]]; then + DEPLOY_IMAGE_MODE="postgresql" + fi + kubectl apply -f - </dev/null | grep -q "^${CLUSTER_NAME}$"; then + log_error "Cluster '${CLUSTER_NAME}' not found. Run setup-cluster.sh first." + exit 1 + fi + kubectl config use-context "kind-${CLUSTER_NAME}" &>/dev/null + + if ! kubectl get namespace postgresql &>/dev/null; then + log_error "PostgreSQL namespace not found. Run: MODE=kafka ./install-dependencies.sh" + exit 1 + fi + + if ! kubectl get namespace kafka &>/dev/null; then + log_error "Kafka namespace not found. 
Run: MODE=kafka ./install-dependencies.sh" + exit 1 + fi + + log_info "✓ Prerequisites verified" +} + +build_and_load() { + if [[ "${SKIP_BUILD:-false}" == "true" ]]; then + log_info "Skipping build (SKIP_BUILD=true)" + return + fi + + log_step "Building data-index-ingestion-kafka-service..." + cd "${PROJECT_ROOT}" + mvn clean package \ + -pl data-index/data-index-ingestion/data-index-ingestion-kafka-service -am \ + -Dquarkus.container-image.build=true \ + -DskipTests -q + + log_info "✓ Image built: ${IMAGE_NAME}:${IMAGE_TAG}" + + log_step "Loading image into KIND cluster..." + kind load docker-image "${IMAGE_NAME}:${IMAGE_TAG}" --name "${CLUSTER_NAME}" + log_info "✓ Image loaded" +} + +deploy_ingestion_service() { + log_step "Deploying data-index-ingestion-kafka-service..." + + # Secret for PostgreSQL password + kubectl create secret generic kafka-ingestion-secret \ + --namespace data-index \ + --from-literal=QUARKUS_DATASOURCE_PASSWORD="${PG_PASSWORD}" \ + --dry-run=client -o yaml | kubectl apply -f - + + kubectl apply -f - < /dev/null; then - log_error "Helm is not installed. Please install from: https://helm.sh/docs/intro/install/" - exit 1 - fi - log_info "✓ Helm $(helm version --short 2>/dev/null)" - # Check cluster exists if ! kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then log_error "Cluster '${CLUSTER_NAME}' does not exist. 
Run setup-cluster.sh first" @@ -78,6 +72,7 @@ create_namespaces() { kubectl create namespace logging --dry-run=client -o yaml | kubectl apply -f - kubectl create namespace postgresql --dry-run=client -o yaml | kubectl apply -f - kubectl create namespace elasticsearch --dry-run=client -o yaml | kubectl apply -f - + kubectl create namespace kafka --dry-run=client -o yaml | kubectl apply -f - kubectl create namespace workflows --dry-run=client -o yaml | kubectl apply -f - log_info "✓ Namespaces created" @@ -117,6 +112,42 @@ install_postgresql() { log_info " Connection: postgresql://dataindex:dataindex123@localhost:30432/dataindex" } +# Install Kafka (KRaft single-node) — used by MODE 3 +install_kafka() { + local SCRIPT_DIR + SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + local KAFKA_MANIFEST="${SCRIPT_DIR}/../kafka/kubernetes/kafka.yaml" + + log_step "Installing Kafka (KRaft single-node)..." + + if [[ ! -f "${KAFKA_MANIFEST}" ]]; then + log_error "Kafka manifest not found: ${KAFKA_MANIFEST}" + log_error "Expected at: data-index/scripts/kafka/kubernetes/kafka.yaml" + exit 1 + fi + + kubectl apply -f "${KAFKA_MANIFEST}" + + log_info "Waiting for Kafka to be ready (this may take ~60 seconds)..." + kubectl wait --namespace kafka \ + --for=condition=ready pod/kafka-0 \ + --timeout=180s + + # Verify broker is accepting connections + for i in {1..15}; do + if kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-topics.sh \ + --bootstrap-server localhost:9092 --list &>/dev/null; then + break + fi + sleep 5 + done + + log_info "✓ Kafka installed" + log_info " Bootstrap (in-cluster): kafka.kafka.svc.cluster.local:9092" + log_info " Bootstrap (host debug): localhost:30900" +} + # Install Elasticsearch (simple deployment without SSL) install_elasticsearch() { log_step "Installing Elasticsearch cluster..." 
@@ -233,13 +264,19 @@ print_summary() { fi if [[ "$MODE" == "elasticsearch" ]]; then - echo " - Elasticsearch: $(kubectl get elasticsearch -n elasticsearch data-index-es -o json | jq -r '.status.health' 2>/dev/null || echo 'N/A')" + echo " - Elasticsearch: $(kubectl get pods -n elasticsearch -l app=elasticsearch -o json | jq -r '.items[0].status.phase' 2>/dev/null || echo 'N/A')" + fi + + if [[ "$MODE" == "kafka" ]]; then + echo " - PostgreSQL: $(kubectl get pods -n postgresql -l app.kubernetes.io/component=primary -o json | jq -r '.items[0].status.phase' 2>/dev/null || echo 'N/A')" + echo " - Kafka: $(kubectl get pods -n kafka -l app=kafka -o json | jq -r '.items[0].status.phase' 2>/dev/null || echo 'N/A')" fi echo "" log_info "Next Steps:" - echo " - MODE 1 (PostgreSQL): Deploy FluentBit with MODE 1 config (see test-mode1-e2e.sh)" - echo " - MODE 2 (Elasticsearch): Deploy FluentBit with MODE 2 config (not yet implemented)" + echo " - MODE 1 (PostgreSQL): ./deploy-data-index.sh postgresql → test-mode1-e2e.sh" + echo " - MODE 2 (Elasticsearch): ./deploy-data-index.sh elasticsearch → test-mode2-e2e.sh" + echo " - MODE 3 (Kafka): ./deploy-data-index.sh kafka → ./init-database-schema.sh → ./deploy-kafka-ingestion.sh → test-mode3-e2e.sh" echo "" } @@ -258,8 +295,12 @@ main() { elasticsearch) install_elasticsearch ;; + kafka) + install_postgresql + install_kafka + ;; *) - log_error "Invalid MODE: ${MODE}. Valid options: postgresql, elasticsearch" + log_error "Invalid MODE: ${MODE}. 
Valid options: postgresql, elasticsearch, kafka" exit 1 ;; esac @@ -270,4 +311,4 @@ main() { } # Run main function -main "$@" +main "$@" \ No newline at end of file diff --git a/data-index/scripts/kind/setup-cluster.sh b/data-index/scripts/kind/setup-cluster.sh index ad4d8aa6a..6ba01e48b 100755 --- a/data-index/scripts/kind/setup-cluster.sh +++ b/data-index/scripts/kind/setup-cluster.sh @@ -58,14 +58,13 @@ check_prerequisites() { fi log_info "✓ kubectl $(kubectl version --client --short 2>/dev/null | head -1)" - # Check Docker - if ! command -v docker &> /dev/null; then - log_error "Docker is not installed. Please install Docker Desktop or Docker Engine" + # Check Docker is installed and running + if ! command -v docker &>/dev/null; then + log_error "Docker is not installed. Please install Docker Desktop or Docker Engine." exit 1 fi - - if ! docker info &> /dev/null; then - log_error "Docker is not running. Please start Docker" + if ! docker info &>/dev/null 2>&1; then + log_error "Docker daemon is not running. Please start Docker." exit 1 fi log_info "✓ Docker $(docker version --format '{{.Server.Version}}' 2>/dev/null)" @@ -106,10 +105,14 @@ nodes: - containerPort: 30432 hostPort: 30432 protocol: TCP - # Elasticsearch (for local access - future) + # Elasticsearch (for local access) - containerPort: 30920 hostPort: 30920 protocol: TCP + # Kafka (MODE 3 - for local debugging access) + - containerPort: 30900 + hostPort: 30900 + protocol: TCP EOF log_info "✓ Cluster created" @@ -132,8 +135,8 @@ print_cluster_info() { log_info "KIND Cluster Setup Complete!" 
log_info "==========================================" echo "" - log_info "Cluster Name: ${CLUSTER_NAME}" - log_info "Context: kind-${CLUSTER_NAME}" + log_info "Cluster Name: ${CLUSTER_NAME}" + log_info "Context: kind-${CLUSTER_NAME}" echo "" log_info "Nodes:" kubectl get nodes -o wide @@ -141,7 +144,8 @@ print_cluster_info() { log_info "Port Mappings (NodePort):" echo " - GraphQL API: http://localhost:30080/graphql" echo " - PostgreSQL: localhost:30432" - echo " - Elasticsearch: http://localhost:30920 (future)" + echo " - Elasticsearch: http://localhost:30920" + echo " - Kafka (debug): localhost:30900 (MODE 3)" echo "" log_info "Next Steps:" echo " 1. Install dependencies: ./install-dependencies.sh" diff --git a/data-index/scripts/kind/test-mode3-e2e.sh b/data-index/scripts/kind/test-mode3-e2e.sh new file mode 100755 index 000000000..f998182a4 --- /dev/null +++ b/data-index/scripts/kind/test-mode3-e2e.sh @@ -0,0 +1,491 @@ +#!/usr/bin/env bash +# +# Copyright 2024 KubeSmarts Authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# MODE 3 (Kafka) End-to-End Integration Test +# +# Tests the complete Kafka ingestion pipeline: +# workflow-test-app (kafka profile) +# → Kafka topic: flow-lifecycle-out (CloudEvents) +# → data-index-ingestion-kafka-service +# → PostgreSQL (workflow_instances, task_instances) +# → Data Index GraphQL API +# +# Verifies: +# - Kafka broker running and accepting connections +# - workflow-test-app publishing CloudEvents to Kafka +# - Ingestion service consuming and normalizing events +# - Data persisted in PostgreSQL normalized tables +# - GraphQL API returning normalized data +# - Idempotency: replaying events does not create duplicates +# + +set -euo pipefail + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' + +# Configuration +CLUSTER_NAME="${CLUSTER_NAME:-data-index-test}" +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)" +KAFKA_SCRIPTS_DIR="${PROJECT_ROOT}/data-index/scripts/kafka" + +# Logging +log_info() { echo -e "${GREEN}[INFO]${NC} $1"; } +log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; } +log_error() { echo -e "${RED}[ERROR]${NC} $1"; } +log_step() { echo -e "${BLUE}[STEP]${NC} $1"; } + +# Collect debug info and exit on failure +error_handler() { + log_error "Test failed at line $1" + echo "" + log_info "--- Ingestion service logs ---" + kubectl logs -n data-index -l app=data-index-ingestion-kafka-service --tail=60 || true + + echo "" + log_info "--- workflow-test-app logs ---" + kubectl logs -n workflows -l app=workflow-test-app --tail=40 || true + + echo "" + log_info "--- Kafka broker logs ---" + kubectl logs -n kafka kafka-0 --tail=30 || true + + echo "" + log_info "--- PostgreSQL workflow_instances ---" + kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex \ + -c "SELECT id, name, status, start, \"end\" FROM workflow_instances LIMIT 5;" || true + + echo "" + log_info "--- Consumer 
group lag ---" + kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-consumer-groups.sh \ + --bootstrap-server localhost:9092 \ + --describe --group data-index-ingestion 2>/dev/null || true + + exit 1 +} + +trap 'error_handler $LINENO' ERR + +# ── Step 1: Cluster ────────────────────────────────────────────────────────── + +create_cluster() { + log_step "Creating KIND cluster..." + + if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then + log_info "Cluster '${CLUSTER_NAME}' already exists, skipping creation" + else + "${SCRIPT_DIR}/setup-cluster.sh" + fi + + kubectl config use-context "kind-${CLUSTER_NAME}" + log_info "✓ Cluster ready" +} + +# ── Step 2: Namespaces ──────────────────────────────────────────────────────── + +create_namespaces() { + log_step "Creating namespaces..." + + for ns in logging kafka postgresql workflows data-index; do + kubectl create namespace "${ns}" --dry-run=client -o yaml | kubectl apply -f - + done + + log_info "✓ Namespaces ready" +} + +# ── Step 3: PostgreSQL ──────────────────────────────────────────────────────── + +install_postgresql() { + log_step "Installing PostgreSQL..." + + if kubectl get statefulset -n postgresql postgresql &>/dev/null; then + log_info "PostgreSQL already deployed, skipping" + else + MODE=postgresql "${SCRIPT_DIR}/install-dependencies.sh" + return + fi + + kubectl wait --namespace postgresql \ + --for=condition=ready pod \ + --selector=app=postgresql \ + --timeout=300s + + log_info "✓ PostgreSQL ready" +} + +# ── Step 4: Database schema ─────────────────────────────────────────────────── +# The ingestion service runs Flyway on startup (QUARKUS_FLYWAY_MIGRATE_AT_START=true), +# so the schema is initialized automatically. This step is a no-op validation only. 
+ +verify_schema_will_be_applied() { + log_step "Schema will be applied by Flyway on ingestion service startup" + log_info " Migration file: data-index-storage-migrations/...V1__initial_schema.sql" + log_info "✓ Schema initialization delegated to Flyway" +} + +# ── Step 5: Kafka ───────────────────────────────────────────────────────────── + +install_kafka() { + log_step "Installing Kafka (KRaft single-node)..." + + if kubectl get statefulset -n kafka kafka &>/dev/null; then + log_info "Kafka already deployed, skipping" + else + kubectl apply -f "${KAFKA_SCRIPTS_DIR}/kubernetes/kafka.yaml" + fi + + log_info "Waiting for Kafka to be ready (this may take ~60 seconds)..." + kubectl wait --namespace kafka \ + --for=condition=ready pod/kafka-0 \ + --timeout=180s + + # Verify broker is accepting connections + log_info "Verifying Kafka broker connectivity..." + for i in {1..15}; do + if kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-topics.sh \ + --bootstrap-server localhost:9092 --list &>/dev/null; then + log_info "✓ Kafka broker accepting connections" + break + fi + log_info "Attempt $i/15: Kafka not ready yet..." + sleep 4 + done + + log_info "✓ Kafka ready at kafka.kafka.svc.cluster.local:9092" +} + +# ── Step 6: Data Index query service (postgresql mode) ──────────────────────── + +deploy_data_index_query_service() { + log_step "Deploying data-index-service (postgresql query backend)..." 
+ + if kubectl get deployment -n data-index data-index-service &>/dev/null; then + log_info "data-index-service already deployed, skipping" + else + # deploy-data-index.sh kafka internally deploys the postgresql-backed service + "${SCRIPT_DIR}/deploy-data-index.sh" kafka + fi + + kubectl wait --namespace data-index \ + --for=condition=available deployment/data-index-service \ + --timeout=300s + + log_info "✓ Query service ready at http://localhost:30080/graphql" +} + +# ── Step 7: Kafka ingestion service ─────────────────────────────────────────── + +deploy_ingestion_service() { + log_step "Deploying data-index-ingestion-kafka-service..." + + if kubectl get deployment -n data-index data-index-ingestion-kafka-service &>/dev/null; then + log_info "Ingestion service already deployed, skipping" + else + "${SCRIPT_DIR}/deploy-kafka-ingestion.sh" + fi + + kubectl wait --namespace data-index \ + --for=condition=available deployment/data-index-ingestion-kafka-service \ + --timeout=300s + + log_info "✓ Ingestion service ready and consuming from Kafka" +} + +# ── Step 8: workflow-test-app (Kafka profile) ──────────────────────────────── + +deploy_workflow_app() { + log_step "Deploying workflow-test-app (Kafka profile)..." + + if kubectl get deployment -n workflows workflow-test-app &>/dev/null; then + log_info "workflow-test-app already deployed, restarting for Kafka config..." + kubectl rollout restart deployment/workflow-test-app -n workflows + else + MODE=kafka "${SCRIPT_DIR}/deploy-workflow-app.sh" + fi + + kubectl wait --namespace workflows \ + --for=condition=available deployment/workflow-test-app \ + --timeout=300s + + log_info "✓ workflow-test-app ready (publishing to Kafka)" +} + +# ── Step 9: Execute workflows ───────────────────────────────────────────────── + +execute_workflows() { + log_step "Executing test workflows via REST API..." + + kubectl port-forward -n workflows svc/workflow-test-app 8082:8080 &>/dev/null & + local PF_PID=$! 
+ sleep 3 + + log_info "Triggering simple-set workflow..." + local http_code + http_code=$(curl -s -o /dev/null -w "%{http_code}" \ + -X POST http://localhost:8082/test-workflows/simple-set \ + -H "Content-Type: application/json" \ + -d '{"name": "mode3-e2e-test"}') + + if [[ "${http_code}" != "200" && "${http_code}" != "201" && "${http_code}" != "204" ]]; then + log_error "Workflow execution returned HTTP ${http_code}" + kill "${PF_PID}" 2>/dev/null || true + exit 1 + fi + + log_info " → HTTP ${http_code}: simple-set workflow triggered" + + log_info "Triggering hello-world workflow..." + curl -s -o /dev/null \ + -X POST http://localhost:8082/test-workflows/hello-world \ + -H "Content-Type: application/json" \ + -d '{}' + + kill "${PF_PID}" 2>/dev/null || true + + log_info "✓ Workflows triggered — events are now in Kafka topic 'flow-lifecycle-out'" +} + +# ── Step 10: Verify Kafka events ────────────────────────────────────────────── + +verify_kafka_events() { + log_step "Verifying events in Kafka topic 'flow-lifecycle-out'..." + + local found=false + for i in {1..20}; do + local count + count=$(kubectl exec -n kafka kafka-0 -- \ + /opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \ + --broker-list localhost:9092 \ + --topic flow-lifecycle-out \ + --time -1 2>/dev/null | awk -F: 'BEGIN{s=0}{s+=$3}END{print s}' || echo 0) + + if [[ "${count}" -gt 0 ]]; then + log_info "✓ Found ${count} messages in flow-lifecycle-out" + found=true + break + fi + log_info "Attempt $i/20: No messages yet, waiting..." + sleep 3 + done + + if [[ "${found}" != "true" ]]; then + log_error "No messages found in flow-lifecycle-out after 60 seconds" + exit 1 + fi +} + +# ── Step 11: Verify PostgreSQL normalization ────────────────────────────────── + +verify_postgresql_normalization() { + log_step "Verifying normalized data in PostgreSQL..." + + # Wait for the ingestion service to consume and commit events + log_info "Waiting up to 30s for ingestion service to normalize events..." 
+ local found=false + for i in {1..15}; do + local wf_count + wf_count=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM workflow_instances;" 2>/dev/null | tr -d ' ') + + if [[ "${wf_count}" -gt 0 ]]; then + log_info "✓ Found ${wf_count} normalized workflow instance(s)" + found=true + break + fi + log_info "Attempt $i/15: Waiting for ingestion to normalize events..." + sleep 2 + done + + if [[ "${found}" != "true" ]]; then + log_error "No normalized workflow instances found in PostgreSQL" + exit 1 + fi + + # Check task instances + local task_count + task_count=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM task_instances;" 2>/dev/null | tr -d ' ') + + log_info " Task instances: ${task_count}" + + # Print sample row + log_info "Sample workflow instance:" + kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c \ + "SELECT id, name, status, start IS NOT NULL AS has_start + FROM workflow_instances + LIMIT 3;" + + log_info "✓ Normalization verified" +} + +# ── Step 12: Verify GraphQL API ──────────────────────────────────────────────── + +verify_graphql() { + log_step "Verifying GraphQL API returns normalized data..." 
+ + # Introspection + curl -s -X POST http://localhost:30080/graphql \ + -H "Content-Type: application/json" \ + -d '{"query":"{ __schema { queryType { name } } }"}' \ + | grep -q '"name":"Query"' || { + log_error "GraphQL introspection failed" + exit 1 + } + + # getWorkflowInstances + local result + result=$(curl -s -X POST http://localhost:30080/graphql \ + -H "Content-Type: application/json" \ + -d '{"query":"{ getWorkflowInstances { id name status } }"}') + + echo "${result}" | grep -q '"id"' || { + log_error "getWorkflowInstances returned no results: ${result}" + exit 1 + } + + local wf_id + wf_id=$(echo "${result}" | grep -o '"id":"[^"]*"' | head -1 | cut -d'"' -f4) + log_info " Sample workflow from GraphQL: ${wf_id}" + + log_info "✓ GraphQL API verified" +} + +# ── Step 13: Idempotency test ────────────────────────────────────────────────── +# Re-trigger the same workflow and confirm no duplicates are created. + +verify_idempotency() { + log_step "Verifying idempotency (re-triggering same workflow)..." 
+ + local before + before=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM workflow_instances;" 2>/dev/null | tr -d ' ') + + log_info " Workflow count before re-trigger: ${before}" + + # Restart the workflow app — it will publish startup events for the same workflow IDs + kubectl rollout restart deployment/workflow-test-app -n workflows + kubectl wait --namespace workflows \ + --for=condition=available deployment/workflow-test-app \ + --timeout=120s + + # Allow time for events to flow through Kafka and be processed + sleep 15 + + local after + after=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM workflow_instances;" 2>/dev/null | tr -d ' ') + + log_info " Workflow count after re-trigger: ${after}" + + # Idempotent: UPSERT must not create new rows for the same workflow IDs + if [[ "${after}" -gt $((before + 2)) ]]; then + log_warn "Count increased from ${before} to ${after} (new workflow IDs expected from new executions)" + log_warn "If the same IDs were published twice, check UPSERT idempotency in WorkflowEventNormalizer" + else + log_info "✓ Idempotency verified (no unexpected duplicates)" + fi +} + +# ── Summary ─────────────────────────────────────────────────────────────────── + +print_summary() { + echo "" + log_info "==========================================" + log_info "MODE 3 (Kafka) E2E Test Complete!" 
+ log_info "==========================================" + echo "" + + local wf_count task_count + wf_count=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM workflow_instances;" 2>/dev/null | tr -d ' ') + task_count=$(kubectl exec -n postgresql postgresql-0 -- \ + env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -t -c \ + "SELECT COUNT(*) FROM task_instances;" 2>/dev/null | tr -d ' ') + + log_info "Pipeline:" + echo " workflow-test-app → Kafka (flow-lifecycle-out) → ingestion-service → PostgreSQL → GraphQL" + echo "" + log_info "Results:" + echo " ✓ Kafka broker running" + echo " ✓ Workflow events published as CloudEvents" + echo " ✓ Ingestion service consumed and normalized events" + echo " ✓ PostgreSQL: ${wf_count} workflow instance(s), ${task_count} task instance(s)" + echo " ✓ GraphQL API returning data" + echo "" + log_info "Access Points:" + echo " GraphQL API: http://localhost:30080/graphql" + echo " GraphQL UI: http://localhost:30080/q/graphql-ui" + echo " PostgreSQL: postgresql://dataindex:dataindex123@localhost:30432/dataindex" + echo " Kafka: localhost:30900 (NodePort, for tools like kcat)" + echo "" + log_info "Useful Commands:" + echo " # Watch live Kafka events" + echo " kubectl exec -n kafka kafka-0 -- \\" + echo " /opt/kafka/bin/kafka-console-consumer.sh \\" + echo " --bootstrap-server localhost:9092 --topic flow-lifecycle-out --from-beginning" + echo "" + echo " # Consumer group lag" + echo " kubectl exec -n kafka kafka-0 -- \\" + echo " /opt/kafka/bin/kafka-consumer-groups.sh \\" + echo " --bootstrap-server localhost:9092 --describe --group data-index-ingestion" + echo "" + echo " # GraphQL query" + echo ' curl http://localhost:30080/graphql -H "Content-Type: application/json" \' + echo ' -d '"'"'{"query":"{ getWorkflowInstances { id name status } }"}'"'" + echo "" +} + +# ── Main 
────────────────────────────────────────────────────────────────────── + +main() { + log_info "==========================================" + log_info "MODE 3 (Kafka) End-to-End Integration Test" + log_info "==========================================" + echo "" + + create_cluster + create_namespaces + install_postgresql + verify_schema_will_be_applied + install_kafka + deploy_data_index_query_service + deploy_ingestion_service + deploy_workflow_app + execute_workflows + verify_kafka_events + verify_postgresql_normalization + verify_graphql + verify_idempotency + print_summary + + log_info "✅ All MODE 3 tests passed!" +} + +main "$@" diff --git a/data-index/workflow-test-app/README.md b/data-index/workflow-test-app/README.md index 72b90e104..d3f9ef50f 100644 --- a/data-index/workflow-test-app/README.md +++ b/data-index/workflow-test-app/README.md @@ -144,6 +144,22 @@ mvn quarkus:dev **Access:** http://localhost:8080 +### Development Mode (MODE 3 - Kafka) + +Activate the `kafka` Maven profile and `kafka` Quarkus profile to enable Kafka event publishing: + +```bash +mvn quarkus:dev -Pkafka -Dquarkus.profile=kafka +``` + +**What the `kafka` Maven profile adds:** +- `quarkus-messaging-kafka` — SmallRye Reactive Messaging Kafka connector +- `quarkus-flow-messaging` — Quarkus Flow Kafka event publisher (publishes CloudEvents to topic `flow-lifecycle-out`) + +**Access:** http://localhost:8080 + +Quarkus Dev Services will automatically start a Redpanda container as the Kafka broker. Workflow execution events are published to the `flow-lifecycle-out` topic as CloudEvents. + ### Production Build ```bash @@ -152,6 +168,14 @@ mvn clean package -DskipTests **Result:** `target/quarkus-app/` (JVM mode) +### Production Build (MODE 3 - Kafka) + +```bash +mvn clean package -Pkafka -Dquarkus.profile=kafka -DskipTests +``` + +**Result:** `target/quarkus-app/` with Kafka messaging dependencies included. 
+ ### Container Image ```bash @@ -189,6 +213,30 @@ kubectl apply -f target/kubernetes/kubernetes.yml -n workflows ## Testing +### MODE 3 (Kafka) Integration Tests + +The Kafka ingestion integration tests live in the `data-index-ingestion-kafka-service` module. Run them with the `kafka` Maven profile and `kafka` Quarkus profile: + +```bash +# From data-index/data-index-ingestion/data-index-ingestion-kafka-service/ +mvn verify -Pkafka -Dquarkus.profile=kafka +``` + +Quarkus Dev Services starts a Redpanda (Kafka-compatible) broker and a PostgreSQL container automatically. Tests produce CloudEvents directly onto the `flow-lifecycle-out` topic and assert normalized records in PostgreSQL. + +To run only the Kafka ingestion integration test class: + +```bash +mvn verify -Pkafka -Dquarkus.profile=kafka -Dit.test=KafkaIngestionIT +``` + +**What the tests cover:** +- Workflow started/completed events normalized to `workflow_instances` +- Task lifecycle events (started → completed) normalized to `task_instances` +- Field-level idempotency — immutable fields (name, version) are never overwritten +- Task events arriving before the parent workflow (placeholder + FK recovery) +- Error propagation from workflow events + ### Execute Workflow via REST ```bash @@ -215,6 +263,8 @@ curl http://localhost:30080/graphql \ ## Event Flow +### MODE 1 / MODE 2 (log-based) + ``` Workflow Execution ↓ @@ -233,6 +283,24 @@ PostgreSQL triggers normalize to workflow_instances, task_instances Data Index GraphQL API ``` +### MODE 3 (Kafka, activated with `-Pkafka -Dquarkus.profile=kafka`) + +``` +Workflow Execution + ↓ +Quarkus Flow Messaging (quarkus-flow-messaging) + ↓ +Kafka topic: flow-lifecycle-out (CloudEvents) + ↓ +KafkaEventConsumer (data-index-ingestion-kafka-service) + ↓ +WorkflowEventNormalizer / TaskEventNormalizer (JDBC UPSERT) + ↓ +PostgreSQL normalized tables (workflow_instances, task_instances) + ↓ +Data Index GraphQL API +``` + ## Dependencies ### Quarkus Flow @@ -261,6 +329,23 @@ 
Data Index GraphQL API **Provides:** JAX-RS endpoint for triggering workflows +### Kafka dependencies (Maven profile `kafka`) + +```xml + + <dependency> + <groupId>io.quarkus</groupId> + <artifactId>quarkus-messaging-kafka</artifactId> + </dependency> + <dependency> + <groupId>io.quarkiverse.flow</groupId> + <artifactId>quarkus-flow-messaging</artifactId> + <version>${quarkus-flow.version}</version> + </dependency> +``` + +**Provides:** SmallRye Reactive Messaging Kafka connector and Quarkus Flow Kafka publisher — publishes workflow lifecycle events as CloudEvents to the `flow-lifecycle-out` topic. + ## Logs ### Application Logs diff --git a/data-index/workflow-test-app/pom.xml b/data-index/workflow-test-app/pom.xml index c2de11e33..9d2c31c84 100644 --- a/data-index/workflow-test-app/pom.xml +++ b/data-index/workflow-test-app/pom.xml @@ -120,6 +120,17 @@ + + <profile> + <id>kafka</id> + <dependencies> + <dependency> + <groupId>io.quarkus</groupId> + <artifactId>quarkus-messaging-kafka</artifactId> + </dependency> + </dependencies> + </profile> + diff --git a/data-index/workflow-test-app/src/main/resources/application.properties b/data-index/workflow-test-app/src/main/resources/application.properties index 75b4dd158..96933ec71 100644 --- a/data-index/workflow-test-app/src/main/resources/application.properties +++ b/data-index/workflow-test-app/src/main/resources/application.properties @@ -65,3 +65,13 @@ quarkus.flow.structured-logging.log-level=INFO # Health checks quarkus.smallrye-health.ui.enabled=true + +# Kafka +%kafka.quarkus.flow.messaging.lifecycle-enabled=true +%kafka.quarkus.flow.tracing.enabled=false +%kafka.quarkus.flow.messaging.defaults-enabled=true + +%kafka.mp.messaging.outgoing.flow-lifecycle-out.connector=smallrye-kafka +%kafka.mp.messaging.outgoing.flow-lifecycle-out.topic=flow-lifecycle-out +%kafka.mp.messaging.outgoing.flow-lifecycle-out.key.serializer=org.apache.kafka.common.serialization.StringSerializer +%kafka.mp.messaging.outgoing.flow-lifecycle-out.value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer \ No newline at end of file diff --git a/pom.xml b/pom.xml index bf176f701..a47aa95ca 100644 --- a/pom.xml +++ b/pom.xml @@ -63,6 +63,7 @@ 2.0.4 3.27.7 2.0.12 + <version.io.serverlessworkflow>7.22.1.Final</version.io.serverlessworkflow> 3.13.0 @@ -89,6 +90,14 @@ <scope>import</scope> + 
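+ <groupId>io.serverlessworkflow</groupId> + <artifactId>serverlessworkflow-bom</artifactId> + <version>${version.io.serverlessworkflow}</version> + <type>pom</type> + <scope>import</scope> + + <groupId>com.graphql-java</groupId>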