Skip to content

chore: Integrate prometheus for metrics WPB-21843#46

Merged
bbaarriiss merged 7 commits intomainfrom
chore/integrate-prometheus-for-metrics-WPB-21843
Nov 26, 2025
Merged

chore: Integrate prometheus for metrics WPB-21843#46
bbaarriiss merged 7 commits intomainfrom
chore/integrate-prometheus-for-metrics-WPB-21843

Conversation

@bbaarriiss
Copy link
Copy Markdown


PR Submission Checklist for internal contributors

  • The PR Title

    • conforms to the style of semantic commits messages¹ supported in Wire's Github Workflow²
    • contains a reference JIRA issue number like SQPIT-764
    • answers the question: If merged, this PR will: ... ³
  • The PR Description

    • is free of optional paragraphs and you have filled the relevant parts to the best of your ability

What's new in this PR?

Issues

We need usage stats for Poll App. So that we can know which command is called how many times.

Solutions

Integrate Prometheus metrics and add some basic usage stats metrics for some commands.

How to Test

Locally -> Run the application and send command on staging. Then verify the metrics with their counts from http://localhost:8080/metrics

Stating -> After the deployment on staging, we should see the metrics on Grafana if all the K8s related additions are correct.

Attachments (Optional)

Here is the sample output can be seen from http://localhost:8080/metrics
image

- Refactor KtorInstallation.kt to have better readability.
- Update dependency injection
@bbaarriiss bbaarriiss self-assigned this Nov 25, 2025
@bbaarriiss bbaarriiss requested a review from a team as a code owner November 25, 2025 14:44
@claude
Copy link
Copy Markdown

claude Bot commented Nov 25, 2025

PR Review: Prometheus Metrics Integration

Thanks for implementing metrics tracking! This is a solid foundation for monitoring Poll App usage. Here's my detailed review:

✅ Strengths

  1. Clean Integration: The Prometheus/Micrometer integration is well-structured and follows Ktor best practices
  2. Good Test Coverage: All tests have been updated to include the new UsageMetrics dependency - great attention to detail!
  3. Proper DI Usage: The dependency injection setup is clean and consistent with the existing architecture
  4. Kubernetes Ready: The ServiceMonitor is properly configured for Prometheus Operator

🔍 Code Quality & Best Practices

1. Missing Metrics for Key Operations (Priority: Medium)

You're tracking /poll help and /poll creation, but missing other important commands:

  • /poll version (line 73-75 in MessagesHandlingService.kt) - should track version requests
  • Vote actions (line 41-46) - critical user engagement metric
  • Show results actions (line 48-54) - another important user interaction

Recommendation: Add metrics for these commands to get a complete picture of usage patterns.

2. Metric Naming Convention (Priority: Low)

Your metric names follow Prometheus conventions well (pollapp_*_total), but consider:

  • Adding a common prefix for all metrics (e.g., pollapp_commands_*)
  • This makes it easier to query and group related metrics in Grafana

3. Code Organization in KtorInstallation.kt (Priority: Low)

The refactoring into separate functions (setupHealthEndpoint, setupEventRouting, setupMetrics) is great! However:

  • Consider extracting the multiple routing {} blocks into a single block with multiple routes
  • This reduces the number of routing configuration passes

Current:

fun Application.setupHealthEndpoint() {
    routing { get("/health") { ... } }
}
fun Application.setupMetrics() {
    install(MicrometerMetrics) { ... }
    routing { get("/metrics") { ... } }
}

Suggested:

fun Application.setupRouting() {
    routing {
        get("/health") { call.response.status(HttpStatusCode.OK) }
        get("/metrics") { 
            val prometheusRegistry by closestDI().instance<PrometheusMeterRegistry>()
            call.respond(prometheusRegistry.scrape()) 
        }
    }
}

🐛 Potential Issues

1. ServiceMonitor Port Reference (Priority: High)

In helm/pollapp/templates/servicemonitor.yaml:9, you reference port: http. Verify that:

  • The service definition exposes a port named http (confirmed ✅ in service.yaml:13)
  • The Deployment/Pod has a container port named http

Action: Please verify the Deployment has containerPort with name: http

2. No Error Handling in Metrics Endpoint (Priority: Medium)

The /metrics endpoint at KtorInstallation.kt:57-59 could potentially throw an exception if the registry is not properly initialized. While unlikely with proper DI, consider adding error handling for production robustness.

🔒 Security Considerations

  1. Metrics Endpoint Exposure (Priority: Medium)

    • The /metrics endpoint is publicly accessible without authentication
    • For internal tools, this is typically acceptable, but verify this aligns with your security policies
    • Consider adding authentication if metrics could leak sensitive information (currently looks safe with just counters)
  2. ServiceMonitor RBAC (Priority: Low)

    • Ensure the Prometheus Operator has RBAC permissions to create and manage ServiceMonitors
    • This is typically configured at the cluster level, but worth verifying in your staging/production environments

⚡ Performance Considerations

  1. Counter Performance: ✅ Excellent choice using Micrometer Counters - they're thread-safe and have minimal overhead
  2. Registry Initialization: ✅ Good use of singleton pattern in DI - registry is created once
  3. Metrics Collection: ✅ No performance concerns - increments are O(1) operations

🧪 Test Coverage

Excellent work here! The test updates are thorough:

  • All test cases properly mock UsageMetrics
  • Verification of metric calls where expected
  • Verification of no metric calls where not expected

Minor suggestion: Consider adding a dedicated test for UsageMetrics class itself to verify:

  • Counter registration
  • Increment operations
  • Metric names and descriptions

📝 Additional Recommendations

  1. Documentation: Consider adding a README section or comment explaining:

    • What metrics are tracked
    • How to access metrics locally
    • Grafana dashboard queries (if any)
  2. Future Enhancements (not blocking):

    • Add labels to metrics (e.g., success/failure for operations)
    • Track timing metrics (e.g., poll creation duration)
    • Add gauge metrics for active polls or users
  3. Dependency Version: You're using micrometer-registry-prometheus:1.16.0 - this is recent (Jan 2025), which is good. Verify this version is compatible with your Kubernetes/Grafana setup.

🎯 Summary

This is a well-implemented PR that adds valuable observability to the Poll App. The code quality is good, tests are comprehensive, and the integration is clean.

Required before merge:

  • ✅ Verify Deployment has container port named http

Recommended improvements:

  • Add metrics for /poll version, vote actions, and show results actions
  • Consider consolidating routing blocks
  • Add metrics for easter egg ("good app") if you want to track engagement

Approval Status: Approved with minor recommendations

Great work! 🎉


Review generated with assistance from Claude Code

import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry

class UsageMetrics(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a counter for buttons being clicked ? Differentiate between a Delete button and clicking on an option

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spoonman01
For now, I am adding the very basic custom metrics of basic actions user take.

Regarding voting action: I thought about it but then I was not convinced about the benefit of it. Because people will vote they can change the vote and this means another voting action. So the metric will not show sth very clear. --> Maybe later we can find more meaningful reason for that. Wdyt?

Regarding delete poll: At the moment we don't have Delete functionality, but actually I can add the metric now, so it can be used when the feature implemented.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📣 Custom metric for "delete poll" is added although we don't have the functionality yet. Maybe this is not a good proactice to add unneeded things but since we are introducing the custom metrics first time in this app, I just wanted to make it ready from now.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right it's total votes and not by person, still better than nothing I think.
I will leave the decision to you

Copy link
Copy Markdown
Author

@bbaarriiss bbaarriiss Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not adding a metric for voting for now. Because it will not tell us anything meaningful. We can send metrics with the information of the pool itself (such as poll-id) but then it will be out of the purpose of metrics. Metrics are for general overview not for every instance of polls created.

Also just to keep here updated: I reverted the "delete poll" as we agreed as a team.

}

routing {
get("/metrics") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, apart from the business metrics you have defined, does the /metrics now export JVM or server data?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this automatically emits JVM and system metrics (https://micrometer.io/docs/ref/jvm) and also http metrics.

Some more info:

image

Also: https://chatgpt.com/s/t_6926c4491864819181ee50f4b42ec07d

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics_on_local_test.txt

This is how they look.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. If it's easy then as a test could you try to setup the full micrometer dashboard, just for this service? Not building it by hand, there should be some pre-made things in Grafana https://grafana.com/grafana/dashboards/4701-jvm-micrometer/

@claude
Copy link
Copy Markdown

claude Bot commented Nov 26, 2025

Pull Request Review - PR #46: Integrate Prometheus for Metrics

Summary

This PR integrates Prometheus metrics into the Poll App to track command usage. The implementation is generally solid with good test coverage, but there are several areas that could be improved.

✅ Strengths

  1. Good test coverage: All new metrics functionality is properly tested with mocks
  2. Clean dependency injection: The UsageMetrics class is properly integrated into the DI container
  3. Proper separation of concerns: Metrics logic is isolated in its own package
  4. Kubernetes integration: ServiceMonitor resource is correctly configured for Prometheus scraping

🔍 Code Quality & Best Practices

1. Unused Metric Counter ⚠️

Location: src/main/kotlin/com/wire/apps/polls/setup/metrics/UsageMetrics.kt:19-22

The deletePollCommandCounter is defined but never called. The onDeletePollCommand() method at line 32 is not invoked anywhere in the codebase.

Recommendation: Either implement delete poll functionality and call this metric, or remove it to avoid dead code.

2. Inconsistent Metric Tracking ⚠️

Location: src/main/kotlin/com/wire/apps/polls/services/MessagesHandlingService.kt:73-74

The /poll version command doesn't track metrics, but /poll help and /poll (create) do. This creates inconsistent observability.

Recommendation: Consider adding a metric for version commands or document why certain commands are excluded from tracking.

3. Code Organization - Multiple Routing Blocks 💡

Location: src/main/kotlin/com/wire/apps/polls/setup/KtorInstallation.kt:35-61

The code creates three separate routing { } blocks for health, events, and metrics. While functional, this could be consolidated.

Recommendation: Consider consolidating routing blocks for better maintainability:

fun Application.setupRouting() {
    routing {
        get("/health") {
            call.response.status(HttpStatusCode.OK)
        }
        get("/metrics") {
            val prometheusRegistry by closestDI().instance<PrometheusMeterRegistry>()
            call.respond(prometheusRegistry.scrape())
        }
        events()
    }
}

4. Metric Naming Convention

The metric names follow Prometheus best practices:

  • Using _total suffix for counters
  • Using snake_case
  • Using descriptive prefixes (pollapp_)

🐛 Potential Bugs

1. Missing Error Handling ⚠️

Location: src/main/kotlin/com/wire/apps/polls/setup/KtorInstallation.kt:57-59

The /metrics endpoint calls prometheusRegistry.scrape() without any error handling. If the registry fails to scrape metrics, the endpoint will return a 500 error.

Recommendation: Add try-catch or let Ktor's default error handling manage it (current approach is acceptable but consider logging).

2. Metrics Incremented Before Operation Success ⚠️

Location: src/main/kotlin/com/wire/apps/polls/services/MessagesHandlingService.kt:83-84

Metrics are incremented before the actual operation (e.g., usageMetrics.onCreatePollCommand() before pollService.createPoll()). If createPoll() fails, the metric will still show the command was executed.

Recommendation: Consider tracking both attempts and successes, or move metric increment after successful operation.

⚡ Performance Considerations

  1. Counter Performance: ✅ Using Micrometer counters is performant and thread-safe
  2. Registry Scraping: ✅ The /metrics endpoint is appropriate for Prometheus pull model
  3. Overhead: ✅ Counter increment operations have negligible performance impact

🔒 Security Concerns

1. Metrics Endpoint Exposure ⚠️

Location: src/main/kotlin/com/wire/apps/polls/setup/KtorInstallation.kt:57

The /metrics endpoint is publicly accessible without authentication. While common for Prometheus setups, it exposes internal application metrics.

Recommendation:

  • Document that the endpoint should be restricted at the network/ingress level
  • Consider adding authentication if metrics contain sensitive information
  • The ServiceMonitor configuration suggests this is handled at the K8s level, which is appropriate

2. Information Disclosure 💡

Metrics will expose usage patterns (command frequencies), which could reveal information about user behavior.

Recommendation: This is generally acceptable for internal metrics, but ensure compliance with your data privacy policies.

🧪 Test Coverage

Excellent test coverage

  • All new metrics calls are verified in tests
  • Proper use of mocks for UsageMetrics
  • Tests verify metrics are called exactly once per command
  • Tests verify metrics are NOT called for commands that shouldn't track them

Minor suggestion: Consider adding integration tests for the /metrics endpoint itself.

📝 Additional Recommendations

  1. Version Dependency: The PR uses micrometer-registry-prometheus:1.16.0 (latest stable). Consider verifying compatibility with ktor-server-metrics-micrometer:3.2.3.

  2. Helm Chart: The ServiceMonitor resource looks correct, but ensure your Kubernetes cluster has the Prometheus Operator installed for this resource type to be recognized.

  3. Documentation: Consider adding metrics documentation:

    • What metrics are tracked
    • How to access them in Grafana
    • Alert thresholds (if applicable)
  4. Future Enhancement: Consider adding:

    • Response time histograms
    • Active polls gauge
    • Error rate counters
    • Vote action metrics (noticed handlePollAction doesn't track metrics)

📊 Files Changed - Quality Assessment

  • build.gradle.kts: Clean dependency additions
  • helm/pollapp/templates/servicemonitor.yaml: Proper K8s ServiceMonitor
  • ⚠️ src/main/kotlin/com/wire/apps/polls/services/MessagesHandlingService.kt: Good, but metrics timing concern
  • src/main/kotlin/com/wire/apps/polls/setup/DependencyInjection.kt: Clean DI setup
  • 💡 src/main/kotlin/com/wire/apps/polls/setup/KtorInstallation.kt: Good, minor organization suggestion
  • ⚠️ src/main/kotlin/com/wire/apps/polls/setup/metrics/UsageMetrics.kt: Unused delete metric
  • src/test/kotlin/com/wire/apps/polls/services/MessagesHandlingServiceTest.kt: Excellent test coverage

Verdict

Recommended: Approve with minor fixes

The PR successfully integrates Prometheus metrics with good test coverage and follows Kotlin/Prometheus best practices. The main concerns are:

  1. Remove or implement the unused deletePollCommandCounter
  2. Consider documenting the metrics endpoint security model
  3. Consider tracking command success vs. attempts

These can be addressed in this PR or as follow-up improvements. Great work overall!

@bbaarriiss bbaarriiss requested a review from a team November 26, 2025 09:23
@claude
Copy link
Copy Markdown

claude Bot commented Nov 26, 2025

Pull Request Review: Prometheus Metrics Integration

Thank you for adding Prometheus metrics to the Poll App! This is a valuable addition for monitoring usage. Here's my detailed review:

✅ Positive Aspects

  1. Good test coverage: The tests were properly updated to include the new UsageMetrics dependency
  2. Clean separation of concerns: The UsageMetrics class is well-isolated and follows single responsibility principle
  3. Proper DI integration: Metrics are correctly wired through Kodein dependency injection
  4. Code refactoring: The KtorInstallation.kt refactoring improves readability with separate setup functions
  5. Kubernetes integration: ServiceMonitor configuration looks correct for Prometheus operator

🔍 Issues and Recommendations

1. Critical: Missing Metrics for Vote Actions ⚠️

The PR tracks help and create poll commands, but doesn't track voting actions, which are likely the most frequent user interaction. Consider adding:

// In UsageMetrics.kt
private val voteActionCounter: Counter = Counter
    .builder("pollapp_vote_actions_total")
    .description("Number of poll votes received")
    .register(registry)

fun onVoteAction() {
    voteActionCounter.increment()
}

Then call it in MessagesHandlingService.kt:41 when handling VoteAction.

2. Bug: Incorrect Metric Increment for "/poll version" 🐛

In MessagesHandlingService.kt:73-74, the /poll version command doesn't increment any metrics, but the test at line 70 verifies that usageMetrics wasNot Called. However, this means you're not tracking version requests. Consider if this should be tracked.

3. Security: Missing Metrics Endpoint Authentication 🔒

The /metrics endpoint in KtorInstallation.kt:57-59 has no authentication. While metrics endpoints are often public in internal networks, consider:

  • Is this endpoint accessible externally?
  • Should it require authentication or be restricted to internal IPs?
  • Document security assumptions in comments

4. Performance: Counter Initialization on Every Request

The Counter.builder() pattern creates counters during class initialization, which is good. However, consider lazy initialization or adding thread-safety guarantees if PrometheusMeterRegistry could be accessed concurrently during startup.

5. Code Quality: Duplicate Comment

In MessagesHandlingService.kt:72 and 76, both comments say "send version when asked" but line 72 handles version and line 76 handles help. Fix the comment on line 76:

// send help when asked

6. Missing Metrics for Important Commands

Consider tracking:

  • "show results" actions (ShowResultsAction)
  • "good app" easter egg (for fun!)
  • Failed/invalid commands (error rate tracking)
  • Poll deletions (if supported)

7. Helm Chart: ServiceMonitor Configuration

The ServiceMonitor looks correct, but verify:

  • Does the Prometheus operator in your cluster have the right RBAC permissions?
  • Is there a scrape interval configured? Consider adding:
    spec:
      endpoints:
        - port: http
          path: /metrics
          interval: 30s  # Add scrape interval

8. Dependency Version

micrometer-registry-prometheus version 1.16.0 is from late 2024. Verify this is compatible with your Ktor version 3.2.3 and consider checking for any known issues.

9. Potential Enhancement: Add Labels

Consider adding labels to metrics for richer data:

private val createPollCommandCounter: Counter = Counter
    .builder("pollapp_create_poll_commands_total")
    .description("Number of Poll creation commands received")
    .tag("version", appVersion) // Track by version
    .register(registry)

10. Documentation

Consider adding:

  • A comment in UsageMetrics.kt explaining the naming convention (pollapp_*_total)
  • Documentation on how to access metrics in staging/production
  • Dashboard/alert recommendations

📊 Test Coverage Assessment

Test coverage is good but could be enhanced:

  • ✅ All commands that increment metrics have corresponding tests
  • ✅ Commands that shouldn't increment metrics are verified
  • ⚠️ Missing tests for metrics endpoint itself (e.g., does /metrics return valid Prometheus format?)
  • ⚠️ No test verifying counter values actually increment correctly

🎯 Priority Recommendations

High Priority:

  1. Add authentication/security consideration for /metrics endpoint
  2. Fix duplicate comment on line 76
  3. Add metrics for vote actions (the most important user action!)

Medium Priority:
4. Add metrics for show results actions
5. Consider adding labels for richer metrics
6. Add scrape interval to ServiceMonitor

Low Priority:
7. Document metrics conventions
8. Add integration test for /metrics endpoint
9. Consider tracking error rates

Summary

This is a solid implementation of basic metrics! The code is clean, well-tested, and properly integrated. With the suggested enhancements, especially tracking vote actions and securing the metrics endpoint, this will be production-ready. Great work! 🎉


Review generated with assistance from Claude Code

@bbaarriiss bbaarriiss merged commit c67bdc4 into main Nov 26, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants