
Modernize LGTMP stack and add ntopng, AI/GPU, and OPNsense monitoring integrations #8

Merged: acester822 merged 3 commits into main from copilot/update-lgtmp-stack-monitoring on May 18, 2026.


Copilot AI commented May 18, 2026


This PR updates the LGTMP stack to current stable container images and extends observability coverage with ntopng network telemetry, AI/LLM GPU metrics, and OPNsense firewall monitoring. The new integrations are wired end to end across Compose, Alloy modules, Grafana provisioning, nginx routing, and the README, and existing defaults stay backward compatible.

  • Stack image refresh (May 2026 baseline)

    • Updated core images in compose.yaml:
      • Alloy, Loki, Tempo, Mimir, Pyroscope, Grafana, nginx-unprivileged, MinIO
    • Image tags remain overridable through the existing environment-variable image settings.
  • ntopng network monitoring

    • Added ntopng service in compose.yaml with:
      • persistent volume (ntopng_data)
      • scrape labels (metrics.grafana.com/*)
      • healthcheck
      • web UI published on host port 3001 (container port 3000)
    • Added nginx gateway route:
      • config/ntopng/gateway_ntopng.conf
      • mounted into the gateway service, with an NTOPNG_HOST environment-variable default
    • Added Alloy scrape module:
      • config/alloy/modules/integrations/ntopng.alloy
  • AI/LLM monitoring capabilities

    • Added optional dcgm-exporter service under Compose gpu profile.
    • Extended node-exporter collectors for GPU-adjacent host telemetry (drm, hwmon).
    • Added Alloy AI module:
      • config/alloy/modules/integrations/ai-monitoring.alloy
      • covers GPU utilization, memory, and temperature; inference latency; token metrics; and health checks for LLM API endpoints configured through environment variables.
    • Added sample AI dashboard:
      • config/grafana/provisioning/dashboards/monitoring/ai-monitoring-sample.json
  • OPNsense integration

    • Added Alloy OPNsense module:
      • config/alloy/modules/integrations/opnsense.alloy
      • supports firewall, traffic, gateway, and system-health metric families.
    • Added sample OPNsense dashboard:
      • config/grafana/provisioning/dashboards/monitoring/opnsense-monitoring-sample.json
  • Alloy and Grafana provisioning updates

    • Updated config/alloy/master.alloy to import and instantiate:
      • component_ntopng
      • component_ai_monitoring
      • component_opnsense
    • Added Grafana datasources provisioning for new integrations:
      • config/grafana/provisioning/datasources/integrations-datasources.yaml
      • datasources: NTOPNG Metrics, AI Metrics, OPNsense Metrics
  • Examples and docs

    • Added integration module docs and examples:
      • config/alloy/modules/integrations/README.md
      • config/alloy/modules/integrations/examples/ai-monitoring-example.alloy
      • config/alloy/modules/integrations/examples/opnsense-example.alloy
    • Updated README.md with:
      • new port mappings (ntopng, dcgm-exporter)
      • setup/config for ntopng, AI/LLM monitoring, OPNsense exporter
      • AI framework notes (OpenAI API, LangChain, LlamaIndex, Ollama)
      • refreshed Done/To Do checklist entries

Example (new Alloy module wiring in config/alloy/master.alloy):

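// Load the integrations modules from ALLOY_MODULES_FOLDER, falling back to /etc/alloy/modules.
import.file "integrations" {
  filename = coalesce(sys.env("ALLOY_MODULES_FOLDER"), "/etc/alloy/modules") + "/integrations"
}

// Instantiate the ntopng module: scrape every 30s and forward samples to the stack's metrics receiver.
integrations.component_ntopng "default" {
  forward_to      = [provider.self_hosted_stack.compose.metrics_receiver]
  scrape_interval = "30s"
}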
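On the module side, a custom component such as `component_ntopng` is normally an Alloy `declare` block whose `argument` blocks match the attributes passed when it is instantiated. The sketch below is an assumption about what `ntopng.alloy` might look like, not its actual contents: reusing `NTOPNG_HOST` as a scrape target, the `ntopng:3000` fallback, the `/metrics` path, and the `60s` default are all hypothetical, and the real metrics endpoint depends on how ntopng's Prometheus export is set up.

```alloy
// Hypothetical sketch of config/alloy/modules/integrations/ntopng.alloy.
// Defines a reusable component named "component_ntopng".
declare "component_ntopng" {
  // Required: receivers that scraped samples are sent to.
  argument "forward_to" {
    comment  = "Receivers for ntopng metrics"
    optional = false
  }

  // Optional: scrape frequency, overridable per instance.
  argument "scrape_interval" {
    optional = true
    default  = "60s"
  }

  prometheus.scrape "ntopng" {
    // Assumed target: NTOPNG_HOST (host:port), else the Compose service name.
    targets         = [{"__address__" = coalesce(sys.env("NTOPNG_HOST"), "ntopng:3000")}]
    job_name        = "integrations/ntopng"
    // Assumed metrics path; adjust to the ntopng exporter actually in use.
    metrics_path    = "/metrics"
    scrape_interval = argument.scrape_interval.value
    forward_to      = argument.forward_to.value
  }
}
```

Because `scrape_interval` is declared optional with a default, the `"30s"` value in the `master.alloy` example overrides it, while `forward_to` must always be supplied.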
Original prompt

Update LGTMP Stack with New Monitoring Capabilities

Overview

Modernize the LGTMP (Loki, Grafana, Tempo, Mimir, Pyroscope) monitoring stack and add new integrations for AI monitoring, network monitoring via ntopng, and OPNsense firewall monitoring.

Required Updates

1. Update Container Images to Latest Stable Versions

The current images are from mid-2025. Update them to the latest stable versions (as of May 2026):

  • Grafana Alloy: Update from v1.9.2 to latest stable
  • Loki: Update from 3.5.1 to latest stable
  • Tempo: Update from 2.8.1 to latest stable
  • Mimir: Update from 2.16.1 to latest stable
  • Pyroscope: Update from 1.14.0 to latest stable
  • Grafana: Update from 12.0.2 to latest stable
  • Nginx: Update from 1.27-alpine to latest stable
  • MinIO: Update to latest release

2. Add ntopng Network Monitoring

Add ntopng service to monitor network traffic with:

  • Service definition in compose.yaml
  • Appropriate volumes for data persistence
  • Expose the web UI on a host port (e.g., 3001)
  • Prometheus exporter integration for metrics scraping
  • Configure Alloy to scrape ntopng metrics
  • Add nginx gateway configuration for ntopng
  • Create datasource configuration in Grafana provisioning

3. Add AI/LLM Monitoring Capabilities

Add monitoring for AI workloads including:

  • NVIDIA DCGM Exporter service for GPU metrics (when NVIDIA GPUs are available)
  • Node exporter with GPU collector for basic GPU stats
  • Create Alloy module for AI metrics collection (config/alloy/modules/integrations/ai-monitoring.alloy) that includes:
    • GPU utilization, memory, temperature metrics
    • Model inference latency metrics
    • Token generation metrics
    • LLM API endpoint monitoring
  • Add a sample AI monitoring dashboard in config/grafana/provisioning/dashboards/
  • Document configuration for common AI frameworks (OpenAI API, LangChain, LlamaIndex, Ollama)
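A minimal sketch of what `ai-monitoring.alloy` could contain, assuming the `dcgm-exporter` Compose service listens on the DCGM exporter's default port 9400 and that an inference server exposes latency and token metrics in Prometheus format. The `AI_INFERENCE_TARGET` variable and the `llm-inference:8000` fallback are hypothetical placeholders; the field names in the keep rule are standard DCGM exporter metrics for utilization, framebuffer memory, temperature, and power.

```alloy
// Hypothetical sketch of config/alloy/modules/integrations/ai-monitoring.alloy.
declare "component_ai_monitoring" {
  argument "forward_to" {
    optional = false
  }

  argument "scrape_interval" {
    optional = true
    default  = "15s"
  }

  // GPU utilization, memory, and temperature from the NVIDIA DCGM exporter.
  // Only reachable when the Compose "gpu" profile is running.
  prometheus.scrape "dcgm" {
    targets         = [{"__address__" = "dcgm-exporter:9400"}]
    job_name        = "integrations/dcgm"
    scrape_interval = argument.scrape_interval.value
    forward_to      = [prometheus.relabel.gpu.receiver]
  }

  // Keep only the GPU series a dashboard is likely to chart.
  prometheus.relabel "gpu" {
    forward_to = argument.forward_to.value

    rule {
      source_labels = ["__name__"]
      regex         = "DCGM_FI_DEV_(GPU_UTIL|FB_USED|FB_FREE|GPU_TEMP|POWER_USAGE)"
      action        = "keep"
    }
  }

  // Inference server exposing latency and token metrics (assumed Prometheus format).
  // AI_INFERENCE_TARGET and the fallback host are hypothetical.
  prometheus.scrape "inference" {
    targets         = [{"__address__" = coalesce(sys.env("AI_INFERENCE_TARGET"), "llm-inference:8000")}]
    job_name        = "integrations/ai-inference"
    scrape_interval = argument.scrape_interval.value
    forward_to      = argument.forward_to.value
  }
}
```

Removing the `rule` block forwards every DCGM field instead of the trimmed set.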

4. Add OPNsense Integration

Add OPNsense firewall monitoring with:

  • Documentation on enabling the Prometheus exporter in OPNsense
  • Alloy module for scraping OPNsense metrics (config/alloy/modules/integrations/opnsense.alloy) including:
    • Firewall rules metrics
    • Traffic statistics
    • Gateway status
    • System health metrics
  • Sample OPNsense dashboard in Grafana provisioning
  • Configuration examples in README
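A sketch of an `opnsense.alloy` module under stated assumptions: an exporter runs on the firewall and is reachable from Alloy at an address given by a hypothetical `OPNSENSE_HOST` variable (the `192.168.1.1:9100` fallback assumes a node_exporter on its default port). Which metric families appear (firewall, traffic, gateway, system health) depends on the exporter installed on OPNsense, so the relabel stage only tags every series with an `integration` label rather than filtering by name.

```alloy
// Hypothetical sketch of config/alloy/modules/integrations/opnsense.alloy.
declare "component_opnsense" {
  argument "forward_to" {
    comment  = "Receivers for OPNsense metrics"
    optional = false
  }

  argument "scrape_interval" {
    optional = true
    default  = "60s"
  }

  prometheus.scrape "opnsense" {
    // OPNSENSE_HOST (host:port) is an assumed variable name.
    targets = [{
      "__address__" = coalesce(sys.env("OPNSENSE_HOST"), "192.168.1.1:9100"),
      "instance"    = "opnsense",
    }]
    job_name        = "integrations/opnsense"
    scrape_interval = argument.scrape_interval.value
    forward_to      = [prometheus.relabel.opnsense.receiver]
  }

  // Add integration="opnsense" to every sample so dashboards can select these series.
  prometheus.relabel "opnsense" {
    forward_to = argument.forward_to.value

    rule {
      target_label = "integration"
      replacement  = "opnsense"
    }
  }
}
```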

5. Update Alloy Configuration

Update config/alloy/master.alloy to import and configure new modules:

  • Import ntopng metrics module
  • Import AI monitoring module
  • Import OPNsense monitoring module
  • Ensure proper service discovery and labeling
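Extending the `master.alloy` example from the PR description, the three modules could be instantiated side by side under the same `import.file "integrations"` block. This assumes `component_ai_monitoring` and `component_opnsense` accept the same `forward_to` and `scrape_interval` arguments as `component_ntopng`; their real argument lists may differ, and the intervals shown are illustrative.

```alloy
// Assumes the import.file "integrations" block from master.alloy is present.

// Network telemetry from ntopng.
integrations.component_ntopng "default" {
  forward_to      = [provider.self_hosted_stack.compose.metrics_receiver]
  scrape_interval = "30s"
}

// GPU and inference metrics; a shorter interval for fast-changing GPU load (illustrative).
integrations.component_ai_monitoring "default" {
  forward_to      = [provider.self_hosted_stack.compose.metrics_receiver]
  scrape_interval = "15s"
}

// Firewall metrics change more slowly, so a longer interval (illustrative).
integrations.component_opnsense "default" {
  forward_to      = [provider.self_hosted_stack.compose.metrics_receiver]
  scrape_interval = "60s"
}
```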

6. Add Gateway Configuration

Create nginx gateway configurations for new services:

  • config/ntopng/gateway_ntopng.conf
  • Update config/nginx/nginx.conf to include ntopng routing

7. Update Documentation

Update README.md with:

  • New services in port mapping table (ntopng, DCGM exporter)
  • Setup instructions for ntopng
  • Setup instructions for AI monitoring (GPU setup, metrics endpoints)
  • Setup instructions for OPNsense integration
  • Updated "Done" checklist items
  • New "To Do" items if applicable
  • Updated installation instructions
  • Configuration examples for each new integration

8. Add Example Configuration Files

Create example configuration files:

  • config/alloy/modules/integrations/examples/ai-monitoring-example.alloy - Example AI metrics scraping
  • config/alloy/modules/integrations/examples/opnsense-example.alloy - Example OPNsense integration
  • Documentation in each module's directory explaining usage
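As an illustration of what `ai-monitoring-example.alloy` might include for LLM API endpoint monitoring, Alloy's built-in `prometheus.exporter.blackbox` component can probe HTTP endpoints and report availability and latency. The hostnames `ollama` and `llm-gateway` are assumptions about Compose service names; `11434` is Ollama's default port and `/api/tags` its model-list route. Hosted APIs such as OpenAI require an `Authorization` header, which this sketch does not configure.

```alloy
// Hypothetical example: probe LLM API endpoints for availability and latency.
prometheus.exporter.blackbox "llm_endpoints" {
  // Inline blackbox module: HTTP GET that succeeds on a 2xx response.
  config = "{ modules: { http_2xx: { prober: http, timeout: 5s } } }"

  // Ollama's model-list endpoint on its default port.
  target {
    name    = "ollama"
    address = "http://ollama:11434/api/tags"
    module  = "http_2xx"
  }

  // An OpenAI-compatible endpoint; the hostname is a placeholder.
  target {
    name    = "openai_compatible"
    address = "http://llm-gateway:8000/v1/models"
    module  = "http_2xx"
  }
}

// Scrape the probe results and send them to the stack's metrics receiver.
prometheus.scrape "llm_endpoints" {
  targets    = prometheus.exporter.blackbox.llm_endpoints.targets
  forward_to = [provider.self_hosted_stack.compose.metrics_receiver]
}
```

Each probe reports `probe_success` (1 or 0) and `probe_duration_seconds`, which is enough for a basic endpoint-health panel.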

Technical Requirements

  • Maintain backward compatibility with existing configurations
  • Follow existing patterns for service labels (metrics.grafana.com/*, logs.grafana.com/*)
  • Ensure all new services have proper health checks
  • Add appropriate volume mounts for data persistence
  • Use environment variables for configurable parameters
  • Follow the existing modular Alloy structure
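The label convention above can be applied with Docker service discovery. This is a generic sketch, not the repository's existing provider: it assumes Alloy can read the Docker socket and uses a hypothetical opt-in label `metrics.grafana.com/scrape=true`. Docker discovery exposes container labels as `__meta_docker_container_label_<name>` with dots and slashes replaced by underscores, which is why the rule refers to `metrics_grafana_com_scrape`.

```alloy
// Discover containers through the Docker socket.
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "labeled" {
  targets = discovery.docker.containers.targets

  // Keep only containers carrying the hypothetical metrics.grafana.com/scrape=true label.
  rule {
    source_labels = ["__meta_docker_container_label_metrics_grafana_com_scrape"]
    regex         = "true"
    action        = "keep"
  }

  // Copy the Compose service name into a "service" label.
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "service"
  }
}

prometheus.scrape "labeled" {
  targets    = discovery.relabel.labeled.output
  forward_to = [provider.self_hosted_stack.compose.metrics_receiver]
}
```

With the Compose service name as a label, series from ntopng or dcgm-exporter stay attributable without per-service scrape blocks.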

Expected Outcomes

  • Fully updated and modernized LGTMP stack
  • Working ntopng integration with metrics flowing to Mimir
  • AI/GPU monitoring capabilities with sample dashboards
  • OPNsense firewall metrics integration with documentation
  • Clear documentation for setup and configuration
  • All services properly integrated with the nginx gateway
  • Alloy properly configured to collect all new metrics

This pull request was created from Copilot chat.

Copilot AI and others added 2 commits May 18, 2026 16:40
Copilot AI changed the title from "[WIP] Update LGTMP stack with new monitoring capabilities" to "Modernize LGTMP stack and add ntopng, AI/GPU, and OPNsense monitoring integrations" May 18, 2026
Copilot AI requested a review from acester822 May 18, 2026 16:42
@acester822 acester822 marked this pull request as ready for review May 18, 2026 20:31
Copilot AI review requested due to automatic review settings May 18, 2026 20:31
@acester822 acester822 merged commit 9db8cb0 into main May 18, 2026
1 of 2 checks passed
Copilot AI removed the request for review from Copilot May 18, 2026 20:55