Add Seagate FARM log metrics, vdev-label support, deploy tooling, and Grafana dashboard #357

Open
mpyne1 wants to merge 16 commits into prometheus-community:master from mpyne1:master

mpyne1 commented Jun 17, 2026

This PR adds Seagate FARM (Field Accessible Reliability Metrics) log support to smartctl_exporter, along with deployment tooling and a purpose-built Grafana dashboard.

FARM Log Metrics

Parses Seagate FARM log data from smartctl -j --log=farm and exposes the following Prometheus metric families:

  • Environment: temperature (current/min/max/highest), 12V and 5V rail voltage (current/min/max), motor power
  • Workload: read/write command counters
  • Error tracking: reallocated sectors, CRC errors, command timeouts, unrecoverable read/write errors
  • Reliability: MR head resistance per head, reallocated sectors per head, error rate, seek error rate, high priority unload events
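
As a rough illustration of how such families can be derived from the JSON document, here is a minimal Go sketch. The JSON key names (`farm_log`, `workload`, `total_read_commands`, and so on), the metric names, and the helper `parseFarm` are all placeholders invented for this example; they are not taken from this PR or from smartctl, so check them against real `smartctl -j --log=farm` output.

```go
package main

import "encoding/json"

// farmLog models a small, hypothetical slice of the FARM JSON document.
// Every key name below is a placeholder for illustration only.
type farmLog struct {
	Farm struct {
		Workload struct {
			ReadCommands  uint64 `json:"total_read_commands"`
			WriteCommands uint64 `json:"total_write_commands"`
		} `json:"workload"`
		Errors struct {
			ReallocatedSectors uint64 `json:"reallocated_sectors"`
		} `json:"errors"`
	} `json:"farm_log"`
}

// farmSample is one metric name/value pair ready to be exported.
type farmSample struct {
	Name  string
	Value float64
}

// parseFarm decodes the JSON and flattens it into samples in a fixed order.
// The metric names are hypothetical, not the ones this PR registers.
func parseFarm(raw []byte) ([]farmSample, error) {
	var doc farmLog
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	return []farmSample{
		{"farm_read_commands_total", float64(doc.Farm.Workload.ReadCommands)},
		{"farm_write_commands_total", float64(doc.Farm.Workload.WriteCommands)},
		{"farm_reallocated_sectors", float64(doc.Farm.Errors.ReallocatedSectors)},
	}, nil
}
```

Keys that are missing from the input simply decode to zero here; a real collector would probably want to distinguish "absent" from "zero".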

FARM log collection is opt-in via the --smartctl.farm-log flag and requires smartmontools 7.4+ for FARM JSON output. Collection errors are handled gracefully, and concurrency is limited to avoid overloading storage controllers.
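
One common way to get limited concurrency with per-device error handling is a counting semaphore built from a buffered channel. The sketch below shows that general pattern only; `collectAll`, its parameters, and the idea of returning a per-device error map are assumptions for illustration, not the code in this PR.

```go
package main

import "sync"

// collectAll calls collect once per device, with at most limit calls in
// flight at any time. A failure is recorded against its device instead of
// aborting the remaining collections. (Hypothetical helper.)
func collectAll(devices []string, limit int, collect func(dev string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error)
	)
	for _, dev := range devices {
		wg.Add(1)
		go func(dev string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			if err := collect(dev); err != nil {
				mu.Lock()
				errs[dev] = err
				mu.Unlock()
			}
		}(dev)
	}
	wg.Wait()
	return errs
}
```

A caller would pass a function that runs smartctl for a single device and would log or count the returned errors rather than failing the whole scrape.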

vdev-label Support

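Adds a --smartctl.vdev-label flag that reads the udev ID_VDEV property of each device and uses the ZFS vdev label (e.g. an enclosure slot identifier such as 2-1 or 3-15) as the device label in metrics. If no ID_VDEV is found, the label falls back to the /dev/sdX name. This makes it easier to correlate metrics with physical disk locations.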
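
The lookup can be sketched as a scan over udev's KEY=VALUE property lines, such as those printed by `udevadm info --query=property --name=<device>`. The helper name `vdevLabel` and its signature are hypothetical, and the PR may obtain the property differently.

```go
package main

import (
	"bufio"
	"strings"
)

// vdevLabel scans KEY=VALUE lines for ID_VDEV and returns its value.
// It returns fallback (for example the kernel device name) when the
// property is absent or empty. (Hypothetical helper.)
func vdevLabel(udevProps, fallback string) string {
	sc := bufio.NewScanner(strings.NewReader(udevProps))
	for sc.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), "=")
		if ok && key == "ID_VDEV" && value != "" {
			return value
		}
	}
	return fallback
}
```

Matching the key exactly, rather than by prefix, keeps similarly named properties from being mistaken for the label.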

Grafana Dashboard

Includes smartctl-farm-dashboard.json, with panels for:

  • Device inventory table
  • Temperature (current and all types)
  • Read/write command rates
  • Error counters (reallocated sectors, CRC errors, command timeouts)
  • Unrecoverable read/write errors
  • 12V and 5V voltage rails with separate current, min, and max charts
  • Per-head MR resistance and reallocated sectors
  • Error rate / seek error rate and high priority unload events
  • Motor power

Template variables allow filtering by instance, device, temperature type, and voltage type.

Deploy Script

deploy.sh automates end-to-end deployment to one or more target hosts:

  • Builds a static binary (CGO_ENABLED=0)
  • Installs smartmontools if missing, copies binary and systemd service
  • Auto-detects ID_VDEV udev data and enables --smartctl.vdev-label
  • Auto-detects non-standard smartctl paths
  • Imports the Grafana dashboard via API if Grafana is running on the target
  • Configures Prometheus scrape targets (supports both scrape_config_files and inline prometheus.yml styles)
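
To make the scrape-target step concrete, the following Go sketch renders a complete `scrape_configs` section from a host list, in the shape a file referenced by `scrape_config_files` would use. deploy.sh itself is a shell script, so this is only an illustration; the job name, the helper `scrapeConfig`, and the port passed by the caller (9633 in the usage note) are assumptions.

```go
package main

import "strings"

// scrapeConfig renders one scrape job whose targets cover every host.
// The whole section is regenerated on each call, so callers must pass
// the full host list every time. (Hypothetical helper and job name.)
func scrapeConfig(hosts []string, port string) string {
	var b strings.Builder
	b.WriteString("scrape_configs:\n")
	b.WriteString("  - job_name: smartctl_exporter\n")
	b.WriteString("    static_configs:\n")
	b.WriteString("      - targets:\n")
	for _, h := range hosts {
		b.WriteString("          - \"" + h + ":" + port + "\"\n")
	}
	return b.String()
}
```

For example, `scrapeConfig([]string{"nas1", "nas2"}, "9633")` yields a job with two targets, `nas1:9633` and `nas2:9633`.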

mpyne1 and others added 16 commits May 28, 2026 17:15
Add opt-in collection of Seagate FARM (Field Accessible Reliability
Metrics) log data via new --smartctl.farm-log CLI flag.

When enabled, passes --log=farm to smartctl and exports metrics from:
- Page 2: workload statistics (read/write commands, sectors read/written)
- Page 3: error statistics (unrecoverable errors, reallocated sectors,
  CRC errors, command timeouts) including per-head breakdowns
- Page 4: environment (temperature, humidity, voltage, motor power)
- Page 5: reliability (error rate, seek error rate, unload events,
  helium pressure) including per-head reallocations, skip-write detect,
  and MR head resistance

Requires smartmontools 7.4+ for FARM JSON output support.
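
A version gate for this requirement could look like the Go sketch below. It assumes the first line of `smartctl --version` has the form `smartctl 7.4 2023-08-01 r5530 [...]`; the helper name `supportsFarmJSON` is hypothetical, and version strings with suffixes (such as pre-releases) are conservatively rejected.

```go
package main

import (
	"strconv"
	"strings"
)

// supportsFarmJSON reports whether the first line of `smartctl --version`
// output names release 7.4 or newer. Anything it cannot parse counts as
// unsupported. (Hypothetical helper.)
func supportsFarmJSON(versionOutput string) bool {
	firstLine, _, _ := strings.Cut(versionOutput, "\n")
	fields := strings.Fields(firstLine)
	if len(fields) < 2 || fields[0] != "smartctl" {
		return false
	}
	majorStr, minorStr, ok := strings.Cut(fields[1], ".")
	if !ok {
		return false
	}
	major, errMajor := strconv.Atoi(majorStr)
	minor, errMinor := strconv.Atoi(minorStr)
	if errMajor != nil || errMinor != nil {
		return false
	}
	return major > 7 || (major == 7 && minor >= 4)
}
```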

- deploy.sh: builds and deploys to multiple hosts, auto-detects Grafana
- smartctl-farm-dashboard.json: Grafana dashboard for FARM metrics
- smartctl_exporter.yml: Prometheus scrape config
- systemd/smartctl_exporter.service: updated ExecStart with --smartctl.farm-log

- Install smartmontools if missing (dnf/apt)
- Print smartctl version to confirm FARM support (>=7.4)
- Check if Prometheus exists before updating scrape config
- Remove empty smartctl_exporter.yml (config generated dynamically)
- Add note that all hosts must be passed each run (config is replaced)
- Update dashboard JSON with instance variable and localhost filter

Uses udevadm to look up ID_VDEV property for each device, providing
physical slot numbers (e.g. 2-1, 3-15) as Prometheus device labels
instead of kernel device names. Falls back to /dev/sdX name if no
ID_VDEV is found.

Also updates deploy.sh with stop-before-copy, smartctl path detection,
Grafana dashboard import improvements, and dual Prometheus config support.

dashboard: split 12V and 5V voltage charts into separate current/min/…