Leader memory leak when metrics are exposed but not consumed #27013

@MattJustMatt

Nomad version

1.10.3

Operating system and Environment details

AlmaLinux 9

Issue

Nomad server leaders leak memory over time when metrics are exposed but not consumed.

Reproduction steps

Enable telemetry in the server (leader) config:

    telemetry {
      prometheus_metrics         = true
      publish_allocation_metrics = true
      publish_node_metrics       = true
    }

Stop scraping metrics (our Prometheus node went down for 24 hours).

Expected Result

No change in memory usage, whether or not metrics are being scraped.

Actual Result

We started seeing leaders fill up on memory and fail. As soon as Prometheus started scraping again, memory usage stabilized.

(The node was elected leader at 18:00; Prometheus came back online and resumed scraping at 21:30.)
Image
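
Since memory stabilized once scraping resumed, a stopgap while Prometheus is unavailable might be to read the metrics endpoint from a small script. The following is a minimal sketch, not a confirmed fix: it assumes the default HTTP address `http://127.0.0.1:4646`, no ACLs (an ACL-enabled cluster would need an `X-Nomad-Token` header), and that simply reading `/v1/metrics?format=prometheus` has the same effect as a Prometheus scrape, which is inferred from the behavior above rather than verified. The function names and the 30-second interval are illustrative choices.

```python
import time
import urllib.request

# Assumption: default Nomad HTTP address; adjust for your cluster.
NOMAD_ADDR = "http://127.0.0.1:4646"


def metrics_url(addr: str = NOMAD_ADDR) -> str:
    """Build the URL of Nomad's Prometheus-format metrics endpoint."""
    return addr.rstrip("/") + "/v1/metrics?format=prometheus"


def scrape_once(addr: str = NOMAD_ADDR, timeout: float = 10.0) -> int:
    """Read the metrics endpoint once and return the number of bytes received."""
    with urllib.request.urlopen(metrics_url(addr), timeout=timeout) as resp:
        return len(resp.read())


def scrape_forever(addr: str = NOMAD_ADDR, interval: float = 30.0) -> None:
    """Poll the endpoint at a fixed interval (hypothetical 30 s default)."""
    while True:
        try:
            size = scrape_once(addr)
            print(f"scraped {size} bytes from {metrics_url(addr)}")
        except OSError as exc:  # urllib's URLError and timeouts derive from OSError
            print(f"scrape failed: {exc}")
        time.sleep(interval)


if __name__ == "__main__":
    scrape_forever()
```

Running this against each server (not only the current leader) would cover leadership changes, since any server can be elected leader.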
