
@jrasell
Member

@jrasell jrasell commented Oct 24, 2025

Note: new improvement, do not merge until after 1.11 GA.

Nomad 1.11 introduced a new peer cache to store information about Serf peers in order to perform server version checking. This subsystem is a good place to store all cached Serf peer information, including the peer information currently also held in the server object.

This change therefore moves the peer and local peer information from the server object into the peer cache backend. This gives us a single place to query this information and a single place to reflect Serf membership updates.
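To make the "single place" idea concrete, here is a minimal sketch of a cache that owns all Serf peer data behind one lock. It is not the PR's implementation: the `Parts` fields, the region-then-name map layout, and the `NewPeerCache`, `PeerDelete`, and `LocalPeers` helpers are assumptions for illustration, and the local region is fixed at construction rather than passed into `PeerSet`.

```go
package main

import (
	"sort"
	"sync"
)

// Parts is a hypothetical, trimmed-down view of a Nomad server's Serf
// member tags; the real struct carries more fields.
type Parts struct {
	Name   string
	Region string
	Addr   string
	Alive  bool
}

// PeerCache is a hypothetical single owner of cached Serf peer data,
// keyed by region and then by member name.
type PeerCache struct {
	mu          sync.RWMutex
	localRegion string
	peers       map[string]map[string]*Parts
}

// NewPeerCache builds an empty cache for the given local region (assumed).
func NewPeerCache(localRegion string) *PeerCache {
	return &PeerCache{
		localRegion: localRegion,
		peers:       make(map[string]map[string]*Parts),
	}
}

// PeerSet inserts or replaces a peer. Every Serf event that adds a server
// or changes its status would go through this one write path.
func (p *PeerCache) PeerSet(parts *Parts) {
	p.mu.Lock()
	defer p.mu.Unlock()
	region, ok := p.peers[parts.Region]
	if !ok {
		region = make(map[string]*Parts)
		p.peers[parts.Region] = region
	}
	cp := *parts // store a copy so callers cannot mutate cached state
	region[parts.Name] = &cp
}

// PeerDelete removes a peer that has left or been reaped, dropping the
// region entry once it is empty.
func (p *PeerCache) PeerDelete(parts *Parts) {
	p.mu.Lock()
	defer p.mu.Unlock()
	region, ok := p.peers[parts.Region]
	if !ok {
		return
	}
	delete(region, parts.Name)
	if len(region) == 0 {
		delete(p.peers, parts.Region)
	}
}

// LocalPeers returns a name-sorted snapshot of local-region peers, so
// readers never hold the cache lock while using the result.
func (p *PeerCache) LocalPeers() []Parts {
	p.mu.RLock()
	defer p.mu.RUnlock()
	out := make([]Parts, 0, len(p.peers[p.localRegion]))
	for _, v := range p.peers[p.localRegion] {
		out = append(out, *v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}
```

In this shape, membership handlers call `PeerSet` on join or update and `PeerDelete` on leave or reap, while RPC handlers only ever read copies, which keeps all locking inside the cache.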

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise-only PR, please add any required changelog entry
    within the public repository.

Member

@tgross tgross left a comment


Just a heads up that you might want to look at the ENT code base for consumers of these fields, in particular the Autopilot ENT features.

@jrasell jrasell added this to the 1.11.x milestone Oct 27, 2025
@jrasell jrasell marked this pull request as ready for review November 12, 2025 09:54
@jrasell jrasell requested review from a team as code owners November 12, 2025 09:54
// If we reached this point then it's a new member, so append it to the
// existing array.
p.peers[parts.Region] = append(existing, parts)
// Add ot the list if not known
Member


to :)


	for _, m := range event.Members {
		if ok, parts := IsNomadServer(m); ok {
			p.peerDeleteLocked(p.allPeers, parts)
Member


Do we need to delete from alivePeers or localPeers here as well?
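If the cache keeps several views of the same membership (the comment mentions `allPeers`, `alivePeers`, and `localPeers`), one way to avoid stale entries is to remove a departed server from every view under a single lock. This is a minimal sketch, assuming each view is a map keyed by member name; the `indexedPeers` type and its `markAlive` and `remove` methods are hypothetical.

```go
package main

import "sync"

// Parts is a hypothetical minimal server description.
type Parts struct {
	Name   string
	Region string
}

// indexedPeers is a hypothetical cache holding three views of the same
// membership data, each keyed by member name.
type indexedPeers struct {
	mu          sync.Mutex
	localRegion string
	allPeers    map[string]*Parts
	alivePeers  map[string]*Parts
	localPeers  map[string]*Parts
}

func newIndexedPeers(localRegion string) *indexedPeers {
	return &indexedPeers{
		localRegion: localRegion,
		allPeers:    map[string]*Parts{},
		alivePeers:  map[string]*Parts{},
		localPeers:  map[string]*Parts{},
	}
}

// markAlive records a live member in every view it belongs to.
func (c *indexedPeers) markAlive(p *Parts) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.allPeers[p.Name] = p
	c.alivePeers[p.Name] = p
	if p.Region == c.localRegion {
		c.localPeers[p.Name] = p
	}
}

// remove deletes a member from every view under one lock, so no view can
// keep a server that the others have already dropped. delete on a missing
// key is a no-op, so calling it on every map is safe.
func (c *indexedPeers) remove(p *Parts) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.allPeers, p.Name)
	delete(c.alivePeers, p.Name)
	delete(c.localPeers, p.Name)
}
```

Whether the real handler needs this depends on which views the PR ends up keeping; the point is that a single removal helper keeps them consistent.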

// PeerSet adds or updates the given parts in the cache. This should be called
// when a new peer is detected or an existing peer changes its status.
func (p *PeerCache) PeerSet(parts *Parts) {
func (p *PeerCache) PeerSet(parts *Parts, localRegion string) {
Member


Can we call this UpdatePeerSet or SetPeers or something like that to make it clear this is a write? When seeing it at the call site (e.g., serf.go) it was a little unexpected.

	local1 := len(s1.peersCache.LocalPeers())
	if n1 != 2 {
		return false, fmt.Errorf("bad: %#v", n1)
		return false, fmt.Errorf("bad1: %#v", n1)
Member


Along the lines of your edit here, I don't love these old "bad: %#v" test errors because they don't tell someone who's less familiar with the code why this is bad. They're unfortunately really common in the older parts of the code. Any chance we can improve these messages as long as we're touching them?
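As an illustration of the kind of message that would help, the hypothetical helper below names the server, the quantity being checked, and both the expected and actual values. The helper and its parameters are assumptions rather than an existing test utility; it is meant to fit the `return false, err` closure style used in the diff.

```go
package main

import "fmt"

// checkPeerCounts is a hypothetical helper that replaces "bad: %#v" with a
// message stating what was checked, on which server, and what differed.
func checkPeerCounts(server string, gotPeers, wantPeers, gotLocal, wantLocal int) error {
	if gotPeers != wantPeers {
		return fmt.Errorf("server %s: expected %d peers in the cache, got %d",
			server, wantPeers, gotPeers)
	}
	if gotLocal != wantLocal {
		return fmt.Errorf("server %s: expected %d local-region peers, got %d",
			server, wantLocal, gotLocal)
	}
	return nil
}
```

At the call site this might read `if err := checkPeerCounts("s1", n1, 2, local1, 2); err != nil { return false, err }`, so a failure explains itself without the reader opening the test.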

			Datacenter: v.Datacenter,
		})
	}
	reply.Servers = n.srv.peersCache.LocalPeersServerInfo()
Member


Not having to hold the lock for the rest of this function is a nice improvement 👍
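The lock-scope improvement comes from the cache handing back a copy rather than a view of its internal state. A minimal sketch of that shape, assuming a value-typed `ServerInfo`; only the `LocalPeersServerInfo` name and the `Datacenter` field come from the diff, while the other fields, the `peerCache` type, and the method body are assumptions.

```go
package main

import "sync"

// ServerInfo is a hypothetical stand-in for the per-server entries placed
// in an RPC reply.
type ServerInfo struct {
	Name       string
	Address    string
	Datacenter string
}

// peerCache is a hypothetical cache holding local peers keyed by name.
type peerCache struct {
	mu         sync.RWMutex
	localPeers map[string]ServerInfo
}

// LocalPeersServerInfo copies the local peers into a fresh slice while
// holding only the read lock. The caller gets an independent snapshot, so
// an RPC handler can finish building its reply without holding any lock.
func (c *peerCache) LocalPeersServerInfo() []ServerInfo {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]ServerInfo, 0, len(c.localPeers))
	for _, info := range c.localPeers {
		out = append(out, info)
	}
	return out
}
```

Because `ServerInfo` holds only value fields here, copying each struct is enough to keep the snapshot independent; if it gained slices or maps, those would need copying too.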



3 participants