Skip to content

Conversation

@schmikei
Copy link
Contributor

@schmikei schmikei commented Nov 20, 2025

This PR modernizes the mixin to use g.libsonnet and commonlib where applicable.

It also migrates over to the signals API.

As of submission mostly validated against the git diff for changes.

@schmikei schmikei marked this pull request as ready for review November 25, 2025 21:32
@schmikei schmikei requested a review from a team as a code owner November 25, 2025 21:32
Copy link
Contributor

@aalhour aalhour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, the only real issue is the alerts, the other stuff are minor.

dashboards+:
{
logs+:
root.applyCommon(super.logs.templating.list, uid=uid + '-logs', tags=tags, links=links { couchdbLogs+:: {} }, annotations=annotations, timezone=timezone, refresh=refresh, period=period),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: couchdbLogs --> sapHanaLogs


sapHanaInstance:
link.link.new('SAP HANA instance overview', '/d/' + this.grafana.dashboards['sap-hana-instance-overview.json'].uid)
+ link.link.options.withKeepTime(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing withIncludeVars(true) in link.

new(this): {
sapHanaOverview:
link.link.new('SAP HANA system overview', '/d/' + this.grafana.dashboards['sap-hana-system-overview.json'].uid)
+ link.link.options.withKeepTime(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing withIncludeVars(true) in link.

vars.multiInstance,
uid + '-system-overview',
tags,
links,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should exclude the current dashboards from the links to avoid self-referencing issues?
{ sapHanaOverview+:: {} }

vars.multiInstance,
uid + '-instance-overview',
tags,
links,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about excluding this dashboard from its own links.

Comment on lines 35 to 37
+ g.panel.timeSeries.standardOptions.thresholds.withSteps([
])
+ g.panel.timeSeries.fieldConfig.defaults.custom.withThresholdsStyle({ mode: 'line' }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add thresholds here? In case it's empty?

@@ -0,0 +1,221 @@
function(this) {
local aggregationLabels = std.join(',', this.groupLabels + this.instanceLabels),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: sometimes we use ',' and some other times we use ', ' (with a space), can we use one?

@@ -0,0 +1,163 @@
function(this) {
local aggregationLabels = std.join(', ', this.groupLabels + this.instanceLabels),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: sometimes we use ',' and some other times we use ', ' (with a space), can we use one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use without(...) instead of by(...), from the docs:

sum by (job, sid, host) drops labels; prefer sum without (...) to preserve labels for alert routing

@aalhour aalhour self-assigned this Dec 11, 2025
Copy link
Member

@Dasomeone Dasomeone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments here, I think we could make better use of the common-lib, even if just as a replacement for the base panels (stat/table specifically)

local this = self,

// Filtering and label configuration
filteringSelector: 'job="integrations/sap-hana"',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop integration job label

{
alert: 'SapHanaHighSqlExecutionTime',
expr: |||
avg without (database_name, port, service, sql_type) (hanadb_sql_service_elap_per_exec_avg_ms{%(filteringSelector)s}) / 1000 > %(alertsCriticalHighSqlExecutionTime)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor thing, but in conjunction with my other comment on the config:

Suggest we swap the alert threshold variables to milliseconds to match the native unit of the metric itself.

Would apply for both:

  • alertsCriticalHighSqlExecutionTime
  • alertsCriticalHighReplicationShippingTime

Comment on lines +24 to +25
alertsCriticalHighSqlExecutionTime: 1, // second
alertsCriticalHighReplicationShippingTime: 1, // second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a thought: Given the native metrics use milliseconds we could just match that here, would simplify our alert expressions a bit and lower chance of issues during maintenance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify x,y throughout this file

this.grafana.rows.instanceResourcesRow,
this.grafana.rows.instanceIORow,
this.grafana.rows.instancePerformanceRow,
this.grafana.rows.instanceAlertsRow,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking we'd want the alerts row up top

Comment on lines +28 to +32
this.grafana.rows.systemReplicationRow,
this.grafana.rows.systemResourcesRow,
this.grafana.rows.systemIORow,
this.grafana.rows.systemPerformanceRow,
this.grafana.rows.systemAlertsRow,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to reorganise this with:

  1. resources
  2. alerts
  3. replication
  4. performance
  5. io

Thoughts?

I'm not 100% on the order of rep/per/io specifically, but I do think resources and alerts should be at the top

Comment on lines +54 to +56
this.grafana.rows.instanceResourcesRow,
this.grafana.rows.instanceIORow,
this.grafana.rows.instancePerformanceRow,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as the system dashboard, generally speaking I think:

  1. Resources
  2. Alerts
  3. the remainder, in a similar pattern as whatever we agree on for system above

Comment on lines +10 to +26
systemReplicaStatusPanel:
g.panel.stat.new('Replica status')
+ g.panel.stat.queryOptions.withTargets([
signals.system.replicaStatus.asTarget(),
])
+ signals.system.replicaStatus.asOverride()
+ g.panel.stat.panelOptions.withDescription('State of the replicas in the SAP HANA system')
+ g.panel.stat.options.withColorMode('value')
+ g.panel.stat.options.withGraphMode('none')
+ g.panel.stat.options.withTextMode('auto')
+ g.panel.stat.standardOptions.thresholds.withMode('absolute')
+ g.panel.stat.standardOptions.thresholds.withSteps([
{ color: 'green', value: null },
{ color: 'green', value: 0 },
{ color: 'yellow', value: 1 },
{ color: 'red', value: 3 },
]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing a lot of overrides here, does none of the base ones provided in common-lib come close?

I'd try to refrain from invoking g.panel.stat.new directly, and instead go via commonlib.panels.stat.base.new instead, so as to inherit any best practices (which you can then override, if needed)

Would help pull in future changes to our base panels

+ g.panel.timeSeries.standardOptions.withUnit('ms'),

systemActiveConnectionsPanel:
g.panel.stat.new('Active connections')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, would refrain from invoking g.panel.stat.new directly unless absolutely necessary.

Applies to all the stat panels in here

+ g.panel.stat.options.withTextMode('auto'),

systemRecentAlertsPanel:
g.panel.table.new('Recent alerts')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@Dasomeone Dasomeone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meant to hit "request changes" not just comment 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants