-
Notifications
You must be signed in to change notification settings - Fork 177
chore: SAP HANA mixin modernization #1550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
aalhour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass, the only real issue is the alerts, the other stuff are minor.
sap-hana-mixin/dashboards.libsonnet
Outdated
| dashboards+: | ||
| { | ||
| logs+: | ||
| root.applyCommon(super.logs.templating.list, uid=uid + '-logs', tags=tags, links=links { couchdbLogs+:: {} }, annotations=annotations, timezone=timezone, refresh=refresh, period=period), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: couchdbLogs --> sapHanaLogs
sap-hana-mixin/links.libsonnet
Outdated
|
|
||
| sapHanaInstance: | ||
| link.link.new('SAP HANA instance overview', '/d/' + this.grafana.dashboards['sap-hana-instance-overview.json'].uid) | ||
| + link.link.options.withKeepTime(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing withIncludeVars(true) in link.
sap-hana-mixin/links.libsonnet
Outdated
| new(this): { | ||
| sapHanaOverview: | ||
| link.link.new('SAP HANA system overview', '/d/' + this.grafana.dashboards['sap-hana-system-overview.json'].uid) | ||
| + link.link.options.withKeepTime(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing withIncludeVars(true) in link.
sap-hana-mixin/dashboards.libsonnet
Outdated
| vars.multiInstance, | ||
| uid + '-system-overview', | ||
| tags, | ||
| links, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should exclude the current dashboards from the links to avoid self-referencing issues?
{ sapHanaOverview+:: {} }
sap-hana-mixin/dashboards.libsonnet
Outdated
| vars.multiInstance, | ||
| uid + '-instance-overview', | ||
| tags, | ||
| links, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about excluding this dashboard from its own links.
sap-hana-mixin/panels.libsonnet
Outdated
| + g.panel.timeSeries.standardOptions.thresholds.withSteps([ | ||
| ]) | ||
| + g.panel.timeSeries.fieldConfig.defaults.custom.withThresholdsStyle({ mode: 'line' }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add thresholds here? In case it's empty?
| @@ -0,0 +1,221 @@ | |||
| function(this) { | |||
| local aggregationLabels = std.join(',', this.groupLabels + this.instanceLabels), | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: sometimes we use ',' and some other times we use ', ' (with a space), can we use one?
| @@ -0,0 +1,163 @@ | |||
| function(this) { | |||
| local aggregationLabels = std.join(', ', this.groupLabels + this.instanceLabels), | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: sometimes we use ',' and some other times we use ', ' (with a space), can we use one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use without(...) instead of by(...), from the docs:
sum by (job, sid, host) drops labels; prefer sum without (...) to preserve labels for alert routing
Dasomeone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple comments here, I think we could make better use of the common-lib, even if just as a replacement for the base panels (stat/table specifically)
sap-hana-mixin/config.libsonnet
Outdated
| local this = self, | ||
|
|
||
| // Filtering and label configuration | ||
| filteringSelector: 'job="integrations/sap-hana"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please drop integration job label
| { | ||
| alert: 'SapHanaHighSqlExecutionTime', | ||
| expr: ||| | ||
| avg without (database_name, port, service, sql_type) (hanadb_sql_service_elap_per_exec_avg_ms{%(filteringSelector)s}) / 1000 > %(alertsCriticalHighSqlExecutionTime)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor thing, but in conjunction with my other comment on the config:
Suggest we swap the alert threshold variables to milliseconds to match the native unit of the metric itself.
Would apply for both:
alertsCriticalHighSqlExecutionTimealertsCriticalHighReplicationShippingTime
| alertsCriticalHighSqlExecutionTime: 1, // second | ||
| alertsCriticalHighReplicationShippingTime: 1, // second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a thought: Given the native metrics use milliseconds we could just match that here, would simplify our alert expressions a bit and lower chance of issues during maintenance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to specify x,y throughout this file
| this.grafana.rows.instanceResourcesRow, | ||
| this.grafana.rows.instanceIORow, | ||
| this.grafana.rows.instancePerformanceRow, | ||
| this.grafana.rows.instanceAlertsRow, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking we'd want the alerts row up top
| this.grafana.rows.systemReplicationRow, | ||
| this.grafana.rows.systemResourcesRow, | ||
| this.grafana.rows.systemIORow, | ||
| this.grafana.rows.systemPerformanceRow, | ||
| this.grafana.rows.systemAlertsRow, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to reorganise this with:
- resources
- alerts
- replication
- performance
- io
Thoughts?
I'm not 100% on the order of rep/per/io specifically, but I do think resources and alerts should be at the top
| this.grafana.rows.instanceResourcesRow, | ||
| this.grafana.rows.instanceIORow, | ||
| this.grafana.rows.instancePerformanceRow, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as the system dashboard, generally speaking I think:
- Resources
- Alerts
- the remainder, in a similar pattern as whatever we agree on for system above
| systemReplicaStatusPanel: | ||
| g.panel.stat.new('Replica status') | ||
| + g.panel.stat.queryOptions.withTargets([ | ||
| signals.system.replicaStatus.asTarget(), | ||
| ]) | ||
| + signals.system.replicaStatus.asOverride() | ||
| + g.panel.stat.panelOptions.withDescription('State of the replicas in the SAP HANA system') | ||
| + g.panel.stat.options.withColorMode('value') | ||
| + g.panel.stat.options.withGraphMode('none') | ||
| + g.panel.stat.options.withTextMode('auto') | ||
| + g.panel.stat.standardOptions.thresholds.withMode('absolute') | ||
| + g.panel.stat.standardOptions.thresholds.withSteps([ | ||
| { color: 'green', value: null }, | ||
| { color: 'green', value: 0 }, | ||
| { color: 'yellow', value: 1 }, | ||
| { color: 'red', value: 3 }, | ||
| ]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're doing a lot of overrides here, does none of the base ones provided in common-lib come close?
I'd try to refrain from invoking g.panel.stat.new directly, and instead go via commonlib.panels.stat.base.new instead, so as to inherit any best practices (which you can then override, if needed)
Would help pull in future changes to our base panels
| + g.panel.timeSeries.standardOptions.withUnit('ms'), | ||
|
|
||
| systemActiveConnectionsPanel: | ||
| g.panel.stat.new('Active connections') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, would refrain from invoking g.panel.stat.new directly unless absolutely necessary.
Applies to all the stat panels in here
| + g.panel.stat.options.withTextMode('auto'), | ||
|
|
||
| systemRecentAlertsPanel: | ||
| g.panel.table.new('Recent alerts') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dasomeone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meant to hit "request changes" not just comment 😅
This PR modernizes the mixin to use g.libsonnet and commonlib where applicable.
It also migrates over to the signals API.
As of submission mostly validated against the git diff for changes.