Commit 133b9f2

chore: IIS Modernization (#1542)
* modernize the iis mixin
* make dashboards_out
* fix lint
* use commonlib a tad bit more for panels
* PR feedback/good catches on missing panels on the overview dashboard
* modify config.libsonnet to be more easily understandable that it's an expandable array
* fix description after alert change
* remove filteringSelector from public mixin; withKeepVars; increase with offsets
1 parent 7c3d9f0 commit 133b9f2

21 files changed: +2404 -4607 lines changed

microsoft-iis-mixin/README.md

Lines changed: 10 additions & 10 deletions
@@ -9,11 +9,11 @@ The Microsoft IIS mixin contains the following dashboards:

 and the following alerts:

-- MicrosoftIISHighNumberOfRejectedAsyncIORequests
-- MicrosoftIISHighNumberOf5xxRequestErrors
-- MicrosoftIISLowSuccessRateForWebsocketConnections
-- MicrosoftIISThreadpoolUtilizationNearingMax
-- MicrosoftIISHighNumberOfWorkerProcessFailures
+- MicrosoftIISRejectedAsyncIORequests
+- MicrosoftIIS5xxRequestErrors
+- MicrosoftIISSuccessRateForWebsocket
+- MicrosoftIISThreadpoolUtilization
+- MicrosoftIISWorkerProcessFailures

 Default thresholds can be configured in `config.libsonnet`

@@ -57,11 +57,11 @@ The Microsoft IIS applications dashboard provides details on worker requests, we
 ![Screenshot3 of the applications dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/iis/screenshots/application-3.png)
 ## Alerts overview

-MicrosoftIISHighNumberOfRejectedAsyncIORequests: There are a high number of rejected async I/O requests for a site.
-MicrosoftIISHighNumberOf5xxRequestErrors: There are a high number of 5xx request errors for an application.
-MicrosoftIISLowSuccessRateForWebsocketConnections: There is a low success rate for websocket connections for an application.
-MicrosoftIISThreadpoolUtilizationNearingMax: The thread pool utilization is nearing max capacity.
-MicrosoftIISHighNumberOfWorkerProcessFailures: There are a high number of worker process failures for an application.
+- `MicrosoftIISRejectedAsyncIORequests`: There are a high number of rejected async I/O requests for a site.
+- `MicrosoftIIS5xxRequestErrors`: There are a high number of 5xx request errors for an application.
+- `MicrosoftIISSuccessRateForWebsocket`: There is a low success rate for websocket connections for an application.
+- `MicrosoftIISThreadpoolUtilization`: The thread pool utilization is nearing max capacity.
+- `MicrosoftIISWorkerProcessFailures`: There are a high number of worker process failures for an application.

 ## Install Tools

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+{
+  new(this): {
+    groups: [
+      {
+        name: 'MicrosoftIISAlerts',
+        rules: [
+          {
+            alert: 'MicrosoftIISRejectedAsyncIORequests',
+            expr: |||
+              increase(windows_iis_rejected_async_io_requests_total{%(filteringSelector)s}[5m]) > %(alertsWarningHighRejectedAsyncIORequests)s
+            ||| % this.config,
+            'for': '5m',
+            labels: {
+              severity: 'warning',
+            },
+            annotations: {
+              summary: 'There are a high number of rejected async I/O requests for a site.',
+              description:
+                ('The number of rejected async IO requests is {{ printf "%%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.site }}, ' +
+                 'which is above the threshold of %(alertsWarningHighRejectedAsyncIORequests)s.') % this.config,
+            },
+          },
+          {
+            alert: 'MicrosoftIIS5xxRequestErrors',
+            expr: |||
+              sum without (pid, status_code)(increase(windows_iis_worker_request_errors_total{status_code=~"5.."%(filteringSelector)s}[5m])) > %(alertsCriticalHigh5xxRequests)s
+            ||| % (this.config { filteringSelector: if this.config.filteringSelector != '' then ',' + this.config.filteringSelector else '' }),
+            'for': '5m',
+            labels: {
+              severity: 'critical',
+            },
+            annotations: {
+              summary: 'There are a high number of 5xx request errors for an application.',
+              description:
+                ('The number of 5xx request errors is {{ printf "%%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.app }}, ' +
+                 'which is above the threshold of %(alertsCriticalHigh5xxRequests)s.') % this.config,
+            },
+          },
+          {
+            alert: 'MicrosoftIISSuccessRateForWebsocket',
+            expr: |||
+              sum without (pid) (increase(windows_iis_worker_websocket_connection_accepted_total{%(filteringSelector)s}[5m]) / clamp_min(increase(windows_iis_worker_websocket_connection_attempts_total{%(filteringSelector)s}[5m]),1)) * 100 < %(alertsCriticalLowWebsocketConnectionSuccessRate)s
+            ||| % this.config,
+            'for': '5m',
+            labels: {
+              severity: 'critical',
+            },
+            annotations: {
+              summary: 'There is a low success rate for websocket connections for an application.',
+              description:
+                ('The success rate for websocket connections is {{ printf "%%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.app }}, ' +
+                 'which is below the threshold of %(alertsCriticalLowWebsocketConnectionSuccessRate)s.') % this.config,
+            },
+          },
+          {
+            alert: 'MicrosoftIISThreadpoolUtilization',
+            expr: |||
+              sum without (pid, state)(windows_iis_worker_threads{%(filteringSelector)s} / windows_iis_worker_max_threads{%(filteringSelector)s}) * 100 > %(alertsCriticalHighThreadPoolUtilization)s
+            ||| % this.config,
+            'for': '5m',
+            labels: {
+              severity: 'critical',
+            },
+            annotations: {
+              summary: 'The thread pool utilization is nearing max capacity.',
+              description:
+                ('The threadpool utilization is at {{ printf "%%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.app }}, ' +
+                 'which is above the threshold of %(alertsCriticalHighThreadPoolUtilization)s.') % this.config,
+            },
+          },
+          {
+            alert: 'MicrosoftIISWorkerProcessFailures',
+            expr: |||
+              increase(windows_iis_total_worker_process_failures{%(filteringSelector)s}[5m]) > %(alertsWarningHighWorkerProcessFailures)s
+            ||| % this.config,
+            'for': '5m',
+            labels: {
+              severity: 'warning',
+            },
+            annotations: {
+              summary: 'There are a high number of worker process failures for an application.',
+              description:
+                ('The number of worker process failures is at {{ printf "%%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.app }}, ' +
+                 'which is above the threshold of %(alertsWarningHighWorkerProcessFailures)s.') % this.config,
+            },
+          },
+        ],
+      },
+    ],
+  },
+}
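
The `MicrosoftIIS5xxRequestErrors` rule above is the only one that appends `filteringSelector` after an existing matcher, which is why it prefixes the selector with a comma when it is non-empty. The following is a minimal standalone sketch of that pattern, not the mixin's own code: the `withComma` and `expr` helpers and the `job="integrations/iis"` value are hypothetical names chosen for illustration.

```jsonnet
// Hypothetical config values, used only for this illustration.
local config = {
  filteringSelector: 'job="integrations/iis"',
  alertsCriticalHigh5xxRequests: 5,
};

// Prefix the selector with a comma only when it is non-empty, so the
// matcher list remains valid PromQL in both cases.
local withComma(c) = c {
  filteringSelector: if c.filteringSelector != '' then ',' + c.filteringSelector else '',
};

// Render the alert expression from the template used in the rule above.
local expr(c) = |||
  sum without (pid, status_code)(increase(windows_iis_worker_request_errors_total{status_code=~"5.."%(filteringSelector)s}[5m])) > %(alertsCriticalHigh5xxRequests)s
||| % withComma(c);

{
  // Selector set: the matchers become {status_code=~"5..",job="integrations/iis"}.
  filtered: expr(config),
  // Selector empty: the matchers stay {status_code=~"5.."} with no trailing comma.
  unfiltered: expr(config { filteringSelector: '' }),
}
```

Evaluating this file with the `jsonnet` CLI should print both rendered expressions, which makes it easy to check that neither form produces a dangling comma.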

microsoft-iis-mixin/alerts/alerts.libsonnet

Lines changed: 0 additions & 91 deletions
This file was deleted.
Lines changed: 33 additions & 12 deletions
@@ -1,17 +1,38 @@
 {
-  _config+:: {
-    dashboardTags: ['microsoft-iis-mixin'],
-    dashboardPeriod: 'now-1h',
-    dashboardTimezone: 'default',
-    dashboardRefresh: '1m',
+  local this = self,
+  filteringSelector: '',  // set to apply static filters to all queries and alerts, e.g. job="bar"
+  groupLabels: ['job'],
+  instanceLabels: ['instance'],
+  logLabels: ['job', 'instance'],

-    // alerts thresholds
-    alertsWarningHighRejectedAsyncIORequests: 20,
-    alertsCriticalHigh5xxRequests: 5,
-    alertsCriticalLowWebsocketConnectionSuccessRate: 80,
-    alertsCriticalHighThreadPoolUtilization: 90,
-    alertsWarningHighWorkerProcessFailures: 10,

-    enableLokiLogs: true,
+  // Dashboard settings
+  dashboardTags: [this.uid + '-mixin'],
+  uid: 'microsoft-iis',
+  dashboardNamePrefix: 'Microsoft IIS',
+  dashboardPeriod: 'now-30m',
+  dashboardTimezone: 'default',
+  dashboardRefresh: '1m',
+
+  // Logs configuration
+  enableLokiLogs: true,
+  extraLogLabels: ['level'],  // Required by logs-lib
+  logsVolumeGroupBy: 'level',
+  showLogsVolume: true,
+
+  // Alert thresholds
+  alertsWarningHighRejectedAsyncIORequests: 20,  // count
+  alertsCriticalHigh5xxRequests: 5,  // count
+  alertsCriticalLowWebsocketConnectionSuccessRate: 80,  // %
+  alertsCriticalHighThreadPoolUtilization: 90,  // %
+  alertsWarningHighWorkerProcessFailures: 10,  // count
+
+  // Metrics source
+  metricsSource: ['prometheus'],
+
+  // Signal definitions grouped by dashboard
+  signals+: {
+    overview: (import './signals/overview.libsonnet')(this),
+    applications: (import './signals/applications.libsonnet')(this),
   },
 }
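
The new config is a plain object that refers to itself through `local this = self`, so derived fields such as `dashboardTags` follow any later override of `uid`. The sketch below illustrates that late-binding behaviour on a trimmed-down copy of the object. It is an assumption about how a consumer might override values, not the mixin's documented entry point, and the `my-iis` and `85` values are hypothetical.

```jsonnet
// Trimmed-down copy of the config shape from the diff above.
local base = {
  local this = self,
  uid: 'microsoft-iis',
  dashboardTags: [this.uid + '-mixin'],
  alertsCriticalHighThreadPoolUtilization: 90,  // %
};

// Hypothetical overrides applied by a consumer of the config.
local custom = base + {
  uid: 'my-iis',
  alertsCriticalHighThreadPoolUtilization: 85,
};

{
  // `self` is late-bound, so the tag is derived from the overridden uid.
  defaultTags: base.dashboardTags,
  customTags: custom.dashboardTags,
  customThreshold: custom.alertsCriticalHighThreadPoolUtilization,
}
```

Because `self` resolves against the final merged object, the override needs to set only `uid`; there is no need to restate `dashboardTags`.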
