|
2 | 2 |
|
3 | 3 | [](https://github.com/kubernetes-monitoring/kubernetes-mixin/actions/workflows/ci.yaml) |
4 | 4 |
|
5 | | -> NOTE: This project is *pre-release* stage. Flags, configuration, behaviour and design may change significantly in following releases. |
6 | | -
|
7 | 5 | A set of Grafana dashboards and Prometheus alerts for Kubernetes. |
8 | 6 |
|
| 7 | +## Local development |
| 8 | + |
| 9 | +Run the following command to setup a local [kind](https://kind.sigs.k8s.io) cluster: |
| 10 | + |
| 11 | +```shell |
| 12 | +make dev |
| 13 | +``` |
| 14 | + |
| 15 | +You should see the following output if successful: |
| 16 | + |
| 17 | +```shell |
| 18 | +╔═══════════════════════════════════════════════════════════════╗ |
| 19 | +║ 🚀 Development Environment Ready! 🚀 ║ |
| 20 | +║ ║ |
| 21 | +║ Run `make dev-port-forward` ║ |
| 22 | +║ Grafana will be available at http://localhost:3000 ║ |
| 23 | +║ ║ |
| 24 | +║ Data will be available in a few minutes. ║ |
| 25 | +║ ║ |
| 26 | +║ Dashboards will refresh every 10s, run `make generate` ║ |
| 27 | +║ and refresh your browser to see the changes. ║ |
| 28 | +║ ║ |
| 29 | +║ Alert and recording rules require `make dev-reload`. ║ |
| 30 | +║ ║ |
| 31 | +╚═══════════════════════════════════════════════════════════════╝ |
| 32 | +``` |
| 33 | + |
| 34 | +To delete the cluster, run the following: |
| 35 | + |
| 36 | +```shell |
| 37 | +make dev-down |
| 38 | +``` |
| 39 | + |
9 | 40 | ## Releases |
10 | 41 |
|
11 | 42 | > Note: Releases up until `release-0.12` are changes in their own branches. Changelogs are included in releases starting from [version-0.13.0](https://github.com/kubernetes-monitoring/kubernetes-mixin/releases/tag/version-0.13.0). |
@@ -33,7 +64,7 @@ Some alerts now use Prometheus filters made available in Prometheus 2.11.0, whic |
33 | 64 |
|
34 | 65 | Warning: This compatibility matrix was initially created based on experience, we do not guarantee the compatibility, it may be updated based on new learnings. |
35 | 66 |
|
36 | | -Warning: By default the expressions will generate *grafana 7.2+* compatible rules using the *$__rate_interval* variable for rate functions. If you need backward compatible rules please set *grafana72: false* in your *_config* |
| 67 | +Warning: By default the expressions will generate *grafana 7.2+* compatible rules using the *$\_\_rate_interval* variable for rate functions. If you need backward compatible rules please set *grafana72: false* in your *\_config* |
37 | 68 |
|
38 | 69 | ### Release steps |
39 | 70 |
|
@@ -75,6 +106,7 @@ node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate5m |
75 | 106 | This mixin is designed to be vendored into the repo with your infrastructure config. To do this, use [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler): |
76 | 107 |
|
77 | 108 | You then have three options for deploying your dashboards |
| 109 | + |
78 | 110 | 1. Generate the config files and deploy them yourself |
79 | 111 | 2. Use ksonnet to deploy this mixin along with Prometheus and Grafana |
80 | 112 | 3. Use prometheus-operator to deploy this mixin (TODO) |
@@ -109,11 +141,12 @@ The `prometheus_alerts.yaml` and `prometheus_rules.yaml` file then need to passe |
109 | 141 | ### Dashboards for Windows Nodes |
110 | 142 |
|
111 | 143 | There exist separate dashboards for windows resources. |
112 | | -1) Compute Resources / Cluster(Windows) |
113 | | -2) Compute Resources / Namespace(Windows) |
114 | | -3) Compute Resources / Pod(Windows) |
115 | | -4) USE Method / Cluster(Windows) |
116 | | -5) USE Method / Node(Windows) |
| 144 | + |
| 145 | +1. Compute Resources / Cluster(Windows) |
| 146 | +2. Compute Resources / Namespace(Windows) |
| 147 | +3. Compute Resources / Pod(Windows) |
| 148 | +4. USE Method / Cluster(Windows) |
| 149 | +5. USE Method / Node(Windows) |
117 | 150 |
|
118 | 151 | These dashboards are based on metrics populated by [windows-exporter](https://github.com/prometheus-community/windows_exporter) from each Windows node. |
119 | 152 |
|
@@ -270,14 +303,14 @@ Same result can be achieved by modyfying the existing `config.libsonnet` with th |
270 | 303 |
|
271 | 304 | While the community has not yet fully agreed on alert severities and how they should be used, this repository assumes the following paradigms when setting the severities: |
272 | 305 |
|
273 | | -* Critical: An issue, that needs to page a person to take instant action |
274 | | -* Warning: An issue, that needs to be worked on but in the regular work queue or for during office hours rather than paging the oncall |
275 | | -* Info: Is meant to support a trouble shooting process by informing about a non-normal situation for one or more systems but not worth a page or ticket on its own. |
| 306 | +- Critical: An issue, that needs to page a person to take instant action |
| 307 | +- Warning: An issue, that needs to be worked on but in the regular work queue or for during office hours rather than paging the oncall |
| 308 | +- Info: Is meant to support a trouble shooting process by informing about a non-normal situation for one or more systems but not worth a page or ticket on its own. |
276 | 309 |
|
277 | 310 | ### Architecture and Technical Decisions |
278 | 311 |
|
279 | | -* For more motivation, see "[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin. |
280 | | -* For more information about monitoring mixins, see this [design doc](DESIGN.md). |
| 312 | +- For more motivation, see "[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin. |
| 313 | +- For more information about monitoring mixins, see this [design doc](DESIGN.md). |
281 | 314 |
|
282 | 315 | ## Note |
283 | 316 |
|
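The targets named in the new "Local development" section can be combined into a small edit loop. The sketch below is not part of the change: `dev_refresh` is a hypothetical helper, and it assumes that running `make generate` before `make dev-reload` is harmless and that both targets behave as the banner above describes. The `MAKE` override exists only so the steps can be previewed without a cluster.

```shell
# Hypothetical helper (not in the repository): re-run the two refresh
# targets mentioned in the "Development Environment Ready" banner.
dev_refresh() {
  # Regenerate dashboards; Grafana picks them up within about 10s.
  "${MAKE:-make}" generate
  # Alert and recording rules need an explicit reload.
  "${MAKE:-make}" dev-reload
}
```

With the cluster from `make dev` running and `make dev-port-forward` active, calling `dev_refresh` after each edit should update both dashboards and rules. `MAKE=echo dev_refresh` prints the target names instead of running them.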