How volume latency metrics are calculated
Perf metric values are calculated from two consecutive polls (therefore,
no metrics are emitted after the first poll). The calculation algorithm depends on the property and base-counter
attributes of each metric. The following properties are supported:
| property | formula | description |
|---|---|---|
| raw | x = xi | no post-processing, value x is submitted as it is |
| delta | x = xi - xi-1 | delta of two poll values, xi and xi-1 |
| rate | x = (xi - xi-1) / (ti - ti-1) | delta divided by the interval of the two polls in seconds |
| average | x = (xi - xi-1) / (yi - yi-1) | delta divided by the delta of the base counter y |
| percent | x = 100 * (xi - xi-1) / (yi - yi-1) | average multiplied by 100 |
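The formulas in the table can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Harvest implementation; the function name `post_process` and its parameter names are hypothetical.

```python
def post_process(prop, x_cur, x_prev, t_cur=None, t_prev=None, y_cur=None, y_prev=None):
    """Apply one property from the table to two consecutive poll values.

    x_* are the metric's counter values, t_* the poll timestamps in seconds,
    y_* the base-counter values. All names are illustrative assumptions.
    """
    if prop == "raw":
        return x_cur                      # x = xi
    delta = x_cur - x_prev                # xi - xi-1
    if prop == "delta":
        return delta
    if prop == "rate":
        return delta / (t_cur - t_prev)   # delta per second
    if prop in ("average", "percent"):
        value = delta / (y_cur - y_prev)  # delta over base-counter delta
        return 100 * value if prop == "percent" else value
    raise ValueError(f"unsupported property: {prop}")
```

For example, `post_process("average", 9000, 5000, y_cur=120, y_prev=100)` divides a counter delta of 4000 by a base-counter delta of 20. The sketch does not guard against a zero denominator; the latency steps described further down add a threshold for that reason.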
latency_io_reqd is a special field used only for latency metrics.
| parameter | type | description | default |
|---|---|---|---|
| latency_io_reqd | int, optional | threshold of IOPs for calculating latency metrics (latencies based on very few IOPs are unreliable) | 10 |
For latency calculations, the base-counter is mandatory for further processing. Latency falls under the average case in the table above.
Step 1:
Take the delta of the latency counter between the current poll and the previous poll.
Step 2:
Take the delta of the base-counter between the current poll and the previous poll. This differs slightly from a plain average because a threshold is also involved in the calculation.
Step 3:
The optional latency_io_reqd field controls the threshold value; its default is 10.
The minimumBase value is derived from latency_io_reqd. Only if the delta of the base-counter is greater than minimumBase is the current latency counter processed further; otherwise it is set to 0 (zero), because latencies based on very few IOPs are unreliable.
Step 4:
If the Step 3 condition is fulfilled, apply the average formula (delta of latency counter / delta of base-counter) and export the latency counters.
Step 5:
If the delta of the latency counter is less than 0, the delta of the base-counter is less than 0, or any case other than Step 4 applies, the latency counters are not exported.
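The five steps can be sketched as follows. This is a minimal illustration, not Harvest's code: the function name `latency_average` is hypothetical, `None` stands in for "not exported", and the sketch assumes minimumBase equals latency_io_reqd directly, which the text above does not spell out.

```python
from typing import Optional


def latency_average(lat_cur: float, lat_prev: float,
                    base_cur: float, base_prev: float,
                    latency_io_reqd: int = 10) -> Optional[float]:
    """Return the averaged latency, 0.0 below the threshold, or None if not exported."""
    lat_delta = lat_cur - lat_prev        # Step 1: delta of the latency counter
    base_delta = base_cur - base_prev     # Step 2: delta of the base-counter

    # Step 5: negative deltas are not exported
    if lat_delta < 0 or base_delta < 0:
        return None

    # Step 3: assumption for this sketch, minimumBase is latency_io_reqd itself
    minimum_base = latency_io_reqd
    if base_delta <= minimum_base:
        return 0.0                        # too few IOPs to trust the latency

    # Step 4: average formula
    return lat_delta / base_delta
```

With the default threshold, `latency_average(50_000, 10_000, 200, 100)` has a base-counter delta of 100 and yields 400.0, while a base-counter delta of 5 yields 0.0.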
2. Applying the threshold to any counter whose name ends with latency and that has a property of average or percent
Latency calculation belongs to the average case in the table above. However, instead of the normal delta division used for average, it applies delta division with a threshold through the DivideWithThreshold function. As mentioned in section 1, the latency_io_reqd field is used to derive minimumBase, which decides whether latency counters are processed, since latencies based on very few IOPs are unreliable.
These are the links where this logic is used in all perf collectors.
There are some special cases where the minimumBase threshold does not help when calculating latencies: some objects, such as ontaps3_svm, and a few latency metrics, such as optimal_point_latency and scan_latency, always have a base counter with only a few IOPS. For these, minimumBase is intentionally set to 0 so that processing continues and the latencies are calculated appropriately.
This is the link where this logic is used in the Harvest code flow.
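A rough sketch of the special-case handling is shown below. The helper names, the exact-name matching, and the use of latency_io_reqd directly as minimumBase are assumptions made for illustration; only the object and metric names come from the text above.

```python
# Names taken from the text; matching by exact name is an assumption.
LOW_IOPS_OBJECTS = {"ontaps3_svm"}
LOW_IOPS_METRICS = {"optimal_point_latency", "scan_latency"}


def minimum_base_for(object_name: str, metric_name: str, latency_io_reqd: int = 10) -> int:
    """Pick the threshold: 0 for the special cases, latency_io_reqd otherwise."""
    if object_name in LOW_IOPS_OBJECTS or metric_name in LOW_IOPS_METRICS:
        return 0
    return latency_io_reqd


def thresholded_average(lat_delta: float, base_delta: float, minimum_base: int) -> float:
    """Divide only when the base-counter delta exceeds the threshold, else return 0."""
    if base_delta > minimum_base:
        return lat_delta / base_delta
    return 0.0
```

With a base-counter delta of 3, the default threshold of 10 would zero the latency, whereas a threshold of 0 lets the division proceed: `thresholded_average(1200, 3, 0)` gives 400.0. A base-counter delta of 0 still returns 0.0, so the sketch never divides by zero.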
4. Comparison of Prometheus and/or Grafana, ONTAP CLI statistics, PAS, and System Manager screenshots for the same time period
These are the latencies of the workload osc_vol01-wid33577 over the same time interval, compared across Prometheus, Grafana, ONTAP CLI, PAS, and System Manager.
Read latency values range between 250 and 450 µs; write latency is 0 µs.
Prometheus:
qos_read_latency:

qos_write_latency:

qos_latency:

Grafana:
This is from the Volume dashboard

This is from the Workload dashboard, which is exactly the same as above

ONTAP CLI:
umeng-aff300-01-02::> qos statistics volume latency show -volume osc_vol01 -iterations 10 -vserver osc
Workload ID Latency Network Cluster Data Disk QoS Max QoS Min NVRAM Cloud FlexCache SM Sync VA AVSCAN
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
-total- - 562.00us 58.00us 0ms 504.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 565.00us 58.00us 0ms 507.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 422.00us 60.00us 0ms 362.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 421.00us 60.00us 0ms 361.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 526.00us 55.00us 1.00us 470.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 531.00us 55.00us 0ms 476.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 522.00us 57.00us 0ms 465.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 522.00us 57.00us 0ms 465.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 572.00us 65.00us 0ms 493.00us 0ms 0ms 0ms 0ms 0ms 14.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 508.00us 57.00us 0ms 451.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 447.00us 71.00us 89.00us 286.00us 0ms 0ms 0ms 0ms 0ms 1.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 444.00us 59.00us 0ms 385.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 399.00us 71.00us 107.00us 221.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 381.00us 59.00us 0ms 322.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 2.57ms 54.00us 79.00us 2.19ms 0ms 0ms 0ms 0ms 0ms 244.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 469.00us 58.00us 0ms 411.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 3.32ms 75.00us 1.00us 2.49ms 0ms 0ms 0ms 0ms 0ms 754.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 525.00us 54.00us 0ms 471.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 3.36ms 78.00us 0ms 2.34ms 0ms 0ms 0ms 0ms 0ms 942.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 506.00us 55.00us 0ms 451.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
PAS:
read_latency:

write_latency:

other_latency:

System Manager: