Commit d98635b

ktx-krupa authored and ktx-abhishek committed
Add distinct values documentation and related images (#258)
1 parent c14d811 commit d98635b

File tree

4 files changed

+47
-1
lines changed


docs/user-guide/actions/actions-in-openobserve.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ description: >-
66
This guide explains what Actions are, their types, and use cases.
77

88
!!! info "Availability"
9-
This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
9+
This feature is available in Enterprise Edition. It is not available in Open Source or Cloud.
1010

1111
## What are Actions
1212
Actions in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations.
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
title: Distinct Values Stream in OpenObserve
3+
description: Collects unique values during ingestion, stores them in metadata streams, and supports faster distinct queries in OpenObserve.
4+
---
5+
This document explains how the distinct values feature in OpenObserve works.
6+
## Overview
7+
The distinct values feature automatically collects the unique values seen in a stream as data is ingested. The system holds these values in memory and writes them to disk at a configured interval. They are stored in dedicated metadata streams whose names begin with `distinct_values`, which are used to accelerate distinct queries.
8+
!!! note "Who can access it"
9+
By default, the `Root` user has access. Access for other users is managed through **IAM** permissions in the **Metadata** module.
10+
11+
![access to distinct values stream](../../images/distinct-values-access.png)
12+
!!! note "Where to find it"
13+
Distinct values are written to automatically created metadata streams. The naming pattern is `distinct_values_<type>_<stream>`. For example: `distinct_values_logs_default` and `distinct_values_logs_k8s_events`.
14+
## Environment Variables
15+
| Variable | Description | Default |
16+
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
17+
| `ZO_DISTINCT_VALUES_INTERVAL` | Defines how often distinct values collected during ingestion are written from memory to the `distinct_values` stream on disk. This prevents frequent small writes by batching distinct values at the configured interval. | `10s` |
18+
| `ZO_DISTINCT_VALUES_HOURLY` | Enables hourly deduplication of distinct values stored in the `distinct_values` stream. When set to true, repeated values within one hour are merged into a single record, and a count of occurrences is logged. | `false` |
19+
## How it works
20+
1. During ingestion, OpenObserve automatically collects distinct values for each stream.
21+
2. These values are held in memory and written to disk in the `distinct_values_<type>_<stream>` stream, listed under **Streams > Metadata**, at the interval defined by `ZO_DISTINCT_VALUES_INTERVAL`.
22+
![metadata distinct values](../../images/metadata-distinct-values.png)
23+
3. If `ZO_DISTINCT_VALUES_HOURLY` is enabled, values in the `distinct_values` stream are further deduplicated at the hourly level, with counts aggregated.
24+
- The `distinct_values` streams help accelerate `DISTINCT` queries by using pre-computed distinct values instead of scanning all ingested logs.
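
The collect-then-flush flow described above can be pictured with a small Python sketch. This is an illustration only, not OpenObserve's implementation: the `DistinctValueBuffer` class, its method names, and the record layout are hypothetical, and the timer that would call `flush()` every `ZO_DISTINCT_VALUES_INTERVAL` is left out.

```python
from collections import Counter


class DistinctValueBuffer:
    """Hypothetical in-memory buffer; names and layout are assumptions."""

    def __init__(self, stream_type: str, stream: str) -> None:
        # Target stream name follows the documented naming pattern.
        self.target_stream = f"distinct_values_{stream_type}_{stream}"
        self._counts: Counter = Counter()

    def observe(self, record: dict) -> None:
        # Count each (field, value) pair seen during ingestion.
        for field, value in record.items():
            self._counts[(field, str(value))] += 1

    def flush(self) -> list:
        # Emit one row per distinct pair, then reset the buffer.
        # A real system would call this once per configured interval.
        batch = [
            {"stream": self.target_stream, "field": f, "value": v, "count": c}
            for (f, v), c in sorted(self._counts.items())
        ]
        self._counts.clear()
        return batch


buffer = DistinctValueBuffer("logs", "default")
buffer.observe({"level": "info", "request_id": 123})
buffer.observe({"level": "info", "request_id": 124})
first_batch = buffer.flush()
```

Batching this way means many ingested records collapse into a handful of rows per flush, which is the reason the documentation gives for avoiding frequent small writes.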
25+
## Example
26+
Ingested data:
27+
```text
28+
2025/09/10T10:00:01Z, job=test, level=info, service=test, request_id=123
29+
2025/09/10T10:00:02Z, job=test, level=info, service=test, request_id=124
30+
2025/09/10T10:01:03Z, job=test, level=info, service=test, request_id=123
31+
2025/09/10T10:10:00Z, job=test, level=info, service=test, request_id=123
32+
2025/09/10T11:10:00Z, job=test, level=info, service=test, request_id=123
33+
```
34+
With `ZO_DISTINCT_VALUES_INTERVAL=10s`, the system first collects values in memory and then writes to disk:
35+
```yaml
36+
2025/09/10T10:00:01Z request_id: 123, count: 2
37+
2025/09/10T10:00:02Z request_id: 124, count: 1
38+
2025/09/10T10:10:02Z request_id: 123, count: 1
39+
2025/09/10T11:10:02Z request_id: 123, count: 1
40+
```
41+
If `ZO_DISTINCT_VALUES_HOURLY=true`, the system merges values by hour:
42+
```yaml
43+
2025/09/10T10:00:01Z request_id: 123, count: 3
44+
2025/09/10T10:00:02Z request_id: 124, count: 1
45+
2025/09/10T11:10:02Z request_id: 123, count: 1
46+
```
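
The hourly merge in this example can be reproduced with a short Python sketch. It assumes that records are grouped by hour and value, that counts are summed, and that the earliest timestamp in each group is kept; the last point is inferred from the example output rather than stated by the documentation. The `merge_hourly` helper is hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y/%m/%dT%H:%M:%SZ"  # timestamp layout used in the example


def merge_hourly(records):
    """Merge (timestamp, value, count) rows that share an hour and a value."""
    merged = {}
    for ts_text, value, count in records:
        ts = datetime.strptime(ts_text, TS_FORMAT)
        key = (ts.replace(minute=0, second=0), value)
        if key in merged:
            first_ts, total = merged[key]
            # Assumption: keep the earliest timestamp and sum the counts.
            merged[key] = (min(first_ts, ts), total + count)
        else:
            merged[key] = (ts, count)
    return sorted(
        (ts.strftime(TS_FORMAT), value, total)
        for (_, value), (ts, total) in merged.items()
    )


# Rows as written to disk in the interval-based example.
flushed = [
    ("2025/09/10T10:00:01Z", 123, 2),
    ("2025/09/10T10:00:02Z", 124, 1),
    ("2025/09/10T10:10:02Z", 123, 1),
    ("2025/09/10T11:10:02Z", 123, 1),
]
hourly = merge_hourly(flushed)
```

Under these assumptions, `hourly` contains the same three rows as the merged output shown for `ZO_DISTINCT_VALUES_HOURLY=true`.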
