| 1 | +--- |
| 2 | +title: Distinct Values Stream in OpenObserve |
| 3 | +description: Collects unique values during ingestion, stores them in metadata streams, and supports faster distinct queries in OpenObserve.
| 4 | +--- |
| 5 | +This document explains how the distinct values feature in OpenObserve works. |
| 6 | +## Overview |
| 7 | +The distinct values feature automatically collects unique values for a stream when data is ingested. The system writes these values to disk at a defined interval. Distinct values are stored in dedicated metadata streams named `distinct_values_<type>_<stream>`, which are used to accelerate distinct queries.
| 8 | +!!! note "Who can access it" |
| 9 | + By default, the `Root` user has access. Access for other users is managed through **IAM** permissions in the **Metadata** module. |
| 10 | + |
| 11 | +  |
| 12 | +!!! note "Where to find it" |
| 13 | + Distinct values are written into automatically created metadata streams. The naming pattern is `distinct_values_<type>_<stream>`. For example: `distinct_values_logs_default` and `distinct_values_logs_k8s_events`.
| 14 | +## Environment Variables |
| 15 | +| Variable | Description | Default | |
| 16 | +| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | |
| 17 | +| `ZO_DISTINCT_VALUES_INTERVAL` | Defines how often distinct values collected during ingestion are written from memory to the `distinct_values` stream on disk. This prevents frequent small writes by batching distinct values at the configured interval. | `10s` | |
| 18 | +| `ZO_DISTINCT_VALUES_HOURLY` | Enables hourly deduplication of distinct values stored in the `distinct_values` stream. When set to `true`, repeated values within one hour are merged into a single record, and their occurrence counts are aggregated. | `false` |
| 19 | +## How it works |
| 20 | +1. During ingestion, OpenObserve automatically collects distinct values for each stream. |
| 21 | +2. These values are held in memory and written to disk in the `distinct_values_<type>_<stream>` stream, listed under **Streams > Metadata**, at the interval defined by `ZO_DISTINCT_VALUES_INTERVAL`.
| 22 | + |
| 23 | +3. If `ZO_DISTINCT_VALUES_HOURLY` is enabled, values in the `distinct_values` stream are further deduplicated at the hourly level, with counts aggregated. |
| 24 | +- The `distinct_values` streams help accelerate `DISTINCT` queries by using pre-computed distinct values instead of scanning all ingested logs. |
| 25 | +## Example |
| 26 | +Ingested data: |
| 27 | +```text
| 28 | +2025/09/10T10:00:01Z, job=test, level=info, service=test, request_id=123 |
| 29 | +2025/09/10T10:00:02Z, job=test, level=info, service=test, request_id=124 |
| 30 | +2025/09/10T10:01:03Z, job=test, level=info, service=test, request_id=123 |
| 31 | +2025/09/10T10:10:00Z, job=test, level=info, service=test, request_id=123 |
| 32 | +2025/09/10T11:10:00Z, job=test, level=info, service=test, request_id=123 |
| 33 | +``` |
| 34 | +With `ZO_DISTINCT_VALUES_INTERVAL=10s`, the system first collects values in memory and then writes to disk: |
| 35 | +```yaml |
| 36 | +2025/09/10T10:00:01Z request_id: 123, count: 2 |
| 37 | +2025/09/10T10:00:02Z request_id: 124, count: 1 |
| 38 | +2025/09/10T10:10:02Z request_id: 123, count: 1 |
| 39 | +2025/09/10T11:10:02Z request_id: 123, count: 1 |
| 40 | +``` |
| 41 | +If `ZO_DISTINCT_VALUES_HOURLY=true`, the system merges values by hour: |
| 42 | +```yaml |
| 43 | +2025/09/10T10:00:01Z request_id: 123, count: 3 |
| 44 | +2025/09/10T10:00:02Z request_id: 124, count: 1 |
| 45 | +2025/09/10T11:10:02Z request_id: 123, count: 1 |
| 46 | +``` |
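The hourly merge shown in the last example can be sketched in a few lines of Python. This is an illustration only, not OpenObserve's implementation: `merge_hourly`, the tuple layout, and the timestamp format string are hypothetical. It assumes that a merged record keeps the earliest timestamp seen within its hour, which is what the example output suggests.

```python
from datetime import datetime

# Flushed records copied from the example above: (timestamp, request_id, count).
flushed = [
    ("2025/09/10T10:00:01Z", "123", 2),
    ("2025/09/10T10:00:02Z", "124", 1),
    ("2025/09/10T10:10:02Z", "123", 1),
    ("2025/09/10T11:10:02Z", "123", 1),
]

# Assumed format of the timestamps used in the example.
TS_FORMAT = "%Y/%m/%dT%H:%M:%SZ"


def merge_hourly(records):
    """Merge records that share the same value within the same hour.

    Counts are summed; the earliest timestamp in the hour is kept
    (an assumption inferred from the example, not a documented rule).
    """
    merged = {}
    for ts, value, count in records:
        hour = datetime.strptime(ts, TS_FORMAT).replace(minute=0, second=0)
        key = (hour, value)
        if key not in merged:
            merged[key] = [ts, 0]
        # Timestamps are fixed-width and zero-padded, so comparing the
        # strings is equivalent to comparing the instants.
        merged[key][0] = min(merged[key][0], ts)
        merged[key][1] += count
    return sorted((ts, value, count) for (_, value), (ts, count) in merged.items())


for ts, value, count in merge_hourly(flushed):
    print(f"{ts} request_id: {value}, count: {count}")
```

Run against the four flushed records, the sketch prints the three merged records listed under `ZO_DISTINCT_VALUES_HOURLY=true`: the two `request_id: 123` records from the 10:00 hour collapse into one with `count: 3`, while the 11:00 record stays separate.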