@xiangmy21
Contributor

feat: show timeseries [order by timeseries] clause

Support ORDER BY TIMESERIES in SHOW TIMESERIES and Optimize OFFSET with Subtree Measurement Statistics


Brief

This PR introduces ordering support for SHOW TIMESERIES by measurement name (lexicographical) and optimizes the performance of the OFFSET clause using subtree measurement statistics.

With this change, SHOW TIMESERIES accepts an ORDER BY TIMESERIES clause.
When the query scope is within a single database, OFFSET X no longer requires traversing the first X measurements one by one. Instead, it skips subtrees in the metadata tree based on subtree measurement counts.

Example:

IoTDB> show timeseries root.dbtest.** order by timeseries desc limit 5 offset 5001
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
|                      Timeseries|Alias|   Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType|
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
|root.dbtest.plant9.device84.m188| null|root.dbtest|   INT32|     RLE|     SNAPPY|null|      null|    null|              null|    BASE|
|root.dbtest.plant9.device84.m187| null|root.dbtest|   INT32|     RLE|     SNAPPY|null|      null|    null|              null|    BASE|
|root.dbtest.plant9.device84.m186| null|root.dbtest|   INT32|     RLE|     SNAPPY|null|      null|    null|              null|    BASE|
|root.dbtest.plant9.device84.m185| null|root.dbtest|   INT32|     RLE|     SNAPPY|null|      null|    null|              null|    BASE|
|root.dbtest.plant9.device84.m184| null|root.dbtest|   INT32|     RLE|     SNAPPY|null|      null|    null|              null|    BASE|
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
  • DESC: descending order
  • ASC: ascending order (default if not specified)

Design & Implementation Details

1. Syntax Extension

  • Updated the grammar of SHOW TIMESERIES to support orderByTimeseriesClause.
  • Added ordering-related fields to ShowTimeSeriesStatement to record sorting requirements.

2. Logical Plan Construction

  • Introduced a SortNode at the top of the logical plan to enable global ordering across partitions.
  • For single-partition queries, LIMIT and OFFSET are pushed down to the metadata tree traversal stage in LogicalPlanVisitor.

3. Limit/Offset Handling in Multi-Partition Queries

  • For multi-partition queries, OFFSET cannot be pushed down.
    Instead, limit' = limit + offset is used.
  • Fixed a bug in the previous implementation:
    when limit = 0 (which is treated as unlimited in IoTDB), limit' was incorrectly pushed down, causing unexpected truncation of results.
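
The multi-partition rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the class `MultiPartitionPaging` and its methods are hypothetical, partitions are modeled as pre-sorted lists, `Collections.sort` stands in for the global SortNode, and overflow of limit + offset is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of limit/offset handling across partitions.
final class MultiPartitionPaging {

  private MultiPartitionPaging() {}

  /** limit' pushed to each partition; limit <= 0 means unlimited and is passed through. */
  static long pushedDownLimit(long limit, long offset) {
    return limit <= 0 ? limit : limit + offset;
  }

  /** Applies limit' per partition, sorts globally, then applies offset and limit once. */
  static List<String> query(List<List<String>> sortedPartitions, long limit, long offset) {
    long perPartition = pushedDownLimit(limit, offset);
    List<String> merged = new ArrayList<>();
    for (List<String> partition : sortedPartitions) {
      int take =
          perPartition <= 0
              ? partition.size()
              : (int) Math.min(perPartition, partition.size());
      merged.addAll(partition.subList(0, take));
    }
    Collections.sort(merged); // stands in for the global sort
    int from = (int) Math.min(offset, merged.size());
    int to =
        limit <= 0
            ? merged.size()
            : from + (int) Math.min(limit, merged.size() - from);
    return new ArrayList<>(merged.subList(from, to));
  }
}
```

Each partition has to return up to limit + offset rows because, in the worst case, every skipped row comes from the same partition. Passing limit = 0 through unchanged is what keeps an unlimited query from being truncated to offset rows.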

4. Ordered Traversal in Metadata Tree

  • In MTreeBelowSGMemoryImpl, the iteration strategy of child nodes in schemaReader is overridden under ordering mode.
  • Child nodes are traversed in lexicographical order by name.
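
A minimal sketch of ordered child iteration, assuming the children of a node are held in a map keyed by name. `OrderedChildren` and `childNamesInOrder` are hypothetical names used only for illustration; the PR itself overrides the child iterator inside MTreeBelowSGMemoryImpl.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: returns child names in the order they would be visited.
final class OrderedChildren {

  private OrderedChildren() {}

  static List<String> childNamesInOrder(Map<String, ?> children, boolean ascending) {
    List<String> names = new ArrayList<>(children.keySet());
    // String's natural order is lexicographic, so "m10" sorts before "m2".
    Comparator<String> order =
        ascending ? Comparator.<String>naturalOrder() : Comparator.<String>reverseOrder();
    names.sort(order);
    return names;
  }
}
```

Because the comparison is lexicographic rather than numeric, names such as m1, m10, m2 come out in exactly that sequence in ascending mode.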

5. Subtree Measurement Count

  • Implemented subtree measurement statistics (only for in-memory metadata tree):

    • Added subtreeMeasurementCount to IMemMNode and related classes.
    • Maintained subtreeMeasurementCount along ancestor paths during measurement insertion and deletion.
    • This value is not persisted; it is initialized via a DFS during metadata tree loading.
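  • During schemaReader construction, acceptFullMatchedNode is overridden so that a subtree is skipped without being traversed when its subtreeMeasurementCount does not exceed the remaining offset (subtreeMeasurementCount <= remaining offset); the skipped count is subtracted from the remaining offset.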
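
The counting and pruning idea above can be sketched on a toy tree. Everything here is an assumption made for illustration: `CountedNode` is a hypothetical class, not IMemMNode, children are kept in a TreeMap to get ascending order, and limit is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy metadata-tree node; hypothetical, not the PR's classes.
final class CountedNode {
  final String name;
  final CountedNode parent;
  final boolean measurement;
  // TreeMap keeps children sorted by name, giving ascending traversal.
  final TreeMap<String, CountedNode> children = new TreeMap<>();
  // Measurements in this subtree (1 for a measurement node itself).
  long subtreeMeasurementCount;

  CountedNode(String name, CountedNode parent, boolean measurement) {
    this.name = name;
    this.parent = parent;
    this.measurement = measurement;
  }

  /** Creates a measurement at a non-empty relative path and bumps every ancestor's count. */
  void addMeasurement(String... path) {
    CountedNode cur = this;
    for (int i = 0; i < path.length; i++) {
      boolean last = i == path.length - 1;
      CountedNode next = cur.children.get(path[i]);
      if (next == null) {
        next = new CountedNode(path[i], cur, last);
        cur.children.put(path[i], next);
      } else if (last) {
        return; // already present: counts stay unchanged
      }
      cur = next;
    }
    for (CountedNode n = cur; n != null; n = n.parent) {
      n.subtreeMeasurementCount++;
    }
  }

  /** Returns full paths in ascending order, skipping offset results, keeping at most limit. */
  static List<String> collect(CountedNode root, long offset, long limit) {
    List<String> out = new ArrayList<>();
    visit(root, "", new long[] {offset}, limit, out);
    return out;
  }

  private static void visit(
      CountedNode node, String prefix, long[] remainingOffset, long limit, List<String> out) {
    if (out.size() >= limit) {
      return;
    }
    // Prune: the whole subtree falls inside the part that OFFSET skips.
    if (remainingOffset[0] > 0 && node.subtreeMeasurementCount <= remainingOffset[0]) {
      remainingOffset[0] -= node.subtreeMeasurementCount;
      return;
    }
    String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
    if (node.measurement) {
      out.add(path);
      return;
    }
    for (CountedNode child : node.children.values()) {
      visit(child, path, remainingOffset, limit, out);
    }
  }
}
```

With 3 plants, 2 devices per plant and 3 measurements per device, `collect(root, 7, 3)` skips the first plant in one step (6 measurements), then one measurement under the second plant, and only then starts emitting results.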


Tests

Functional Tests

  • Added IoTDBShowTimeseriesOrderByTimeseriesIT to verify correctness of ordering and offset behavior.

Performance Evaluation

  • Evaluated performance on a single-partition dataset with a typical three-level hierarchy:
    root.plantA.deviceB.measurementC
    
    with approximately 900,000 measurements.
  • Query pattern:
    SHOW TIMESERIES root.dbtest.** order by timeseries LIMIT 100 OFFSET X;
    where X ranges from 0 to 900,000 with a step of 5,000.
  • Compared the original implementation and the optimized version.
  • Results are shown in the figure below:
    [Figure: benchmark results comparing the original and optimized implementations; image not included]
  • Detailed benchmark methodology and results are available at: [link].

Impact

  • Enables deterministic ordering for SHOW TIMESERIES.
  • Significantly reduces traversal cost for large OFFSET values in single-partition scenarios, from $O(limit+offset)$ to $O(limit+TreeHeight)$.
  • Improves correctness of limit/offset handling in multi-partition queries.
  • Introduces minimal overhead to metadata maintenance (in-memory only).

Contributor

Copilot AI left a comment

Pull request overview

This PR adds ORDER BY TIMESERIES [ASC|DESC] support to SHOW TIMESERIES and introduces an in-memory metadata-tree optimization to speed up large OFFSET values by pruning whole subtrees using cached subtree measurement counts.

Changes:

  • Extends SQL grammar + statement model to represent ORDER BY TIMESERIES (ASC/DESC).
  • Updates planning/execution to support ordering (global sort in multi-region; ordered traversal pushdown in single region) and optimizes OFFSET using subtree measurement statistics.
  • Adds/updates schema-region read plan plumbing and introduces an integration test for ordering behavior.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| iotdb-core/antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlParser.g4 | Adds orderByTimeseriesClause to SHOW TIMESERIES grammar. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/parser/ASTVisitor.java | Parses ORDER BY TIMESERIES and enforces semantic constraints. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/statement/metadata/ShowTimeSeriesStatement.java | Stores order-by-timeseries flags and ordering direction. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/statement/metadata/ShowStatement.java | Adds getLimitWithOffset() helper for multi-region limit/offset handling. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/LogicalPlanVisitor.java | Wires ordering and limit/offset handling into logical plan; adds global SortNode when needed. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/LogicalPlanBuilder.java | Extends schema source planning to pass ordering flags; adjusts limit handling. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/metadata/read/TimeSeriesSchemaScanNode.java | Persists order-by-timeseries flags through plan-node serialization. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/OperatorTreeGenerator.java | Passes ordering flags down into schema source creation. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/distribution/SimpleFragmentParallelPlanner.java | Ensures type provider generation for order-by-timeseries scenarios. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/schema/source/SchemaSourceFactory.java | Threads ordering flags into TimeSeriesSchemaSource creation. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/schema/source/TimeSeriesSchemaSource.java | Threads ordering flags into schema-region read plans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/schema/source/LogicalViewSchemaSource.java | Updates show-timeseries plan invocation signature. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/read/req/IShowTimeSeriesPlan.java | Adds region-level order-by-timeseries flags. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/read/req/SchemaRegionReadPlanFactory.java | Extends plan factory API for ordering flags. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/read/req/impl/ShowTimeSeriesPlanImpl.java | Implements the new ordering-flag accessors and equality updates. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/mnode/IMemMNode.java | Adds subtree measurement count getters/setters to mem-nodes. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/mnode/basic/BasicMNode.java | Adds cached subtreeMeasurementCount field and size estimation adjustment. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/mnode/impl/{AboveDatabaseMNode,DatabaseMNode,MeasurementMNode}.java | Delegates subtree measurement count storage to BasicMNode. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/MTreeBelowSGMemoryImpl.java | Maintains subtree counts, rebuilds on snapshot load, prunes traversal for OFFSET, and sorts children for ordered traversal. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/execution/operator/schema/SchemaQueryScanOperatorTest.java | Updates test wiring for new plan parameters. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/metadata/schemaRegion/SchemaRegionTestUtil.java | Updates test plan factory calls with new params. |
| integration-test/src/test/java/org/apache/iotdb/db/it/schema/IoTDBShowTimeseriesOrderByTimeseriesIT.java | Adds IT coverage for ordering + offset/limit and conflict semantics. |


Comment on lines 1482 to 1507
private long remainingOffset = Math.max(0, showTimeSeriesPlan.getOffset());

private boolean shouldPruneSubtree(final IMemMNode node) {
  if (remainingOffset <= 0) {
    return false;
  }
  final long subtreeCount = node.getSubtreeMeasurementCount();
  if (subtreeCount <= remainingOffset) {
    remainingOffset -= subtreeCount;
    return true;
  }
  return false;
}

@Override
protected boolean acceptFullMatchedNode(final IMemMNode node) {
  if (!node.isMeasurement()) {
    return false;
  }
  if (remainingOffset > 0) {
    // skip this measurement
    remainingOffset--;
    return false;
  }
  return true;
}

Copilot AI Jan 26, 2026

In single-schema-region queries, OFFSET is now consumed inside MeasurementCollector (remainingOffset/shouldPruneSubtree/acceptFullMatchedNode) before TimeseriesReaderWithViewFetch applies schemaFilter. This changes semantics when a timeseries WHERE clause is present: OFFSET should apply to the filtered result set, not to the pre-filter traversal order. Consider disabling the offset-pruning pushdown when showTimeSeriesPlan.getSchemaFilter() is non-null (keep SchemaReaderLimitOffsetWrapper offset), or push schemaFilter evaluation into the collector so remainingOffset is decremented/pruned only for matched results.
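
The semantic difference described in this comment can be reproduced on plain lists. The sketch below is illustrative only: `OffsetFilterOrder` and the sample paths are hypothetical, and a simple "contains" predicate stands in for schemaFilter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo: OFFSET applied after the filter vs. before it.
final class OffsetFilterOrder {

  private OffsetFilterOrder() {}

  /** Expected semantics: OFFSET counts only rows that pass the filter. */
  static List<String> filterThenOffset(
      List<String> sorted, Predicate<String> filter, long offset, long limit) {
    return sorted.stream().filter(filter).skip(offset).limit(limit).collect(Collectors.toList());
  }

  /** Pushed-down offset: rows are skipped before the filter has seen them. */
  static List<String> offsetThenFilter(
      List<String> sorted, Predicate<String> filter, long offset, long limit) {
    return sorted.stream().skip(offset).filter(filter).limit(limit).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> sorted =
        Arrays.asList(
            "root.db.d1.a",
            "root.db.d1.temp1",
            "root.db.d2.b",
            "root.db.d2.temp2",
            "root.db.d3.temp3");
    Predicate<String> containsTemp = s -> s.contains("temp");
    // prints [root.db.d2.temp2, root.db.d3.temp3]
    System.out.println(filterThenOffset(sorted, containsTemp, 1, 2));
    // prints [root.db.d1.temp1, root.db.d2.temp2]
    System.out.println(offsetThenFilter(sorted, containsTemp, 1, 2));
  }
}
```

The two results differ because the pushed-down variant spends its offset on root.db.d1.a, a row the filter would have discarded anyway.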

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +209
final Template template = ClusterTemplateManager.getInstance().getTemplate(templateId);
return template == null ? 0L : template.getMeasurementNumber();

Copilot AI Jan 26, 2026

subtreeMeasurementCount is adjusted on template activation/deactivation using the current template measurement count (getTemplateMeasurementCount). Since templates can be extended at runtime (see ClusterSchemaManager.extendSchemaTemplate), the measurement count can change after activation; later deactivation will subtract a different value than was added, and the cached subtree counts can become stale or even negative. This can break OFFSET pruning correctness. Consider recording the activated template measurement count per device (so add/subtract are consistent), and/or updating subtreeMeasurementCount for all affected paths when a template is extended, or skipping subtree-count pruning for nodes affected by templates.

Suggested change
- final Template template = ClusterTemplateManager.getInstance().getTemplate(templateId);
- return template == null ? 0L : template.getMeasurementNumber();
+ // NOTE:
+ // Templates can be extended at runtime, which changes their measurement count.
+ // Using the current measurement count here for both activation and deactivation
+ // can cause inconsistent deltas on subtreeMeasurementCount (e.g., subtracting
+ // more/less than was previously added), leading to stale or even negative
+ // subtree counts and incorrect OFFSET pruning.
+ //
+ // To avoid this correctness issue without per-device activation-time tracking,
+ // we deliberately do *not* adjust subtreeMeasurementCount based on template
+ // measurements and return 0 here. This effectively disables subtree-count
+ // pruning for template-driven paths but preserves correctness.
+ return 0L;
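
As an alternative to returning 0, the "record the activated count per device" option mentioned in the comment could look roughly like the sketch below. `TemplateCountTracker` is a hypothetical class written for illustration; it does not use the real Template or ClusterTemplateManager APIs, and template sizes are passed in as plain numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember what was added at activation so that
// deactivation subtracts exactly the same amount.
final class TemplateCountTracker {
  private long subtreeMeasurementCount;
  private final Map<String, Long> activatedCountByDevice = new HashMap<>();

  /** Naive bookkeeping: add the size at activation, subtract the size at deactivation. */
  static long naiveCountAfterRoundTrip(long sizeAtActivation, long sizeAtDeactivation) {
    return sizeAtActivation - sizeAtDeactivation;
  }

  void activate(String device, long templateSizeNow) {
    // Only count the first activation of a device.
    if (activatedCountByDevice.putIfAbsent(device, templateSizeNow) == null) {
      subtreeMeasurementCount += templateSizeNow;
    }
  }

  void deactivate(String device) {
    Long added = activatedCountByDevice.remove(device);
    if (added != null) {
      subtreeMeasurementCount -= added; // symmetric with activate()
    }
  }

  long count() {
    return subtreeMeasurementCount;
  }
}
```

If a template grows from 3 to 5 measurements between activation and deactivation, the naive round trip ends at -2, while the tracker returns to 0. Note that this only keeps the deltas symmetric: while the device is activated, the cached count would still lag behind an extended template unless the extension path updates it as well.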

Comment on lines +1160 to +1161
while (iterator.hasNext()) {
  count += rebuildSubtreeMeasurementCountFromNode(iterator.next());

Copilot AI Jan 26, 2026

rebuildSubtreeMeasurementCountFromNode() obtains an IMNodeIterator from store.getChildrenIterator(node) but never calls iterator.close(). Today the in-memory iterator close is mostly a no-op, but other IMNodeIterator implementations can hold resources; closing in a finally block would make this method safer and consistent with other iterator usages.

Suggested change
- while (iterator.hasNext()) {
-   count += rebuildSubtreeMeasurementCountFromNode(iterator.next());
+ try {
+   while (iterator.hasNext()) {
+     count += rebuildSubtreeMeasurementCountFromNode(iterator.next());
+   }
+ } finally {
+   iterator.close();

private IMemMNode parent;
private final BasicMNodeInfo basicMNodeInfo;

/** Cached count of measurements in this node's subtree, restored on restart. */

Copilot AI Jan 26, 2026

The new subtreeMeasurementCount field is described as "restored on restart", but this value is not persisted and is rebuilt in memory (e.g., via rebuildSubtreeMeasurementCount()). Consider rewording to avoid implying persistence (e.g., "rebuilt on restart" / "recomputed on load").

Suggested change
- /** Cached count of measurements in this node's subtree, restored on restart. */
+ /** Cached count of measurements in this node's subtree, rebuilt on restart. */

public long getLimitWithOffset() {
  if (limit <= 0) {
    return limit;
  }

Copilot AI Jan 26, 2026

getLimitWithOffset() returns limit + offset without overflow handling. With large LIMIT/OFFSET literals this can overflow to a negative value and lead to incorrect planning. Consider using Math.addExact with a friendly error, or clamping to Long.MAX_VALUE when the sum would overflow.

Suggested change
- }
+ }
+ // Guard against overflow when adding limit and offset.
+ if (offset > 0 && limit > Long.MAX_VALUE - offset) {
+   // Clamp to Long.MAX_VALUE to avoid negative or incorrect values due to overflow.
+   return Long.MAX_VALUE;
+ }
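
For comparison, here are the two options named in the comment, clamping and Math.addExact, as a standalone sketch. `SafeLimit` and its method names are hypothetical, and limit and offset are taken as parameters (with offset assumed non-negative) instead of being read from the statement's fields.

```java
// Hypothetical standalone versions of the two overflow-handling options.
final class SafeLimit {

  private SafeLimit() {}

  /** Clamps to Long.MAX_VALUE instead of wrapping around to a negative value. */
  static long limitWithOffsetClamped(long limit, long offset) {
    if (limit <= 0) {
      return limit; // unlimited stays unlimited
    }
    if (offset > 0 && limit > Long.MAX_VALUE - offset) {
      return Long.MAX_VALUE;
    }
    return limit + offset;
  }

  /** Fails with a readable message when the sum does not fit in a long. */
  static long limitWithOffsetExact(long limit, long offset) {
    if (limit <= 0) {
      return limit;
    }
    try {
      return Math.addExact(limit, offset);
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException("LIMIT + OFFSET exceeds " + Long.MAX_VALUE, e);
    }
  }
}
```

Clamping keeps the query running and is harmless in practice, since no result set has that many rows; the exact variant instead surfaces the oversized literals to the user.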

Comment on lines 109 to 188
@Test
public void testOrderAscWithoutLimit() throws Exception {
  prepareComplexSchema();
  List<String> expected = new ArrayList<>(BASE_TIMESERIES);
  Collections.sort(expected);

  List<String> actual = queryTimeseries("show timeseries root.db*.** order by timeseries");
  assertEquals(expected, actual);
}

@Test
public void testOrderDescWithOffsetLimit() throws Exception {
  prepareComplexSchema();
  List<String> expected = new ArrayList<>(BASE_TIMESERIES);
  Collections.sort(expected);
  Collections.reverse(expected);
  expected = expected.subList(2, 6); // offset 2 limit 4

  List<String> actual =
      queryTimeseries("show timeseries root.db*.** order by timeseries desc offset 2 limit 4");
  assertEquals(expected, actual);
}

@Test
public void testInsertThenQueryOrder() throws Exception {
  prepareComplexSchema();
  try (Connection connection = EnvFactory.getEnv().getConnection();
      Statement statement = connection.createStatement()) {
    statement.execute(
        "create timeseries root.db0.devX.a with datatype=INT32, encoding=RLE, compression=SNAPPY");
  }

  List<String> expected = new ArrayList<>(BASE_TIMESERIES);
  expected.add("root.db0.devX.a");
  Collections.sort(expected);

  List<String> actual = queryTimeseries("show timeseries root.db*.** order by timeseries");
  assertEquals(expected, actual);
}

@Test
public void testDeleteSubtreeThenQueryOrder() throws Exception {
  prepareComplexSchema();
  try (Connection connection = EnvFactory.getEnv().getConnection();
      Statement statement = connection.createStatement()) {
    statement.execute("delete timeseries root.db2.devC.**");
  }

  List<String> expected = new ArrayList<>(BASE_TIMESERIES);
  expected.remove("root.db2.devC.m0");
  expected.remove("root.db2.devC.m3");
  Collections.sort(expected);

  List<String> actual = queryTimeseries("show timeseries root.db*.** order by timeseries");
  assertEquals(expected, actual);
}

@Test
public void testOffsetLimitAfterDeletesAndAdds() throws Exception {
  prepareComplexSchema();
  try (Connection connection = EnvFactory.getEnv().getConnection();
      Statement statement = connection.createStatement()) {
    statement.execute("delete timeseries root.db1.devB.x");
    statement.execute(
        "create timeseries root.db1.devC.m0 with datatype=INT32, encoding=RLE, compression=SNAPPY");
    statement.execute(
        "create timeseries root.db4.devZ.z with datatype=INT32, encoding=RLE, compression=SNAPPY");
  }

  List<String> expected = new ArrayList<>(BASE_TIMESERIES);
  expected.remove("root.db1.devB.x");
  expected.add("root.db1.devC.m0");
  expected.add("root.db4.devZ.z");
  Collections.sort(expected);
  expected = expected.subList(5, 10); // offset 5 limit 5

  List<String> actual =
      queryTimeseries("show timeseries root.db*.** order by timeseries offset 5 limit 5");
  assertEquals(expected, actual);
}

Copilot AI Jan 26, 2026

The new tests validate basic ordering/offset/limit behavior, but they don't cover the interaction of ORDER BY TIMESERIES with a timeseries WHERE clause + OFFSET (schemaFilter). Given offset is now pushed down in single-region queries, adding a case like "SHOW TIMESERIES ... WHERE timeseries contains ... ORDER BY TIMESERIES OFFSET ..." would help prevent regressions in offset semantics under filtering.
