Sync upstream dev into fork #1
Merged
… step

CompactParquetFiles detected CompactionExcludeColumns once, globally, across the union schema of every source file in a group. It then applied that "* EXCLUDE (col)" clause to each pair in the pairwise merge.

query_plan_text was added to query_store_stats in migration v13 (2026-02-23). A reporter's archive contains both pre-v13 files (no column) and post-v13 files (column present). The global DESCRIBE saw the column in the newer files, so every merge step ran with "* EXCLUDE (query_plan_text)", including the steps that merged two pre-v13 files, which fail with:

    Binder Error: Column "query_plan_text" in EXCLUDE list not found in FROM clause

Extract the schema detection into BuildSelectClause(table, paths) and call it per merge-set instead of once globally: with the actual pair in the pairwise path, and with all sources in the small-group path. A pair that doesn't carry an exclude-column now merges with a plain "*".

Verified against DuckDB CLI v1.5.2: DESCRIBE of an [old, old] pair correctly omits the column, and "* EXCLUDE (query_plan_text)" on that pair reproduces the reporter's exact Binder Error.

Cost is one extra DESCRIBE per merge step (parquet footer reads, not data).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
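The per-merge-set detection described in this commit can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual BuildSelectClause: the `describe_columns` callable stands in for running DESCRIBE over the given files, and the file names and the `collection_time` / `query_id` columns are hypothetical.

```python
from typing import Callable, Iterable, Sequence


def build_select_clause(
    describe_columns: Callable[[Sequence[str]], Iterable[str]],
    paths: Sequence[str],
    exclude_columns: Iterable[str],
) -> str:
    """Return '*' or '* EXCLUDE (...)' based only on the files in `paths`."""
    # Columns actually present in this merge-set, not in the whole group.
    present = {name.lower() for name in describe_columns(paths)}
    to_exclude = [c for c in exclude_columns if c.lower() in present]
    if not to_exclude:
        return "*"  # nothing to exclude, so a plain star is safe
    return "* EXCLUDE (" + ", ".join(to_exclude) + ")"


# Hypothetical stand-in for DESCRIBE: maps each file to its column list.
FAKE_SCHEMAS = {
    "old_a.parquet": ["collection_time", "query_id"],
    "old_b.parquet": ["collection_time", "query_id"],
    "new_a.parquet": ["collection_time", "query_id", "query_plan_text"],
}


def fake_describe(paths: Sequence[str]) -> list:
    """Union of column names across the given files, in first-seen order."""
    seen = []
    for path in paths:
        for col in FAKE_SCHEMAS[path]:
            if col not in seen:
                seen.append(col)
    return seen


old_pair = build_select_clause(
    fake_describe, ["old_a.parquet", "old_b.parquet"], ["query_plan_text"]
)
mixed_pair = build_select_clause(
    fake_describe, ["old_b.parquet", "new_a.parquet"], ["query_plan_text"]
)
print(old_pair)    # the [old, old] pair gets a plain star
print(mixed_pair)  # the mixed pair still excludes the column
```

Because the clause is computed from the exact files being merged, an [old, old] pair never sees an EXCLUDE entry for a column it does not have.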
…nsiently for COPY

erikdarlingdata#933's titled complaint is "Memory usage on client": Lite holds ~2.7-2.9 GB after 10 minutes with 4 servers. The compaction OOMs everyone has been chasing in this thread are a downstream symptom: by the time compaction runs the app already holds 2.7 GB, leaving little headroom on the reporter's 16 GB / ~1.6 GB-free machine.

Root cause: the main DuckDB ConnectionString set no memory_limit, so the buffer pool ran at the DuckDB default of 80% of system RAM (~12.8 GB on a 16 GB box). With archive parquet files accumulating on disk, every UI query over an archive view caches pages and the buffer pool grows freely.

The fix has to navigate one wrinkle: parquet COPY in DuckDB v1.5.2 hits a buffer-manager-bypass pre-reservation that needs ~2-4 GB headroom. Capping the main connection at 1 GB statically would break ExportToParquet and the two COPY paths in ArchiveAllAndResetAsync.

So:

- ConnectionString: memory_limit=1GB (caps resting buffer pool; addresses the actual complaint by stopping the archive-page cache from growing unbounded).
- Around each parquet COPY on the main connection: SET memory_limit='4GB', run the COPY, SET back to '1GB'. Factored into a WithRaisedCopyMemoryLimit helper so the three call sites stay consistent (ExportToParquet, and the two COPYs in ArchiveAllAndResetAsync).
- Compaction connections (separate :memory: instances) keep their 4 GB cap from erikdarlingdata#952.

Verified against DuckDB CLI v1.5.2 with synthetic query_snapshots-shaped data:

- COPY table→parquet at 256MB/512MB/1GB: OOMs (pre-reservation, matches the read_parquet→parquet path we saw in erikdarlingdata#952 testing).
- COPY table→parquet at 2GB/4GB: succeeds, peak RSS well under cap.
- INSERT (Appender) and SELECT (including GROUP BY across 11k rows) work fine at 256MB cap. This confirms collectors and UI queries don't have the pre-reservation behavior and aren't affected by the resting cap.

Tradeoff: the resting cap forces buffer-pool eviction of cached archive parquet pages. Long-range historical UI queries that re-scan many parquet files will do more disk I/O. Live/recent-data queries against the hot DB are unaffected (hot DB is small enough to fit in 1 GB easily).

Plus the per-merge-step BuildSelectClause from the previous commit fixes the separate query_store_stats Binder Error on archives that span the v13 schema change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
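The raise-then-restore pattern around each COPY might look something like the sketch below. It is written in Python for illustration and is not the project's WithRaisedCopyMemoryLimit helper: the `RecordingConnection` class, the output file name, and the choice to restore the cap in a `finally` block are all assumptions. The SET statements and the 1GB / 4GB values come from the commit message.

```python
from contextlib import contextmanager

RESTING_LIMIT = "1GB"  # resting cap from the commit message
COPY_LIMIT = "4GB"     # temporary cap used only while a parquet COPY runs


@contextmanager
def with_raised_copy_memory_limit(conn, raised=COPY_LIMIT, resting=RESTING_LIMIT):
    """Raise memory_limit for the duration of the block, then restore it."""
    conn.execute(f"SET memory_limit='{raised}'")
    try:
        yield conn
    finally:
        # Assumption: restore the resting cap even if the COPY fails.
        conn.execute(f"SET memory_limit='{resting}'")


class RecordingConnection:
    """Hypothetical stand-in for a DuckDB connection; records SQL only."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


conn = RecordingConnection()
with with_raised_copy_memory_limit(conn):
    conn.execute("COPY query_snapshots TO 'out.parquet' (FORMAT PARQUET)")

for statement in conn.statements:
    print(statement)
```

Keeping the SET pair in one helper means every call site raises and lowers the cap the same way, which is the consistency argument the commit makes for its three COPY sites.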
…33-compaction-binder-and-adaptive-memory

Fix erikdarlingdata#933: cap main DuckDB memory_limit, per-pair compaction exclude detection
Lite has had no internal record of its own memory usage. Bug reporters (see erikdarlingdata#933) had to read it off Task Manager, and we had no historical trace for diagnosing growth patterns.

After every collection cycle (which is also after archival and retention run, so it captures the quiescent state), log:

    Process memory: WS=XXX MB, Private=XXX MB, GC heap=XXX MB

WS is Working Set (what Task Manager shows). Private is private bytes (unique to this process, the more honest "actual RAM cost" number). GC heap is .NET-managed memory only; together with WS-Private this splits managed vs native vs shared.

One INFO log line per minute. Three property reads, so negligible overhead. Errors swallowed at DEBUG level (don't ever break the collection loop because we couldn't read memory stats).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
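As a rough illustration of the logging behavior described above, here is a Python sketch (the project itself is .NET). The `read_stats` callable, the logger name, and the conversion of bytes to whole MB by dividing by 1024 * 1024 are assumptions for the example. The log line format and the INFO / DEBUG split follow the commit message.

```python
import logging

MB = 1024 * 1024  # assumption: values are reported in whole binary megabytes


def format_process_memory(ws_bytes, private_bytes, gc_heap_bytes):
    """Build the one-line summary in the format given in the commit message."""
    return (
        f"Process memory: WS={ws_bytes // MB} MB, "
        f"Private={private_bytes // MB} MB, "
        f"GC heap={gc_heap_bytes // MB} MB"
    )


def log_process_memory(read_stats, logger):
    """Log memory at INFO; swallow any failure at DEBUG so the loop continues."""
    try:
        ws, private, gc_heap = read_stats()
        logger.info(format_process_memory(ws, private, gc_heap))
        return True
    except Exception as exc:  # never let a stats failure break the collection loop
        logger.debug("Could not read process memory: %s", exc)
        return False


def broken_reader():
    """Hypothetical reader that simulates a failed stats lookup."""
    raise OSError("stats unavailable")


logger = logging.getLogger("lite.memory")  # hypothetical logger name
line = format_process_memory(2900 * MB, 2700 * MB, 400 * MB)
ok = log_process_memory(lambda: (2900 * MB, 2700 * MB, 400 * MB), logger)
failed = log_process_memory(broken_reader, logger)
print(line)
```

The second call shows the failure path: the reader raises, the error is logged at DEBUG, and the caller simply gets `False` back.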
…og-process-memory-per-cycle

Log process memory at the end of each collection cycle
e51a5e4 to 9bd7915
9bd7915 to 36d9d3d
This PR merges the latest upstream/dev from erikdarlingdata/PerformanceMonitor into this fork's main.

Purpose:
Review notes: