Skip to content

fix: add last active column to step concurrency table#4286

Open
mnafees wants to merge 13 commits into
mainfrom
nafees/step-concurrency-last-active
Open

fix: add last active column to step concurrency table#4286
mnafees wants to merge 13 commits into
mainfrom
nafees/step-concurrency-last-active

Conversation

@mnafees

@mnafees mnafees commented Jun 26, 2026

Copy link
Copy Markdown
Member

Description

This PR adds a new last_active column to v1_step_concurrency to mirror v1_queue

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

Changes have been:

  • Tested (unit, integration, or manually with steps specified)
  • Linted and formatted
  • Documented (where applicable)

🤖 AI Disclosure
  • I acknowledge that an LLM was used in the creation of this Pull Request, in accordance with Hatchet's AI_POLICY.md.
  • Details: [e.g. generating tests, writing docs]

@mnafees mnafees requested review from abelanger5 and mrkaye97 June 26, 2026 11:05
@mnafees mnafees self-assigned this Jun 26, 2026
@vercel

vercel Bot commented Jun 26, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hatchet-docs Ready Ready Preview, Comment Jun 29, 2026 9:14am

Request Review

@github-actions github-actions Bot added the engine Related to the core Hatchet engine label Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Benchmark results

goos: linux
goarch: amd64
pkg: github.com/hatchet-dev/hatchet/pkg/scheduling/v1
cpu: AMD Ryzen 9 7950X3D 16-Core Processor          
              │ /tmp/old.txt │         /tmp/new.txt          │
              │    sec/op    │    sec/op     vs base         │
RateLimiter-8   49.14µ ± 15%   49.35µ ± 38%  ~ (p=0.589 n=6)

              │ /tmp/old.txt │         /tmp/new.txt          │
              │     B/op     │     B/op      vs base         │
RateLimiter-8   137.7Ki ± 0%   137.7Ki ± 0%  ~ (p=0.660 n=6)

              │ /tmp/old.txt │          /tmp/new.txt          │
              │  allocs/op   │  allocs/op   vs base           │
RateLimiter-8    1.022k ± 0%   1.022k ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

Compared against main (f34b10a)

v1_step_concurrency strategy ON strategy.workflow_id = cs.workflow_id AND strategy.workflow_version_id = cs.workflow_version_id AND strategy.id = cs.strategy_id
WHERE
strategy.is_active = FALSE
OR strategy.last_active < NOW() - INTERVAL '1 hour'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want to update the v1_step_concurrency when the last_active has been stale for more than an hour - hence a very hot strat will only update the last_active every hour rather than every time.

@mnafees mnafees marked this pull request as ready for review June 26, 2026 12:29
Comment thread cmd/hatchet-migrate/migrate/migrations/20260626125722_v1_0_119.sql Outdated
Comment thread pkg/repository/sqlcv1/concurrency.sql Outdated
Base automatically changed from nafees/cold-start-concurrency-lease to main June 26, 2026 19:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a last_active_at timestamp on v1_step_concurrency to better track recent strategy activity (mirroring v1_queue) and uses it to gate stale strategy deactivation, reducing unnecessary cold-start churn.

Changes:

  • Add last_active_at to v1_step_concurrency (schema + migration) and include it in sqlc-generated models/queries.
  • Update the concurrency-slot insert trigger to refresh last_active_at when strategies are reactivated / periodically (throttled) on activity.
  • Re-enable the periodic “deactivate stale step concurrency” controller operation and update the deactivation query to only deactivate strategies inactive for >1 day.

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sql/schema/v1-core.sql Adds last_active_at to v1_step_concurrency and refreshes it in the slot-insert trigger logic.
cmd/hatchet-migrate/migrate/migrations/20260629110253_v1_0_120.sql Migration to add last_active_at and update the trigger function for existing installs.
pkg/repository/sqlcv1/models.go Extends V1StepConcurrency with LastActiveAt.
pkg/repository/sqlcv1/workflows.sql.go Updates RETURNING columns to include last_active_at for step concurrency creation flows.
pkg/repository/sqlcv1/concurrency.sql Adds last_active_at gating to stale deactivation query.
pkg/repository/sqlcv1/concurrency.sql.go Regenerated sqlc output reflecting the updated stale deactivation and selects.
internal/services/controllers/task/controller.go Re-enables the periodic stale step-concurrency deactivation operation pool.
internal/services/controllers/task/deactivate_stale_step_concurrency.go Implements the operation method that invokes the repository deactivation query.
pkg/scheduling/v1/tenant_manager.go Threads ctx into concurrency-strategy updates (currently unused in the methods).
pkg/scheduling/v1/concurrency.go Minor formatting-only change.
Files not reviewed (3)
  • pkg/repository/sqlcv1/concurrency.sql.go: Generated file
  • pkg/repository/sqlcv1/models.go: Generated file
  • pkg/repository/sqlcv1/workflows.sql.go: Generated file
Comments suppressed due to low confidence (1)

sql/schema/v1-core.sql:913

  • The trigger’s inactive_strategies CTE can include the same v1_step_concurrency row multiple times when a single INSERT statement creates multiple slots for the same strategy (new_table has duplicates). Deduplicating before FOR UPDATE avoids extra join work/locking overhead and keeps the UPDATE deterministic.
    WITH inactive_strategies AS (
        SELECT
            strategy.*
        FROM
            new_table cs

Comment thread sql/schema/v1-core.sql
Comment on lines +240 to +244
-- last_active_at is refreshed whenever a slot is created for this strategy. The stale-deactivation
-- sweep only deactivates a strategy once it has had no slots AND last_active_at is over a day old, so
-- a strategy used periodically is never deactivated (and so never pays a cold start). Mirrors how
-- v1_queue.last_active gates the 1-day queue activity window.
last_active_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
Comment on lines +63 to +76
WITH inactive_strategies AS (
SELECT
strategy.*
FROM
new_table cs
JOIN
v1_step_concurrency strategy ON strategy.workflow_id = cs.workflow_id AND strategy.workflow_version_id = cs.workflow_version_id AND strategy.id = cs.strategy_id
WHERE
strategy.is_active = FALSE
OR strategy.last_active_at < NOW() - INTERVAL '1 hour'
ORDER BY
strategy.id
FOR UPDATE
)
Comment on lines 802 to 806
FROM v1_step_concurrency sc
WHERE sc.tenant_id = @tenantId::UUID
AND sc.is_active = TRUE
AND sc.last_active_at < NOW() - INTERVAL '1 day'
AND NOT EXISTS (
Comment on lines 803 to 806
WHERE sc.tenant_id = @tenantId::UUID
AND sc.is_active = TRUE
AND sc.last_active_at < NOW() - INTERVAL '1 day'
AND NOT EXISTS (
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Optional test failure: The load-deadlock job failed on this PR (optimistic-scheduling=true). This check is non-mandatory and does not block merging, but may be worth investigating. View logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

engine Related to the core Hatchet engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants