
Conversation

@oleksost
Contributor

✨ Description

Fixes bugs in the recurrent step path used in generation.

This makes GSM8K generations actually coherent.

πŸ” Type of change

Select all that apply:

  • πŸ› Bug fix (non-breaking change that addresses a specific issue)
  • πŸš€ New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • πŸ“ˆ Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • πŸ› οΈ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • πŸ“¦ Dependency bump (updates dependencies, including Dockerfile or package changes)
  • πŸ“ Documentation change (updates documentation, including new content or typo fixes)
  • πŸ”§ Infrastructure/Build change (affects build process, CI/CD, or dependencies)

πŸ“ Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

βœ… Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • πŸ“œ I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • πŸŽ‰ The functionality is complete, and I have tested the changes.
  • πŸ“ I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • πŸ‹ I have updated the Docker configuration or dependencies, if applicable.
  • πŸ”„ I have ensured compatibility with the existing setup after dependency changes.

Testing

  • πŸ§ͺ I have added or updated tests to cover my changes.
  • βœ”οΈ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • πŸ‹οΈ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • πŸ“Š I have run benchmarks where applicable to evaluate the performance impact.
  • βœ… The benchmarks show no performance regression.
  • πŸš€ The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • πŸ“ˆ I have provided benchmark results and detailed any performance impact below, if applicable.

πŸ“Š Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


πŸ—’οΈ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

@oleksost oleksost requested a review from tscholak January 16, 2026 20:31
Collaborator

@tscholak tscholak left a comment


important fixes!

Comment on lines +1593 to +1594
initial_state=recurrent_state,
output_final_state=past_key_values is not None,
Collaborator


great catch!

4. Update state: S = S + k βŠ— delta
5. Output: o = S @ q (scaled)
"""
input_dtype = query.dtype
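For orientation, steps 4 and 5 from the docstring excerpt above can be written out as a plain-PyTorch single step. This is an illustrative sketch only (the hot path calls the FLA kernel, and the function and variable names here are hypothetical, not the repository's):

```python
import torch

def delta_rule_output(state, q, k, delta):
    """Steps 4-5 of the delta rule for one token (illustrative sketch).

    state: (d_k, d_v) recurrent state S
    q, k:  (d_k,) query and key for the current token
    delta: (d_v,) precomputed delta-rule error term
    """
    state = state + torch.outer(k, delta)       # 4. update state: S = S + k ⊗ delta
    scale = state.shape[0] ** -0.5
    out = (q * scale) @ state                   # 5. output: o = S^T q, scaled by d_k^-0.5
    return out, state
```

With a zero initial state this reduces to `out = (q · k) * scale * delta`, which is a quick way to sanity-check the shapes and the scaling convention.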
Collaborator


this is the torch implementation, right? under normal circumstances we wouldn't hit this code path because we're using the FLA kernel, no?

Contributor Author


The torch fallback is torch_chunk_gated_delta_rule. The _recurrent_gated_delta_rule is used at decoding time in recurrent mode (i.e., after prefill).
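The split matters because the decode path must thread the prefill's final state through every generated token, which is exactly what passing `initial_state` and requesting the final state enables. A self-contained sketch (hypothetical names, plain torch, not the repository's or FLA's implementation) showing that per-token recurrent steps with state carryover reproduce a single full-sequence pass:

```python
import torch

def delta_step(state, q, k, v, g, beta):
    # One gated delta-rule update: decay state, then S = S + k ⊗ beta*(v - S^T k).
    state = state * torch.exp(g)
    state = state + torch.outer(k, beta * (v - k @ state))
    return (q * state.shape[0] ** -0.5) @ state, state

def run(qs, ks, vs, gs, betas, initial_state=None):
    # Process a sequence token by token, threading the recurrent state through.
    state = initial_state if initial_state is not None \
        else torch.zeros(qs.shape[1], vs.shape[1])
    outs = []
    for t in range(qs.shape[0]):
        o, state = delta_step(state, qs[t], ks[t], vs[t], gs[t], betas[t])
        outs.append(o)
    return torch.stack(outs), state

# "Prefill" the first 3 tokens, then "decode" the 4th from the cached state:
torch.manual_seed(0)
qs, ks = torch.randn(4, 5), torch.randn(4, 5)
vs = torch.randn(4, 2)
gs, betas = -torch.rand(4), torch.rand(4)        # g is a log-decay, so negative
full_out, _ = run(qs, ks, vs, gs, betas)
_, prefill_state = run(qs[:3], ks[:3], vs[:3], gs[:3], betas[:3])
decode_out, _ = run(qs[3:], ks[3:], vs[3:], gs[3:], betas[3:],
                    initial_state=prefill_state)
assert torch.allclose(full_out[3], decode_out[0], atol=1e-6)
```

Dropping `initial_state` (i.e., starting the decode step from a fresh zero state, which is what the bug amounted to) breaks this equivalence, which matches the garbled-generation symptom the PR describes.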

Collaborator


there's an FLA kernel for chunk_gated_delta_rule that we can use. I think you saw that too now. thanks for approving!

@tscholak
Collaborator

when I looked at these changes, I realized that it doesn't make sense to have a torch fallback. I decided to just remove that code. The outcome is #451.

@oleksost oleksost closed this Jan 19, 2026
@oleksost oleksost reopened this Jan 19, 2026
Collaborator

@tscholak tscholak left a comment


good stuff

@tscholak tscholak merged commit b5cca16 into main Jan 19, 2026
1 of 2 checks passed
@tscholak tscholak deleted the oo/apriel_modeling_bug branch January 19, 2026 14:50
tscholak added a commit that referenced this pull request Jan 19, 2026