
Add thinking-budget support (max_thinking_tokens) for reasoning-capable chat models #42111

@AndresAlgaba

Description

Feature request

A built-in way to cap how many tokens a reasoning model spends inside its <think> … </think> block. Today, we can only control the total response length via max_new_tokens. No parameter limits the internal reasoning segment when enable_thinking=True.

Motivation

  • Reasoning models (e.g., Qwen3 series) often produce very long thought blocks, which can blow past latency budgets before the final answer starts.
  • Users need a simple, model-agnostic control to bound that “thinking” cost without disabling reasoning entirely.
  • The Qwen docs (https://qwen.readthedocs.io/en/latest/getting_started/quickstart.html#thinking-budget) already describe a brute-force approach (two-step generation) to implement “thinking budgets”.
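For context, the brute-force two-step approach could be sketched roughly as follows. This is a minimal illustration, not the exact recipe from the Qwen quickstart (which handles prompt text and chat templating differently); `generate_with_thinking_budget` and all of its arguments are illustrative names, and `generate` stands in for any callable that maps `(token_ids, max_new_tokens)` to newly generated token ids:

```python
def generate_with_thinking_budget(generate, prompt_ids, budget,
                                  end_think_ids, closer_ids, max_new_tokens):
    """Two-pass 'thinking budget' sketch (illustrative, not an existing API).

    Pass 1 spends at most `budget` tokens on reasoning; if the model never
    emitted the closing tag, we append `closer_ids` (e.g. the tokenized
    "</think>" plus a short lead-in) ourselves. Pass 2 then generates the
    final answer from the patched sequence.
    """
    first = generate(prompt_ids, budget)
    ids = prompt_ids + first
    if not _contains(first, end_think_ids):
        ids = ids + closer_ids  # manually close the unfinished <think> block
    return ids + generate(ids, max_new_tokens - budget)

def _contains(seq, sub):
    # True if `sub` appears as a contiguous subsequence of `seq`.
    return any(seq[i:i + len(sub)] == sub for i in range(len(seq) - len(sub) + 1))
```

The downside this proposal addresses: the two-step approach needs an extra decoding round-trip and prompt surgery per request, whereas a logits processor enforces the budget inside a single `generate` call.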

Your contribution

I want to submit a PR that:

  • Extends GenerationConfig with:
    • max_thinking_tokens: integer budget for reasoning tokens.
    • begin_thinking_token_id / end_thinking_token_id: marker IDs so generation knows where the thinking span begins and ends.
  • Adds a MaxThinkingTokensLogitsProcessor that watches the active <think> block. Once the budget is reached, it forces end_thinking_token_id, ensuring the model exits reasoning and continues with the final response.
  • Documents the new parameter in the reasoning-model guides (EXAONE, CWM, etc.) and shows how to wire up the thinking-token IDs until model configs provide them automatically.
  • Provides unit coverage so that _get_logits_processor injects the new processor whenever the config is fully specified.
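To make the processor bullet concrete, here is a minimal, framework-free sketch of the budget logic. Class and parameter names follow this proposal (they are not an existing transformers API), and plain Python lists stand in for tensors; the real implementation would subclass transformers' LogitsProcessor and operate on torch tensors in `(input_ids, scores)` form:

```python
import math

class MaxThinkingTokensLogitsProcessor:
    """Sketch: once `max_thinking_tokens` tokens have been generated inside
    an open <think> span, force all probability mass onto
    `end_thinking_token_id` so the model exits reasoning."""

    def __init__(self, max_thinking_tokens, begin_thinking_token_id,
                 end_thinking_token_id):
        self.max_thinking_tokens = max_thinking_tokens
        self.begin_id = begin_thinking_token_id
        self.end_id = end_thinking_token_id

    def __call__(self, input_ids, scores):
        # input_ids: generated token ids so far; scores: one logit per vocab id.
        if self.begin_id not in input_ids:
            return scores  # thinking has not started
        # Locate the most recent begin-thinking marker.
        start = len(input_ids) - 1 - input_ids[::-1].index(self.begin_id)
        span = input_ids[start + 1:]
        if self.end_id in span:
            return scores  # this thinking block is already closed
        if len(span) >= self.max_thinking_tokens:
            # Budget exhausted: make the closing token the only viable choice.
            scores = [-math.inf] * len(scores)
            scores[self.end_id] = 0.0
        return scores
```

The sketch also shows why begin_thinking_token_id is needed at all: the processor must know where the current span starts to count only reasoning tokens, rather than all generated tokens.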
