@nmarasoiu nmarasoiu commented Dec 10, 2025

Note: Not sure if this is useful.

With Claude Code, keywords like think, think hard, and ultrathink go a long way toward setting a thinking budget.
For non-Claude thinking models, the HTTP parameters might be different, so I'm not sure this would work beyond Anthropic.

Add a new hook that manages Claude's extended thinking budget_tokens parameter:

  • Inject default budget_tokens when thinking is enabled but budget is missing
  • Override budget_tokens when below configurable minimum threshold
  • Ensure budget < max_tokens (API constraint)
  • Optionally inject thinking configuration for thinking-capable models

Key design decisions:

  • Trust caller: if request has thinking, adjust budget regardless of model
  • Model filter only applies to inject_if_missing mode
  • Anthropic-specific: non-Anthropic providers will ignore thinking field
  • Simple config via hook params (no env var complexity)

Configuration:

  • budget_default: Default budget (10000)
  • budget_min: Minimum threshold (1024)
  • inject_if_missing: Auto-inject thinking (false)
  • log_modifications: Log changes (true)
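
A rough sketch of the adjustment rules listed above, written as a standalone function rather than the hook's actual code. The function name `adjust_thinking_budget` and the plain-dict request shape are assumptions for illustration; the model filter for inject_if_missing and the log_modifications option are left out, and clamping to `max_tokens - 1` is one possible way to satisfy the budget < max_tokens constraint.

```python
from copy import deepcopy
from typing import Any


def adjust_thinking_budget(
    request: dict[str, Any],
    budget_default: int = 10000,
    budget_min: int = 1024,
    inject_if_missing: bool = False,
) -> dict[str, Any]:
    """Return a copy of the request with thinking.budget_tokens normalized.

    Hypothetical helper: the name and request shape are illustrative only.
    """
    result = deepcopy(request)  # never mutate the caller's request
    thinking = result.get("thinking")

    if thinking is None:
        if not inject_if_missing:
            return result  # thinking not requested, leave untouched
        # Assumption: the model filter would be applied before this point.
        thinking = {"type": "enabled"}
        result["thinking"] = thinking

    if thinking.get("type") != "enabled":
        return result  # thinking present but not enabled

    budget = thinking.get("budget_tokens")
    if budget is None:
        budget = budget_default  # inject the default when missing
    elif budget < budget_min:
        budget = budget_min  # raise to the configured minimum

    # Keep budget strictly below max_tokens. Assumption: clamp to
    # max_tokens - 1, even if that lands below budget_min.
    max_tokens = result.get("max_tokens")
    if max_tokens is not None and budget >= max_tokens:
        budget = max_tokens - 1

    thinking["budget_tokens"] = budget
    return result
```

For example, `adjust_thinking_budget({"max_tokens": 16000, "thinking": {"type": "enabled"}})` would fill in the default budget, while a request without a thinking field is returned unchanged unless inject_if_missing is set.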

@nmarasoiu nmarasoiu force-pushed the feature/thinking-budget-hook branch from 6b20e94 to 4ffd06a Compare December 10, 2025 10:46
@nmarasoiu nmarasoiu force-pushed the feature/thinking-budget-hook branch from 4ffd06a to 09bb994 Compare December 10, 2025 10:59
@starbased-co
Owner

Thanks for the PR, and thank you as well for your patience. After looking through this, I kind of see where you're going with it, but the first thing I'd want from you is a more focused value proposition.

Some things to note:

  • LiteLLM provides a unified reasoning support check via supports_reasoning(model)
  • You can configure params for thinking, max tokens, etc. per model in the litellm config
  • LiteLLM already supports some transformations between providers, e.g. Gemini and Anthropic thinking params are basically interchangeable; for some providers reasoning_effort is converted to a token budget, for others it's dropped.
