Extended thinking (Anthropic)

What this page covers

Extended thinking gives Claude a dedicated token budget to reason internally before generating its final response. Reasoning tokens are billed as output tokens and count toward max_output_tokens, which is why max_output_tokens must be strictly greater than the thinking budget.

NucleusIQ exposes this through a single ergonomic knob on AnthropicLLMParams, thinking="low" | "medium" | "high" | "max", which is resolved to a concrete {"type": "enabled", "budget_tokens": N} payload at wire time.

When to use it

Extended thinking pays off whenever the task benefits from deliberate, multi-step reasoning rather than a single forward pass:

  • 🧮 Math / logic / algorithmic problems where one-shot guessing fails.
  • 🧪 Code review / debugging where the model needs to enumerate cases.
  • 📊 Multi-document analysis with several constraints to reconcile.
  • 🧠 Planning / autonomous agents that decompose tasks.

How to enable it

from nucleusiq.llms.llm_params import LLMParams
from nucleusiq_anthropic import AnthropicLLMParams, BaseAnthropic

llm = BaseAnthropic(
    model_name="claude-sonnet-4-5-20250929",
    async_mode=True,
    llm_params=AnthropicLLMParams(thinking="medium"),
)

# Sampling: temperature MUST be 1.0 and max_output_tokens MUST exceed the budget.
sampling = LLMParams(temperature=1.0, max_output_tokens=16_384)

thinking accepts three forms

| Form | What it produces |
| --- | --- |
| bool | True → minimal thinking enabled; False → disabled |
| "low" \| "medium" \| "high" \| "max" (ThinkingEffort) | Resolved via _THINKING_EFFORT_BUDGETS |
| dict | Passthrough: full control of {"type": "...", "budget_tokens": ...} |

| Effort | Budget tokens |
| --- | --- |
| "low" | 2 000 |
| "medium" | 8 000 |
| "high" | 32 000 |
| "max" | 64 000 |

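The resolution rules in the two tables above can be sketched in plain Python. This is an illustration only, not NucleusIQ's implementation: `resolve_thinking`, `EFFORT_BUDGETS` and `ASSUMED_MINIMAL_BUDGET` are hypothetical names, the 1 024-token budget for `True` is an assumption (the table only says "minimal"), and returning `None` for `False` is likewise assumed.

```python
from typing import Optional, Union

# Budgets copied from the effort table above.
EFFORT_BUDGETS = {"low": 2_000, "medium": 8_000, "high": 32_000, "max": 64_000}

# Assumption: the table only says True means "minimal" thinking;
# 1 024 is a placeholder value, not a documented NucleusIQ constant.
ASSUMED_MINIMAL_BUDGET = 1_024


def resolve_thinking(value: Union[bool, str, dict]) -> Optional[dict]:
    """Turn any of the three accepted forms into a thinking payload (or None)."""
    if isinstance(value, bool):
        if not value:
            return None  # assumption: "disabled" is modelled as omitting the payload
        return {"type": "enabled", "budget_tokens": ASSUMED_MINIMAL_BUDGET}
    if isinstance(value, str):
        if value not in EFFORT_BUDGETS:
            raise ValueError(f"unknown thinking effort: {value!r}")
        return {"type": "enabled", "budget_tokens": EFFORT_BUDGETS[value]}
    if isinstance(value, dict):
        return dict(value)  # passthrough; copied so the caller's dict is untouched
    raise TypeError(f"unsupported thinking value: {type(value).__name__}")


payload = resolve_thinking("medium")
print(payload)
```

Named efforts keep call sites readable, while the dict form remains available when a budget between two named levels is needed.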
Hard constraints

Anthropic enforces these; violations return 400 errors.

  • temperature MUST be 1.0. Any other value returns 400 invalid_request_error.
  • max_output_tokens MUST be strictly greater than thinking.budget_tokens. Any value above 8 000 satisfies this for the medium budget, but a cap such as 16_384 leaves room for the final answer.
  • Model support: extended thinking requires Claude Sonnet 4 / Opus 4 / 3.7-Sonnet or newer. Older SKUs return 400 model_not_supported.
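The first two constraints can be checked locally before a request is sent, which avoids a round trip that would end in a 400. The helper below is a minimal sketch under that idea; `check_thinking_request` is a hypothetical name, not part of NucleusIQ or the Anthropic SDK, and it does not attempt to check model support.

```python
from typing import List


def check_thinking_request(
    temperature: float, max_output_tokens: int, budget_tokens: int
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if temperature != 1.0:
        problems.append(f"temperature must be 1.0, got {temperature}")
    if max_output_tokens <= budget_tokens:
        problems.append(
            f"max_output_tokens ({max_output_tokens}) must be strictly greater "
            f"than budget_tokens ({budget_tokens})"
        )
    return problems


# The "medium" budget with the cap used on this page passes both checks.
print(check_thinking_request(1.0, 16_384, 8_000))
# A cap equal to the budget is rejected because the comparison is strict.
print(check_thinking_request(1.0, 8_000, 8_000))
```

Returning every problem at once, rather than raising on the first, lets a caller report both misconfigurations in a single message.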

Reading the result

AnthropicLLMResponse.stop_reason and the framework's LLMCallRecord.stop_reason reflect whether the model finished cleanly:

result = await agent.execute(task)
for rec in result.llm_calls:
    print(
        f"round={rec.round}  stop_reason={rec.stop_reason}  "
        f"prompt={rec.prompt_tokens}  completion={rec.completion_tokens}"
    )

Common stop_reason values with thinking enabled:

  • end_turn: natural completion within the budget.
  • max_tokens: the model hit max_output_tokens before finishing (raise the cap).
  • tool_use: model paused to call a (server or local) tool.
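One way to act on a max_tokens stop is to retry with a larger cap. The sketch below only computes the next cap and leaves the retry itself to the caller; `next_output_cap` is a hypothetical helper, and doubling up to a caller-supplied ceiling is an illustrative policy rather than anything NucleusIQ prescribes.

```python
from typing import Optional


def next_output_cap(stop_reason: str, current_cap: int, ceiling: int) -> Optional[int]:
    """Return a larger cap to retry with, or None if no retry is needed or possible."""
    if stop_reason != "max_tokens":
        return None  # end_turn and tool_use do not call for a larger cap
    if current_cap >= ceiling:
        return None  # already at the ceiling; give up rather than loop forever
    return min(current_cap * 2, ceiling)  # assumption: doubling is the growth policy


print(next_output_cap("max_tokens", 16_384, 65_536))
print(next_output_cap("end_turn", 16_384, 65_536))
```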

Streaming with thinking

When you stream a thinking call, NucleusIQ emits StreamEventType.THINKING events for the reasoning trace and the regular TOKEN events for the final answer. The terminal COMPLETE event still carries stop_reason, request_id, and cache token splits in event.metadata.

async for ev in agent.execute_stream(task):
    if ev.type is StreamEventType.THINKING:
        print(".", end="", flush=True)
    elif ev.type is StreamEventType.TOKEN:
        print(ev.content, end="", flush=True)
    elif ev.type is StreamEventType.COMPLETE:
        print("\n--- done:", ev.metadata.get("stop_reason"))
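The same dispatch can be exercised without a live model by feeding it a stand-in stream. Everything in the sketch below is hypothetical scaffolding: `DemoEventType`, `DemoEvent` and `demo_stream` only imitate the three event kinds described above and are not NucleusIQ classes. `collect` shows one way to separate the answer text from the reasoning events.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import AsyncIterator, Optional, Tuple


class DemoEventType(Enum):  # stand-in for the framework's event enum
    THINKING = "thinking"
    TOKEN = "token"
    COMPLETE = "complete"


@dataclass
class DemoEvent:  # stand-in event with the attributes used in the loop above
    type: DemoEventType
    content: str = ""
    metadata: dict = field(default_factory=dict)


async def demo_stream() -> AsyncIterator[DemoEvent]:
    """Yield a fixed sequence of fake events in place of a real model call."""
    yield DemoEvent(DemoEventType.THINKING)
    yield DemoEvent(DemoEventType.TOKEN, "34")
    yield DemoEvent(DemoEventType.TOKEN, "/105")
    yield DemoEvent(DemoEventType.COMPLETE, metadata={"stop_reason": "end_turn"})


async def collect(stream: AsyncIterator[DemoEvent]) -> Tuple[str, int, Optional[str]]:
    """Gather answer text, count thinking events, and capture the stop reason."""
    parts, thinking_events, stop_reason = [], 0, None
    async for ev in stream:
        if ev.type is DemoEventType.THINKING:
            thinking_events += 1
        elif ev.type is DemoEventType.TOKEN:
            parts.append(ev.content)
        elif ev.type is DemoEventType.COMPLETE:
            stop_reason = ev.metadata.get("stop_reason")
    return "".join(parts), thinking_events, stop_reason


answer, thinking_events, stop_reason = asyncio.run(collect(demo_stream()))
print(answer, thinking_events, stop_reason)
```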

Live demo

examples/agents/12_anthropic_extended_thinking.py in the monorepo runs the same probability problem (3 red / 5 blue / 7 green marbles → P(both same colour)) under thinking="low" and thinking="medium". Verified live against claude-sonnet-4-5-20250929 during the v0.7.12 release; both efforts return 34/105.
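The expected answer can be checked independently of any model. The short calculation below assumes two marbles are drawn without replacement, which is the reading that yields 34/105. It counts same-colour pairs and divides by all pairs.

```python
from fractions import Fraction
from math import comb

counts = {"red": 3, "blue": 5, "green": 7}

total_pairs = comb(sum(counts.values()), 2)                # C(15, 2) = 105
same_colour_pairs = sum(comb(n, 2) for n in counts.values())  # 3 + 10 + 21 = 34

p_same = Fraction(same_colour_pairs, total_pairs)
print(p_same)  # 34/105
```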

Live integration tests

import pytest
from nucleusiq_anthropic import AnthropicLLMParams, BaseAnthropic


@pytest.mark.asyncio
@pytest.mark.parametrize("effort", ["low", "medium"])
async def test_live_extended_thinking_completes(effort: str) -> None:
    llm = BaseAnthropic(
        model_name="claude-sonnet-4-5-20250929",
        async_mode=True,
        llm_params=AnthropicLLMParams(thinking=effort),
    )
    result = await llm.call(
        model="claude-sonnet-4-5-20250929",
        messages=[{"role": "user", "content": "...hard problem..."}],
        max_output_tokens=16_384,   # must exceed budget
        temperature=1.0,            # required
    )
    assert result.stop_reason
    assert (result.choices[0].message.content or "").strip()

Lives at src/providers/llms/anthropic/tests/integration/test_anthropic_phase_b_live.py::test_live_extended_thinking_completes.

See also