Observability
Track and optimize runtime behavior, token usage, costs, and context management.
NucleusIQ provides built-in observability primitives that let you inspect what happened during an agent execution — every LLM call, every tool invocation, token counts, estimated dollar costs, and context window management metrics.
New in v0.7.12 — provider-agnostic native-tool observability
Every traced tool call now carries a ToolCallRecord.executed_by: Literal["local", "provider"] field that distinguishes locally-run tools (@tool functions, MCPTool, FileReadTool, …) from server-executed ones (Anthropic web_search, OpenAI code_interpreter, Gemini google_search, Groq compound tools, …).
LLMCallRecord gained six new typed fields populated by every provider in this release (provider, request_id, organization_id, stop_reason, cache_read_input_tokens, cache_creation_input_tokens), plus a generic metadata dict.
All additive — no breaking changes. See Native-tool observability (v0.7.12) below.
ObservabilityConfig
New in v0.7.6
Unified observability configuration that replaces the separate verbose and enable_tracing fields:
from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.config.observability_config import ObservabilityConfig
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_openai import BaseOpenAI
agent = Agent(
    name="analyst",
    prompt=ZeroShotPrompt().configure(
        system="You are a data analyst.",
    ),
    llm=BaseOpenAI(model_name="gpt-4.1-mini"),
    config=AgentConfig(
        execution_mode=ExecutionMode.STANDARD,
        observability=ObservabilityConfig(
            tracing=True,
            verbose=True,
            log_level="DEBUG",
            log_llm_calls=True,
            log_tool_results=True,
        ),
    ),
)
| Field | Default | Description |
|---|---|---|
| tracing | False | Enable execution tracing (populates AgentResult.tool_calls, .llm_calls) |
| verbose | False | Enable debug-level logging |
| log_level | "INFO" | Logger level string |
| log_llm_calls | False | Log detailed LLM call info |
| log_tool_results | False | Log tool execution results |
When observability is set on AgentConfig, it takes precedence over the legacy verbose and enable_tracing fields. Both approaches are backward compatible — use whichever you prefer.
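The precedence rule can be pictured with a small stand-in model. The sketch below does not import NucleusIQ: `Observability`, `Config`, and `effective_tracing` are hypothetical names that only mirror the behavior described above, where a set `observability` wins and the legacy flags apply otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observability:  # hypothetical stand-in for ObservabilityConfig
    tracing: bool = False
    verbose: bool = False


@dataclass
class Config:  # hypothetical stand-in for AgentConfig
    enable_tracing: bool = False  # legacy flag
    verbose: bool = False         # legacy flag
    observability: Optional[Observability] = None


def effective_tracing(cfg: Config) -> bool:
    """Return the tracing flag that applies under the documented precedence."""
    if cfg.observability is not None:
        return cfg.observability.tracing  # unified config takes precedence
    return cfg.enable_tracing             # otherwise fall back to the legacy field


legacy_only = Config(enable_tracing=True)
both = Config(enable_tracing=True, observability=Observability(tracing=False))
print(effective_tracing(legacy_only))  # True: only the legacy flag is set
print(effective_tracing(both))         # False: observability overrides the legacy flag
```

The second case is the one worth remembering: once `observability` is present, a leftover legacy `enable_tracing=True` no longer turns tracing on.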
Legacy approach (still works)
config = AgentConfig(
    enable_tracing=True,
    verbose=True,
)
ExecutionTracer
New in v0.7.4
ExecutionTracer records every LLM call and tool call that occurs during an agent execution. Enable it via AgentConfig:
import asyncio
from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq.tools.decorators import tool
from nucleusiq_openai import BaseOpenAI

@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"


async def main():
    agent = Agent(
        name="analyst",
        prompt=ZeroShotPrompt().configure(
            system="You are a research analyst. Use tools to gather data.",
        ),
        llm=BaseOpenAI(model_name="gpt-4.1-mini"),
        tools=[search],
        config=AgentConfig(
            execution_mode=ExecutionMode.STANDARD,
            enable_tracing=True,
        ),
    )
    result = await agent.execute({"id": "o1", "objective": "Research AI trends"})

    # Inspect traced tool calls
    for tc in result.tool_calls:
        print(f"Tool: {tc['name']}, Duration: {tc['duration_ms']}ms")

    # Inspect traced LLM calls
    for lc in result.llm_calls:
        print(f"LLM call: {lc['purpose']}, Tokens: {lc['usage']}")
        print(f"Prompt technique: {lc.get('prompt_technique', 'n/a')}")  # v0.7.6

    # Warnings
    for w in result.warnings:
        print(f"Warning: {w}")


asyncio.run(main())
The trace captures:
| Field | Type | Description |
|---|---|---|
| result.tool_calls | tuple[dict, ...] | Tool invocations with name, arguments, duration, and return value |
| result.llm_calls | tuple[dict, ...] | LLM calls with purpose, model, token usage, latency, and prompt technique |
| result.warnings | tuple[str, ...] | Warnings emitted during execution (e.g. retries, fallback behavior) |
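Because the traced tool calls are plain dicts, they are easy to post-process. As a minimal sketch, the helper below groups calls by tool and totals their durations. It assumes only the `name` and `duration_ms` keys used in the example above; `summarize_tool_calls` is a hypothetical helper, and the sample data is hand-written rather than real trace output.

```python
from collections import defaultdict


def summarize_tool_calls(tool_calls):
    """Group traced tool calls by name and return {name: (count, total_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for tc in tool_calls:
        entry = totals[tc["name"]]
        entry[0] += 1                  # number of invocations
        entry[1] += tc["duration_ms"]  # accumulated wall-clock time
    return {name: (count, total) for name, (count, total) in totals.items()}


# Hand-written sample shaped like result.tool_calls (values are made up)
sample = (
    {"name": "search", "duration_ms": 120.0},
    {"name": "search", "duration_ms": 80.0},
    {"name": "unit_converter", "duration_ms": 0.5},
)

summary = summarize_tool_calls(sample)
for name, (count, total_ms) in summary.items():
    print(f"{name}: {count} calls, {total_ms:.1f} ms total")
```

In a real run, `result.tool_calls` would take the place of `sample`.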
Full observability wiring (v0.7.5)
With tracing enabled, AgentResult now captures the complete execution picture:
result = await agent.execute(task)
# Plugin events — every hook with timing
for pe in result.plugin_events:
    print(f"Plugin: {pe.hook_name}, Duration: {pe.duration_ms:.1f}ms")

# Memory snapshot — conversation state at execution end
if result.memory_snapshot:
    print(f"Messages: {result.memory_snapshot.message_count}")
    print(f"Tokens: {result.memory_snapshot.token_count}")

# Autonomous detail (only in AUTONOMOUS mode)
if result.autonomous:
    print(f"Sub-tasks: {result.autonomous.sub_task_names}")
    for v in result.autonomous.validations:
        print(f"  Validation: score={v.score}, passed={v.passed}")
| Field | Type | Description |
|---|---|---|
| result.plugin_events | tuple[PluginEvent, ...] | Every plugin hook fired: hook name, duration, and payload |
| result.memory_snapshot | MemorySnapshot \| None | Conversation messages and token count at execution end |
| result.autonomous | AutonomousDetail \| None | Decomposition, sub-task names, validation records, critic scores |
| result.llm_calls[].prompt_technique | str \| None | Which prompt strategy was used (e.g. zero_shot) |
Native-tool observability (v0.7.12)
New in v0.7.12 — ToolCallRecord.executed_by + six new fields on LLMCallRecord.
Splitting local vs server-executed tools
result = await agent.execute(task)
local = [tc for tc in result.tool_calls if tc.executed_by == "local"]
provider = [tc for tc in result.tool_calls if tc.executed_by == "provider"]
print(f"Local-run tools: {len(local)} (cost: your compute)")
print(f"Provider-run tools: {len(provider)} (cost: LLM tokens / premium)")
| Field | Type | Description |
|---|---|---|
| tool_call.executed_by | Literal["local", "provider"] | "local" for @tool, MCPTool, built-ins. "provider" for any tool surfaced via server_tool_calls on the LLM response (Anthropic, OpenAI, Gemini, Groq). |
| tool_call.tool_name | str | Canonical tool name (provider-side suffixes like _call are stripped). |
| tool_call.tool_call_id | str | Provider-side id (srvtoolu_… for Anthropic, etc.); useful for cross-system correlation. |
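The three fields above are enough to build a per-tool breakdown of where work ran. The following is a minimal sketch using a stand-in record type: `ToolCall` and `count_by_executor` are hypothetical names, and the ids are placeholders, not real provider ids.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:  # hypothetical stand-in exposing the three documented fields
    tool_name: str
    tool_call_id: str
    executed_by: str  # "local" or "provider"


def count_by_executor(tool_calls):
    """Count calls per (executed_by, tool_name) pair."""
    return Counter((tc.executed_by, tc.tool_name) for tc in tool_calls)


# Placeholder records; a real run would pass result.tool_calls instead
calls = [
    ToolCall("search", "id-1", "local"),
    ToolCall("web_search", "id-2", "provider"),
    ToolCall("web_search", "id-3", "provider"),
]

counts = count_by_executor(calls)
for (executor, name), n in sorted(counts.items()):
    print(f"{executor:8s} {name}: {n}")
```

Keying on the pair rather than on the name alone keeps a local tool and a provider tool that share a name in separate buckets.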
Enriched LLMCallRecord fields
Every traced LLM call now carries provider-agnostic enrichment fields:
for rec in result.llm_calls:
    print(
        f"round={rec.round} provider={rec.provider} "
        f"stop_reason={rec.stop_reason} "
        f"prompt={rec.prompt_tokens} "
        f"cache_read={rec.cache_read_input_tokens} "
        f"cache_create={rec.cache_creation_input_tokens} "
        f"request_id={rec.request_id}"
    )
| Field | Type | Populated by |
|---|---|---|
| provider | str \| None | "anthropic", "openai", "google", "groq", "ollama"; set centrally in base_mode.py via get_provider_from_llm() |
| request_id | str \| None | Provider-side response id (Anthropic message.id, OpenAI response.id, …); useful for cross-system correlation |
| organization_id | str \| None | Best-effort header extraction (Anthropic anthropic-organization-id, OpenAI openai-organization) |
| stop_reason | str \| None | Provider-reported finish reason (end_turn / max_tokens / tool_use / stop / …) |
| cache_read_input_tokens | int | Anthropic prompt-cache reads; bucketed separately from prompt_tokens |
| cache_creation_input_tokens | int | Anthropic prompt-cache writes |
| metadata | dict[str, Any] | Generic bag for provider-specific extras (e.g. Gemini safetyRatings, OpenAI logprobs) |
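One use for the cache fields is estimating how much of a call's input was served from the prompt cache. The sketch below assumes the three input buckets (`prompt_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`) are disjoint. The table states this for cache reads; treating cache writes the same way is an assumption. `cache_read_ratio` is a hypothetical helper and the numbers are illustrative.

```python
def cache_read_ratio(prompt_tokens: int, cache_read: int, cache_creation: int) -> float:
    """Fraction of all input tokens that were read from the prompt cache."""
    total_input = prompt_tokens + cache_read + cache_creation
    if total_input == 0:
        return 0.0  # no input recorded, so avoid dividing by zero
    return cache_read / total_input


# Illustrative numbers: 700 of 1,000 input tokens came from the cache
ratio = cache_read_ratio(prompt_tokens=200, cache_read=700, cache_creation=100)
print(f"Cache read ratio: {ratio:.1%}")
```

With a traced record `rec`, the call would be `cache_read_ratio(rec.prompt_tokens, rec.cache_read_input_tokens, rec.cache_creation_input_tokens)`.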
How it's wired
The core nucleusiq/core/agents/modes/base_mode.py agent loop centrally detects the provider for every LLM call (get_provider_from_llm(agent.llm)) and threads it into build_llm_call_record / build_llm_call_record_from_stream. The same hook pulls server_tool_calls off every LLM response and feeds them through the generic build_server_tool_call_records() helper, so adding a new provider only needs to populate its own server_tool_calls list — observability is automatic.
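To make the normalization step concrete, here is a rough sketch of what turning raw `server_tool_calls` entries into provider-executed records might look like. It is not the library's `build_server_tool_call_records()`: the record type, the helper names, and the dict shape with `name` and `id` keys are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerToolRecord:  # hypothetical record shape
    tool_name: str
    tool_call_id: str
    executed_by: str


def normalize_name(raw_name: str, suffix: str = "_call") -> str:
    """Strip a trailing provider-side suffix such as "_call"."""
    if raw_name.endswith(suffix) and len(raw_name) > len(suffix):
        return raw_name[: -len(suffix)]
    return raw_name


def to_records(server_tool_calls):
    """Convert raw entries (dicts with "name" and "id") into provider records."""
    return [
        ServerToolRecord(normalize_name(c["name"]), c["id"], "provider")
        for c in server_tool_calls
    ]


# Placeholder entries; the real payload shape varies by provider
raw = [
    {"name": "web_search_call", "id": "id-1"},
    {"name": "code_interpreter", "id": "id-2"},
]
records = to_records(raw)
for r in records:
    print(r.tool_name, r.executed_by)
```

The point of a single generic converter is the one made above: a new provider only has to fill in its own `server_tool_calls` list, and every record comes out tagged `"provider"` with a canonical name.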
See also
- Native server tools: how each provider populates server_tool_calls
- Prompt caching: cache_read_input_tokens, cache_creation_input_tokens
- Extended thinking: reading stop_reason
Context Telemetry
New in v0.7.6
When context management is configured, AgentResult includes a context_telemetry field:
tel = result.context_telemetry
if tel:
    print(f"Peak utilization: {tel.peak_utilization:.1%}")
    print(f"Tokens saved: {tel.tokens_masked:,}")
    print(f"Estimated savings: ${tel.estimated_cost_savings:.4f}")
See Context management for configuration and telemetry details.
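For intuition about what these numbers mean, the arithmetic below shows one plausible reading of the telemetry fields. How NucleusIQ actually derives them is not stated here, so the formulas, the function names, the 128,000-token window, and the price of $1.00 per million input tokens are all assumptions.

```python
def utilization(tokens_in_context: int, window_size: int) -> float:
    """Share of the context window that is occupied."""
    return tokens_in_context / window_size


def masking_savings(tokens_masked: int, usd_per_million_input_tokens: float) -> float:
    """Dollar value of input tokens that were masked instead of re-sent."""
    return tokens_masked / 1_000_000 * usd_per_million_input_tokens


# Illustrative inputs: 86,016 tokens in an assumed 128,000-token window
peak = utilization(86_016, 128_000)
# Illustrative inputs: 12,450 masked tokens at a placeholder price
saved = masking_savings(12_450, 1.00)

print(f"Peak utilization: {peak:.1%}")
print(f"Estimated savings: ${saved:.4f}")
```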
Run-local state metadata (v0.7.8)
Separate from context_telemetry (context window engine), AgentResult.metadata is a dict that may summarize run-local workspace / evidence / corpus state and activation telemetry:
result = await agent.execute(task)
meta = result.metadata or {}
for key in ("workspace", "evidence", "document_search", "phase_control", "context_activation", "synthesis_package"):
    if key in meta:
        print(f"{key}: {meta[key]}")
| Key | Typical contents |
|---|---|
| workspace | Stats from InMemoryWorkspace (entry counts, limits). |
| evidence | Stats from InMemoryEvidenceDossier. |
| document_search | DocumentSearchStats: chunks indexed, searches, promotions to evidence. |
| phase_control | Phase durations, evidence-gate outcome, flags (synthesis_used_package, critic_used_package, …). |
| context_activation | ContextActivationMetrics counters from ContextStateActivator. |
| synthesis_package | Metadata dict from the last SynthesisPackage built during the run (if any). |
Details: Run-local context state.
Display
AgentResult.display() renders a human-readable summary including all traced fields:
result.display()
--- AgentResult ---
Status: success
Duration: 31,452 ms
Output: The temperature in Tokyo is 15C (59F)...
LLM calls (6):
[main ] gpt-4.1-mini 4,879ms tokens_in=473 tokens_out=18
[tool_loop ] gpt-4.1-mini 4,505ms tokens_in=562 tokens_out=32
[synthesis ] gpt-4.1-mini 8,231ms tokens_in=1204 tokens_out=2048
...
Tool calls (5):
Round 1: google_search success=True 3,306ms
Round 2: unit_converter success=True 0.1ms
...
Context:
Peak utilization: 67.2%
Observations masked: 5
Tokens saved: 12,450
Usage tracking
Token usage is tracked automatically on every execution — no configuration needed. Access aggregated counts — broken down by purpose (main, planning, tool loop, critic, refiner, synthesis) and origin (user vs framework) — via agent.last_usage.
result = await agent.execute(task)
usage = agent.last_usage
print(usage.display())
# Programmatic access
print(f"Total tokens: {usage.total.total_tokens}")
print(f"LLM calls: {usage.call_count}")
# By purpose
for purpose, bucket in usage.by_purpose.items():
    print(f"  {purpose}: {bucket.total_tokens} tokens, {bucket.call_count} calls")
# Export for logging/dashboards
log_payload = usage.summary()
See Usage tracking for the full schema, programmatic access, and logging examples.
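As a sketch of the logging use case, the snippet below turns a summary payload into a single JSON log line. It assumes `usage.summary()` returns a JSON-serializable dict, which is not confirmed here; the keys in the sample dict, the `agent_usage` event name, and `to_log_line` are hypothetical.

```python
import json


def to_log_line(summary: dict, run_id: str) -> str:
    """Serialize a usage summary as one JSON line for a log pipeline."""
    payload = {"event": "agent_usage", "run_id": run_id, **summary}
    return json.dumps(payload, sort_keys=True)  # stable key order helps diffing


# Hand-written stand-in for usage.summary(); keys and values are made up
summary = {
    "total_tokens": 4317,
    "call_count": 6,
    "by_purpose": {"main": 491, "synthesis": 3252},
}

line = to_log_line(summary, run_id="o1")
print(line)
```

One line per run keeps the record easy to ship to a log aggregator and to parse back with `json.loads`.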
Cost estimation
Convert token usage into dollar costs using the built-in CostTracker, which ships with pricing tables for OpenAI and Gemini models.
from nucleusiq.agents.usage import CostTracker
tracker = CostTracker()
cost = tracker.estimate(agent.last_usage, model="gpt-4.1-mini")
print(f"Estimated cost: ${cost.total_cost:.6f}")
print(f" Prompt: ${cost.prompt_cost:.6f}")
print(f" Completion: ${cost.completion_cost:.6f}")
print(cost.display())
See Cost estimation for built-in pricing tables, custom model registration, and prefix matching.
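The underlying arithmetic is simple: each token count is multiplied by a per-token price. The sketch below illustrates that calculation along with one way prefix matching could work. It is not `CostTracker`'s implementation: the model names and prices are placeholders, and choosing the longest matching prefix is an assumption.

```python
# USD per 1M tokens as (prompt, completion); placeholder numbers, not real pricing
PRICES = {
    "example-model": (1.00, 4.00),
    "example-model-mini": (0.50, 2.00),
}


def lookup_price(model: str):
    """Longest-prefix match, so a dated variant resolves to its base entry."""
    matches = [name for name in PRICES if model.startswith(name)]
    if not matches:
        raise KeyError(f"no pricing registered for {model!r}")
    return PRICES[max(matches, key=len)]


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int):
    """Return (prompt_cost, completion_cost, total_cost) in USD."""
    prompt_price, completion_price = lookup_price(model)
    prompt_cost = prompt_tokens / 1_000_000 * prompt_price
    completion_cost = completion_tokens / 1_000_000 * completion_price
    return prompt_cost, completion_cost, prompt_cost + completion_cost


# "example-model-mini-2025" matches both entries; the longer prefix wins
prompt_cost, completion_cost, total = estimate_cost(
    "example-model-mini-2025", prompt_tokens=2_000_000, completion_tokens=500_000
)
print(f"Prompt: ${prompt_cost:.2f}  Completion: ${completion_cost:.2f}  Total: ${total:.2f}")
```

Picking the longest prefix matters when one registered name is a prefix of another; a first-match rule could silently price a mini model at the larger model's rate.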
Zero overhead by default
Tracing is off by default. Unless it is enabled, either through ObservabilityConfig(tracing=True) or the legacy enable_tracing=True, result.tool_calls, result.llm_calls, and result.warnings are empty tuples and tracing adds no runtime overhead. Usage tracking (agent.last_usage) is always available regardless of tracing.
See also
- Context management — Context window management guide
- Usage tracking — Token usage by purpose and origin
- Cost estimation — Dollar cost tracking with built-in pricing tables
- Agent Config guide — ObservabilityConfig configuration