OpenAI Provider — User-Facing Design Guide
How NucleusIQ keeps Agent creation simple while giving power users full control over 40+ OpenAI parameters.
Last updated: 10 March 2026
Implementation Status
| Feature | Status | Files |
|---|---|---|
| Three-layer design (Agent / LLM Config / Direct) | DONE | agent.py, base.py |
Type-safe LLMParams base class |
DONE | core/llms/llm_params.py |
Type-safe OpenAILLMParams subclass |
DONE | nucleusiq_openai/llm_params.py |
AgentConfig.llm_params (typed field) |
DONE | agent_config.py |
Agent.execute(llm_params=...) per-task overrides |
DONE | agent.py |
| Merge chain (LLM < AgentConfig < per-execute) | DONE | agent.py -> _resolve_llm_params() |
All 5 llm.call() sites wired with overrides |
DONE | agent.py |
All Chat Completions + Responses API params in call() |
DONE | base.py |
| Multimodal attachments in task | DONE | Framework-level Task.attachments (7 types) + OpenAI native processing (v0.4.0) |
Streaming via execute_stream() |
DONE | agent.py, base_mode.py, stream_adapters.py |
Streaming via call_stream() |
DONE | base_llm.py, nb_openai/base.py |
| Usage / token telemetry | DONE | _shared/response_models.py |
| Platform APIs (Embeddings, Audio, Files, Realtime, Fine-tuning, Batch) | NOT YET | See full matrix in gap analysis |
Readiness Snapshot
What is solid today
- Text + tool orchestration with Chat Completions and Responses API
- Native tool routing (
web_search,code_interpreter,file_search,image_generation,mcp,computer_use) - Structured output parsing and typed internal message models
- Type-safe parameter merge chain (
BaseOpenAIdefaults <AgentConfig.llm_params<execute(..., llm_params=...)) - End-to-end streaming (all 3 execution modes, both OpenAI backends)
- Usage telemetry (prompt tokens, completion tokens, reasoning tokens)
What shipped in v0.4.0
- Framework-level
Task.attachmentsfor multimodal inputs (7 attachment types) - OpenAI native attachment processing (server-side PDF/XLSX/CSV)
- 4 built-in file tools (FileReadTool, FileSearchTool, DirectoryListTool, FileExtractTool)
- AttachmentGuardPlugin for policy-based attachment validation
- File-aware memory (metadata in all 5 strategies)
- UsageTracker with CallPurpose enum
What is intentionally out of scope (for now)
- Dedicated Embeddings API class
- Audio endpoint clients (STT/TTS/translations)
- Files and Vector Store management clients
- Realtime, Batch, Fine-tuning, Moderation endpoint wrappers
The Problem
OpenAI's Chat Completions API has 25+ parameters. The Responses API has 20+ parameters. Exposing all of them to someone who just wants to create an Agent would be overwhelming and a terrible user experience.
The Solution: Three Layers
NucleusIQ uses a layered design where each layer adds more control:
Layer 1: Agent (Simple) — 90% of users stop here
Layer 2: LLM Config (Medium) — Power users who need tuning
Layer 3: Direct API (Advanced) — Full control, raw access
How It Works Today (Inside the Agent)
When a user calls agent.execute(task), here is exactly what happens internally:
User
│
▼
agent.execute({"id": "q1", "objective": "What is GDP of Japan?"})
│
├─ 1. Agent._build_messages(task)
│ → Combines prompt.system + prompt.user + task.objective
│ → Returns: [{"role": "system", ...}, {"role": "user", ...}]
│
├─ 2. Agent converts tools → tool_specs
│ → llm.convert_tool_specs(self.tools)
│
├─ 3. Agent calls llm.call() ◄── THIS IS WHERE PARAMS GO
│ │
│ │ call_kwargs = {
│ │ "model": llm.model_name,
│ │ "messages": messages,
│ │ "tools": tool_specs,
│ │ "max_tokens": config.llm_max_tokens,
│ │ }
│ │ response = await llm.call(**call_kwargs)
│ │
│ ▼
│ BaseOpenAI.call() uses DEFAULTS from __init__:
│ temperature → self.temperature (0.7)
│ top_p → 1.0 (hardcoded default)
│ freq_penalty → 0.0 (hardcoded default)
│ stream → False (hardcoded default)
│
├─ 4. If LLM returns tool_calls → Executor runs tools → loop back to step 3
│
└─ 5. Return final text response
Design: How Configuration Flows (IMPLEMENTED)
The Configuration Merge Chain
Priority (highest wins):
┌─────────────────────────────────────────────┐
│ 3. Per-execute overrides (task-level) │ ← Highest priority
│ agent.execute(task, llm_params=OpenAILLMParams(...)) │
├─────────────────────────────────────────────┤
│ 2. AgentConfig.llm_params (agent-level) │ ← Agent-wide defaults
│ AgentConfig(llm_params=OpenAILLMParams(...)) │
├─────────────────────────────────────────────┤
│ 1. BaseOpenAI.__init__ (LLM-level) │ ← Global defaults
│ BaseOpenAI(temperature=0.3, seed=42) │
└─────────────────────────────────────────────┘
Each layer merges with the one below it. Per-execute overrides beat AgentConfig, which beats LLM defaults.
Concrete Examples: Every User Level
Example 1: Beginner — "Just make it work" (5 lines) — WORKS TODAY
from nucleusiq.agents import Agent
from nucleusiq.prompts.factory import PromptFactory, PromptTechnique
from nucleusiq_openai import BaseOpenAI, OpenAITool
llm = BaseOpenAI(model_name="gpt-4o")
agent = Agent(
name="Researcher",
role="Research Assistant",
objective="Find information using web search.",
llm=llm,
prompt=PromptFactory.create_prompt(PromptTechnique.ZERO_SHOT).configure(
system="You are a helpful assistant.",
user="Answer questions accurately.",
),
tools=[OpenAITool.web_search()],
)
result = await agent.execute({"id": "q1", "objective": "What is the GDP of Japan?"})
What happens behind the scenes:
- temperature=0.7, max_tokens=2048, top_p=1.0 (all defaults)
- Auto-routes to Responses API (because web_search is native)
- Retries on rate limits
- User sees none of this
Example 2: Power User — "I need cost control + determinism" — WORKS TODAY
llm = BaseOpenAI(
model_name="o3",
temperature=0.2,
seed=42,
service_tier="flex",
reasoning_effort="low",
)
agent = Agent(
name="CostEfficientBot",
role="Analyst",
objective="Analyze data efficiently.",
llm=llm,
prompt=prompt,
tools=[OpenAITool.code_interpreter()],
)
result = await agent.execute({"id": "q1", "objective": "Calculate quarterly revenue."})
Example 3: Different Settings Per Agent (Same LLM, Different Config) — WORKS TODAY
llm = BaseOpenAI(model_name="gpt-4o", temperature=0.5)
from nucleusiq_openai import OpenAILLMParams
# Agent A: Creative writing — override temperature higher
agent_writer = Agent(
name="Writer",
role="Creative Writer",
objective="Write creative content.",
llm=llm,
prompt=writer_prompt,
config=AgentConfig(
llm_max_tokens=4096,
llm_params=OpenAILLMParams(
temperature=0.9,
presence_penalty=0.6,
),
),
)
# Agent B: Data analysis — override temperature lower
agent_analyst = Agent(
name="Analyst",
role="Data Analyst",
objective="Analyze data precisely.",
llm=llm,
prompt=analyst_prompt,
tools=[OpenAITool.code_interpreter()],
config=AgentConfig(
llm_max_tokens=2048,
llm_params=OpenAILLMParams(
temperature=0.1,
seed=42,
),
),
)
# Writer uses temp=0.9, Analyst uses temp=0.1 — same underlying LLM
await agent_writer.execute({"id": "w1", "objective": "Write a poem about AI."})
await agent_analyst.execute({"id": "a1", "objective": "Sum column B of this data."})
Example 4: Per-Task Overrides — "This specific task needs more reasoning" — WORKS TODAY
agent = Agent(
name="FlexBot",
llm=BaseOpenAI(model_name="o3", reasoning_effort="low"),
...
)
# Normal task — uses default low reasoning
await agent.execute({"id": "q1", "objective": "What is 2+2?"})
# Complex task — override to high reasoning just for this call
await agent.execute(
{"id": "q2", "objective": "Prove the Riemann hypothesis."},
llm_params=OpenAILLMParams(reasoning_effort="high", max_tokens=8192),
)
# Back to low reasoning for the next task
await agent.execute({"id": "q3", "objective": "Summarize this text."})
Example 5: Streaming — "Show results as they come" — WORKS TODAY (v0.3.0)
from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.streaming.events import StreamEventType
from nucleusiq_openai import BaseOpenAI, OpenAITool
llm = BaseOpenAI(model_name="gpt-4o")
agent = Agent(
name="StreamBot",
role="Assistant",
objective="Help users with real-time responses.",
llm=llm,
prompt=prompt,
tools=[OpenAITool.web_search()],
config=AgentConfig(execution_mode=ExecutionMode.STANDARD),
)
# Streaming — async generator of StreamEvent objects
async for event in agent.execute_stream({"id": "s1", "objective": "Write a long essay on AI."}):
match event.type:
case StreamEventType.TOKEN:
print(event.token, end="", flush=True)
case StreamEventType.TOOL_CALL_START:
print(f"\n[Calling {event.tool_name}...]")
case StreamEventType.TOOL_CALL_END:
print(f"[Result from {event.tool_name}]")
case StreamEventType.THINKING:
print(f"... {event.message}")
case StreamEventType.COMPLETE:
full_text = event.content
print(f"\n\nDone! ({len(full_text)} chars)")
How streaming works internally:
agent.execute_stream(task)
│
├─ Same lifecycle as execute(): resolve mode, params, plugins
│
▼
mode.run_stream(agent, task)
│
├─ build_messages + build_call_kwargs
│
▼
call_llm_stream(agent, kwargs)
│
├─ before_model hooks
│
▼
llm.call_stream(**kwargs)
│
├─ yield StreamEvent.TOKEN (content chunks)
├─ accumulate tool_calls across chunks
│
▼
Stream done — tool_calls detected?
│
├─ YES → yield TOOL_CALL_START → Execute → yield TOOL_CALL_END → loop back
└─ NO → yield COMPLETE (final answer)
StreamEvent types:
| Event Type | When | Useful Fields |
|---|---|---|
TOKEN |
Each content chunk | event.token |
TOOL_CALL_START |
Tool execution begins | event.tool_name, event.tool_args |
TOOL_CALL_END |
Tool execution completes | event.tool_name, event.tool_result |
LLM_CALL_START |
LLM call begins | event.call_count |
LLM_CALL_END |
LLM call completes | event.call_count |
THINKING |
Internal processing step | event.message |
COMPLETE |
Final result ready | event.content, event.metadata |
ERROR |
Error occurred | event.message |
All 3 modes support streaming: Direct, Standard, and Autonomous.
Example 6: Multimodal — "Analyze this image" — DONE (v0.4.0)
Framework-level Task.attachments is now fully implemented:
from nucleusiq.agents.task import Task
from nucleusiq.agents.attachments import Attachment, AttachmentType
task = Task(
id="img-1",
objective="What is in this image?",
attachments=[
Attachment(type=AttachmentType.IMAGE_URL, content="https://example.com/photo.jpg"),
],
)
result = await agent.execute(task)
All 7 attachment types are supported: TEXT, PDF, IMAGE_URL, IMAGE_BASE64, FILE_BYTES, FILE_BASE64, FILE_URL.
Example 7: Direct LLM Access — "I don't need an Agent" — WORKS TODAY
For users who want full control without an Agent wrapper:
from nucleusiq_openai import BaseOpenAI
llm = BaseOpenAI(model_name="gpt-4o")
# Direct call with ALL parameters
result = await llm.call(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://...", "detail": "high"}},
]},
],
temperature=0.2,
max_tokens=4096,
seed=42,
reasoning_effort="high",
service_tier="priority",
n=3,
logprobs=True,
modalities=["text", "audio"],
audio={"voice": "alloy", "format": "mp3"},
response_format=MyPydanticModel,
)
# Or raw Responses API
response = await llm.responses_call(
model="gpt-4o",
input="Search for quantum computing breakthroughs",
tools=[{"type": "web_search_preview"}],
include=["output[*].web_search_call.results"],
reasoning={"effort": "high", "summary": "auto"},
truncation="auto",
**any_future_params,
)
# Direct streaming (provider-level)
async for event in llm.call_stream(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a poem"}],
):
if event.type == StreamEventType.TOKEN:
print(event.token, end="")
Type-Safe LLM Parameters (LLMParams) — IMPLEMENTED
The llm_params field is NOT a raw Dict[str, Any]. It is a typed Pydantic model that provides:
- IDE autocomplete for all parameter names
- Value validation (wrong types or invalid values are caught immediately)
- Provider-specific parameters via inheritance
Architecture: Base + Provider-Specific
nucleusiq (core) nucleusiq_openai (provider)
┌──────────────────────┐ ┌────────────────────────────────┐
│ LLMParams (base) │ │ OpenAILLMParams(LLMParams) │
│ │ │ │
│ temperature: float │ extends │ reasoning_effort: Literal │
│ max_tokens: int │ ────────► │ service_tier: Literal │
│ top_p: float │ │ modalities: List[Literal] │
│ seed: int │ │ audio: AudioOutputConfig │
│ stream: bool │ │ parallel_tool_calls: bool │
│ stop: List[str] │ │ logprobs: bool │
│ n: int │ │ top_logprobs: int │
│ ...common params │ │ metadata: Dict │
└──────────────────────┘ │ store: bool │
│ truncation: Literal │
Future providers: │ ...OpenAI-specific │
GeminiLLMParams └────────────────────────────────┘
OllamaLLMParams
What Happens With a Typo
from nucleusiq_openai import OpenAILLMParams
# TYPO: "temperture" instead of "temperature"
params = OpenAILLMParams(temperture=0.3)
# ❌ ValidationError: Extra inputs are not permitted
# 'temperture' → Did you mean 'temperature'?
# WRONG VALUE: reasoning_effort must be one of the Literal values
params = OpenAILLMParams(reasoning_effort="super_high")
# ❌ ValidationError: Input should be 'none', 'minimal', 'low',
# 'medium', 'high', or 'xhigh'
# CORRECT: IDE autocomplete guides you to the right values
params = OpenAILLMParams(
temperature=0.3,
reasoning_effort="high",
service_tier="flex",
seed=42,
)
Complete Configuration Reference
Where Each Parameter Lives
| Parameter | BaseOpenAI init | AgentConfig.llm_params | Per-execute llm_params | llm.call() direct |
|---|---|---|---|---|
temperature |
Default for all calls | Override per agent | Override per task | Full control |
max_tokens |
- | llm_max_tokens field |
Override per task | Full control |
seed |
Default for all calls | Override per agent | Override per task | Full control |
reasoning_effort |
Default for all calls | Override per agent | Override per task | Full control |
service_tier |
Default for all calls | Override per agent | Override per task | Full control |
top_p |
- | Override per agent | Override per task | Full control |
frequency_penalty |
- | Override per agent | Override per task | Full control |
presence_penalty |
- | Override per agent | Override per task | Full control |
n |
- | - | Override per task | Full control |
logprobs |
- | - | - | Full control |
modalities |
- | Override per agent | Override per task | Full control |
audio |
- | Override per agent | Override per task | Full control |
stream |
- | Override per agent | - | Full control |
parallel_tool_calls |
- | Override per agent | Override per task | Full control |
metadata |
Default for all calls | Override per agent | Override per task | Full control |
attachments (multimodal) |
- | - | In task dict (v0.4.0) | Build manually |
Design Principles
1. Progressive Disclosure
Users only see what they need. A beginner creates an Agent with 5 lines. An advanced user adds llm_params to AgentConfig. A power user calls llm.call() directly.
2. Sensible Defaults
Every parameter has a default that works for 90% of cases. Users override only what they need.
3. Merge Chain (LLM < AgentConfig < Per-Execute)
Settings merge in a clear priority order. Per-execute overrides beat AgentConfig, which beats LLM init defaults.
4. Agent Stays Clean
Agent() and agent.execute() never expose raw LLM parameters directly. The llm_params dict is an opt-in escape hatch that passes through transparently.
5. Don't Break Existing Code
All changes are additive. llm_params defaults to None. Existing code works identically. execute_stream() is a new method alongside the existing execute().
6. Provider Isolation
The framework (nucleusiq core) defines abstract contracts (BaseLLM, StreamEvent, BaseTool). Providers (nucleusiq-openai, future Gemini/Ollama) implement those contracts. No provider-specific types leak into core.
Platform APIs — Separate Focused Classes — NOT YET IMPLEMENTED
Platform APIs (embeddings, audio, files) are not part of the LLM. They will be standalone classes (GAPs 6-8):
from nucleusiq_openai.embeddings import OpenAIEmbeddings
from nucleusiq_openai.audio import OpenAIAudio
from nucleusiq_openai.files import OpenAIFiles
# Embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectors = await embeddings.embed(["Hello world", "Goodbye world"])
# Audio
audio = OpenAIAudio()
transcript = await audio.transcribe("meeting.mp3")
speech = await audio.speak("Hello!", voice="nova", format="mp3")
# Files
files = OpenAIFiles()
file_id = await files.upload("report.pdf", purpose="file_search")
await files.delete(file_id)
Summary: What Changes for the User
| User Level | What They Do | Capability | Status |
|---|---|---|---|
| Beginner | Agent(llm=BaseOpenAI("gpt-4o")) |
Nothing changes — just works | DONE |
| Tuning | BaseOpenAI(seed=42, reasoning_effort="low") |
Set LLM-wide defaults at init | DONE |
| Per-Agent | AgentConfig(llm_params=OpenAILLMParams(temperature=0.9)) |
Different settings per agent (type-safe) | DONE |
| Per-Task | agent.execute(task, llm_params=OpenAILLMParams(reasoning_effort="high")) |
Override per task (type-safe) | DONE |
| Streaming | async for event in agent.execute_stream(task) |
Real-time token-by-token output | DONE |
| Multimodal | Task(attachments=[Attachment(type=..., content=...)]) |
Images/files in tasks | DONE |
| Direct | llm.call(...) or llm.responses_call(...) |
Full raw API access | DONE |
| Direct Streaming | async for event in llm.call_stream(...) |
Provider-level streaming | DONE |