Ollama provider
Run Ollama models (local daemon or a reachable HTTP API) through NucleusIQ using the official ollama Python SDK — no LangChain.
Alpha release
`nucleusiq-ollama` 0.1.0a1 is a PyPI pre-release (Development Status :: 3 - Alpha). APIs and behavior may change. It requires `nucleusiq>=0.7.10` for the structured-output resolver wiring. Treat it as experimental; it is not yet recommended for production.
What you get (Phase A)
| Capability | Supported |
|---|---|
| Chat (`/api/chat`) | Yes |
| Streaming (`StreamEvent`, tokens + metadata) | Yes |
| `@tool` / function tools | Yes |
| Structured output (`format` / JSON schema) | Yes — combining `response_format` with tools drops `format` with a warning (same caution pattern as Groq) |
| `think` (reasoning / THINKING stream events) | Yes |
| Vision / embeddings | Not wired in `BaseOllama` yet |
Prerequisites
- Ollama installed and a model pulled, e.g. `ollama pull llama3.2` (names depend on your catalog).
- Daemon reachable at `OLLAMA_HOST` or the SDK default (`http://127.0.0.1:11434` when unset).
Installation
```bash
pip install nucleusiq nucleusiq-ollama
```
Pin the alpha explicitly when reproducibility matters:
pip install "nucleusiq>=0.7.10" "nucleusiq-ollama==0.1.0a1"
Dependency: `ollama>=0.5.0,<1.0`.
Environment
| Variable | Purpose |
|---|---|
| `OLLAMA_HOST` | Passed as `host=` to the ollama `AsyncClient` / `Client` (omit for the local default). |
| `OLLAMA_API_KEY` | Optional Bearer token for hosted / authenticated endpoints. |
| `OLLAMA_MODEL` | Optional default model id (examples often use `llama3.2`). |
```bash
# export OLLAMA_HOST=http://127.0.0.1:11434
# export OLLAMA_API_KEY=...   # only if your endpoint requires it
export OLLAMA_MODEL=llama3.2
```
Quick start (Direct)
Use `BaseOllama` with `async_mode=True`. Call `await agent.initialize()` before `execute()` (matches the monorepo examples).
```python
import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_ollama import BaseOllama, OllamaLLMParams


async def main() -> None:
    llm = BaseOllama(model_name="llama3.2", async_mode=True)

    agent = Agent(
        name="ollama-demo",
        prompt=ZeroShotPrompt().configure(
            system="You are a concise assistant.",
        ),
        llm=llm,
        config=AgentConfig(
            execution_mode=ExecutionMode.DIRECT,
            llm_params=OllamaLLMParams(temperature=0.3, max_output_tokens=256),
        ),
    )

    await agent.initialize()
    result = await agent.execute(
        Task(id="ollama-1", objective="What is the capital of France?"),
    )
    print(result.output)


asyncio.run(main())
```
Tools (Standard / Autonomous)
Ollama accepts OpenAI-style function tools via `to_ollama_function_tool`. From the agent's perspective this is the same `@tool` workflow as other providers.
```python
from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq.tools.decorators import tool
from nucleusiq_ollama import BaseOllama, OllamaLLMParams


@tool
def add(a: int, b: int) -> str:
    """Add two integers."""
    return str(a + b)


llm = BaseOllama(model_name="llama3.2", async_mode=True)

agent = Agent(
    name="ollama-tools",
    prompt=ZeroShotPrompt().configure(system="Use tools for arithmetic."),
    llm=llm,
    tools=[add],
    config=AgentConfig(
        execution_mode=ExecutionMode.STANDARD,
        llm_params=OllamaLLMParams(temperature=0.4, max_output_tokens=512),
    ),
)
```
There are no Ollama “native server tools” wired like Gemini GoogleTool — local @tool only in this alpha.
OllamaLLMParams
Extends `LLMParams` with `extra="forbid"`.

| Field | Meaning |
|---|---|
| `think` | `bool` or `"low"` / `"medium"` / `"high"` — maps to Ollama `think`; streams THINKING events when enabled. |
| `keep_alive` | Ollama model keep-alive duration (`float`, `str`, or `None`). |

Framework fields such as `temperature`, `max_output_tokens` (→ Ollama `num_predict`), `top_p`, penalties, `stop`, and `seed` are merged into Ollama `options` — see the design doc.
```python
from nucleusiq.agents.config import AgentConfig
from nucleusiq_ollama import OllamaLLMParams

config = AgentConfig(
    llm_params=OllamaLLMParams(
        temperature=0.2,
        max_output_tokens=512,
        think="medium",
        keep_alive="5m",
    ),
)
```
Structured output
With `nucleusiq>=0.7.10`, the core `structured_output` resolver recognizes `BaseOllama`, so `Agent(..., response_format=MyModel)` uses the correct provider payload. Model and schema support still depend on your Ollama model and server version — validate on your stack.
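A minimal sketch of the pattern, assuming a Pydantic model as the `response_format` target; the `CityInfo` model, its fields, and the prompt text are illustrative, not part of the provider API:

```python
from pydantic import BaseModel

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_ollama import BaseOllama


class CityInfo(BaseModel):
    """Illustrative schema; replace with the shape your task needs."""
    city: str
    country: str


agent = Agent(
    name="ollama-structured",
    prompt=ZeroShotPrompt().configure(system="Return structured data."),
    llm=BaseOllama(model_name="llama3.2", async_mode=True),
    config=AgentConfig(execution_mode=ExecutionMode.DIRECT),
    response_format=CityInfo,  # resolver builds the Ollama format payload
)
```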
Tools + schema
If the agent has tools, the structured format is dropped for safety (and logged). When you need strict JSON, prefer a tools-only pass followed by a separate execute without tools (see the sketch below).
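A rough sketch of that two-pass idea, reusing the `add` tool from the earlier example; the agent names, the `Sum` model, task ids, and prompt text are illustrative assumptions:

```python
import asyncio

from pydantic import BaseModel

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq.tools.decorators import tool
from nucleusiq_ollama import BaseOllama


@tool
def add(a: int, b: int) -> str:
    """Add two integers."""
    return str(a + b)


class Sum(BaseModel):
    """Illustrative schema for the second, tools-free pass."""
    total: int


async def main() -> None:
    # Pass 1: tools enabled, no response_format (format would be dropped anyway).
    tool_agent = Agent(
        name="ollama-tools-pass",
        prompt=ZeroShotPrompt().configure(system="Use tools for arithmetic."),
        llm=BaseOllama(model_name="llama3.2", async_mode=True),
        tools=[add],
        config=AgentConfig(execution_mode=ExecutionMode.STANDARD),
    )
    await tool_agent.initialize()
    first = await tool_agent.execute(Task(id="pass-1", objective="What is 17 + 25?"))

    # Pass 2: no tools, so the structured format is kept and strict JSON applies.
    json_agent = Agent(
        name="ollama-json-pass",
        prompt=ZeroShotPrompt().configure(system="Return structured data."),
        llm=BaseOllama(model_name="llama3.2", async_mode=True),
        config=AgentConfig(execution_mode=ExecutionMode.DIRECT),
        response_format=Sum,
    )
    await json_agent.initialize()
    second = await json_agent.execute(
        Task(id="pass-2", objective=f"Extract the total from: {first.output}"),
    )
    print(second.output)


asyncio.run(main())
```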
Streaming
Use `agent.execute_stream(...)` like other providers; the adapter emits `StreamEvent` tokens (and THINKING events when `think` is enabled).
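A minimal consumption sketch, assuming `execute_stream` is an async iterator of `StreamEvent` objects; the `type` / `content` attribute names are assumptions to check against the `StreamEvent` definition in your nucleusiq version:

```python
import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_ollama import BaseOllama, OllamaLLMParams


async def main() -> None:
    agent = Agent(
        name="ollama-stream",
        prompt=ZeroShotPrompt().configure(system="You are a concise assistant."),
        llm=BaseOllama(model_name="llama3.2", async_mode=True),
        config=AgentConfig(
            execution_mode=ExecutionMode.DIRECT,
            llm_params=OllamaLLMParams(think="low"),  # also emits THINKING events
        ),
    )
    await agent.initialize()

    async for event in agent.execute_stream(
        Task(id="ollama-stream-1", objective="Name three uses for local LLMs."),
    ):
        # Attribute names are assumptions; inspect StreamEvent on your install.
        print(getattr(event, "type", None), getattr(event, "content", ""))


asyncio.run(main())
```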
Runnable examples (monorepo)
From `src/providers/inference/ollama` after `uv sync`:

```bash
uv run python examples/agents/00_ollama_smoke.py
uv run python examples/agents/01_ollama_direct.py
uv run python examples/agents/02_ollama_stream_live.py
uv run python examples/agents/03_ollama_capabilities_matrix.py
```

`03_ollama_capabilities_matrix.py` — chat, stream, structured output, and thinking × DIRECT / STANDARD / AUTONOMOUS (filter with `--only`).

Package README: `src/providers/inference/ollama/README.md`.
See also
- Ollama quickstart — Copy-paste gears
- Providers — Portability
- Models — Parameter tabs
- Installation — `nucleusiq[http]` optional extra (v0.7.10)
- Structured output — Framework patterns