Ollama provider

Run Ollama models (local daemon or a reachable HTTP API) through NucleusIQ using the official ollama Python SDK, with no LangChain dependency.

🟢 Stable: nucleusiq-ollama 0.2.0

nucleusiq-ollama 0.2.0 ships as Development Status :: 5 - Production/Stable (first stable line). Requires nucleusiq>=0.7.12. 98 unit tests, 99.85% coverage.

What you get in 0.2.0

| Capability | Supported |
| --- | --- |
| Chat (/api/chat) | ✅ |
| Streaming (StreamEvent, tokens + metadata) | ✅ |
| @tool / function tools | ✅ |
| Structured output (format / JSON schema) | ✅ (combining response_format with tools drops format with a warning; same caution pattern as Groq) |
| think (reasoning / THINKING stream events) | ✅ |
| Vision (image messages) | ✅ New in 0.2.0 |
| LLMCallRecord.provider="ollama" enrichment | ✅ New in 0.2.0 |
| Embeddings | Out of scope for this stable line. |

Prerequisites

  1. Ollama installed and a model pulled, e.g. ollama pull llama3.2 (names depend on your catalog).
  2. Daemon reachable at OLLAMA_HOST or the SDK default (http://127.0.0.1:11434 when unset).
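
To check both prerequisites before wiring up an agent, a small standard-library probe can help. The sketch below is illustrative only: `resolve_host` and `list_local_models` are hypothetical helper names, not part of NucleusIQ or the ollama SDK, and it assumes OLLAMA_HOST (when set) is a full URL including the scheme. It queries Ollama's `/api/tags` endpoint, which lists locally pulled models.

```python
import json
import os
import urllib.error
import urllib.request

# Local default used when OLLAMA_HOST is unset.
DEFAULT_HOST = "http://127.0.0.1:11434"


def resolve_host(env=None):
    """Return OLLAMA_HOST if set and non-empty, else the local default."""
    env = os.environ if env is None else env
    return (env.get("OLLAMA_HOST") or DEFAULT_HOST).rstrip("/")


def list_local_models(host, timeout=2.0):
    """Return the names of pulled models, or None if the daemon is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


if __name__ == "__main__":
    host = resolve_host()
    models = list_local_models(host)
    if models is None:
        print(f"Ollama not reachable at {host}")
    else:
        print(f"Ollama reachable at {host}; pulled models: {models}")
```

If the probe reports no models, run ollama pull for the model you intend to use before continuing.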

Installation

pip install nucleusiq nucleusiq-ollama

Pin the stable line for reproducible builds:

pip install "nucleusiq>=0.7.12" "nucleusiq-ollama>=0.2.0,<0.3"

Dependency: ollama>=0.5.0,<1.0.

Environment

| Variable | Purpose |
| --- | --- |
| OLLAMA_HOST | Passed as host= to the ollama AsyncClient / Client (omit for local default). |
| OLLAMA_API_KEY | Optional Bearer token for hosted / authenticated endpoints. |
| OLLAMA_MODEL | Optional default model id (examples often use llama3.2). |

# export OLLAMA_HOST=http://127.0.0.1:11434
# export OLLAMA_API_KEY=...   # only if your endpoint requires it
export OLLAMA_MODEL=llama3.2

Quick start (Direct)

Use BaseOllama with async_mode=True. Call await agent.initialize() before execute() (matches monorepo examples).

import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_ollama import BaseOllama, OllamaLLMParams


async def main() -> None:
    llm = BaseOllama(model_name="llama3.2", async_mode=True)

    agent = Agent(
        name="ollama-demo",
        prompt=ZeroShotPrompt().configure(
            system="You are a concise assistant.",
        ),
        llm=llm,
        config=AgentConfig(
            execution_mode=ExecutionMode.DIRECT,
            llm_params=OllamaLLMParams(temperature=0.3, max_output_tokens=256),
        ),
    )

    await agent.initialize()

    result = await agent.execute(
        Task(id="ollama-1", objective="What is the capital of France?"),
    )
    print(result.output)


asyncio.run(main())

Tools (Standard / Autonomous)

Ollama accepts OpenAI-style function tools via to_ollama_function_tool. From the agent’s perspective this is the same @tool workflow as other providers.

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq.tools.decorators import tool
from nucleusiq_ollama import BaseOllama, OllamaLLMParams


@tool
def add(a: int, b: int) -> str:
    """Add two integers."""
    return str(a + b)


llm = BaseOllama(model_name="llama3.2", async_mode=True)
agent = Agent(
    name="ollama-tools",
    prompt=ZeroShotPrompt().configure(system="Use tools for arithmetic."),
    llm=llm,
    tools=[add],
    config=AgentConfig(
        execution_mode=ExecutionMode.STANDARD,
        llm_params=OllamaLLMParams(temperature=0.4, max_output_tokens=512),
    ),
)

There are no Ollama "native server tools" like Gemini's GoogleTool; only local @tool functions are available.

Vision (image messages): new in 0.2.0

The _shared/wire.py sanitize_messages helper now splits OpenAI-style multimodal content lists into Ollama's chat-message shape: text parts become the content string, and image_url parts whose URL is a data:image/*;base64,… URL are decoded into Ollama's images field.

# Shape 1: OpenAI-style content list with a data: URL image part
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/png;base64,iVBORw0K..."},
            },
        ],
    }
]

# Shape 2: Ollama-style message with a content string and an images list
messages = [
    {
        "role": "user",
        "content": "What's in this image?",
        "images": ["iVBORw0K..."],  # raw base64 strings
    }
]

# Shape 3: content list with an image part carrying raw base64 data
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this:"},
            {"type": "image", "data": "iVBORw0K..."},   # raw base64
        ],
    }
]
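
To make the conversion concrete, here is a minimal sketch of the kind of split this section describes. It is not the actual sanitize_messages implementation: `split_multimodal` is a hypothetical name, joining text parts with a newline is an assumption, and the sketch assumes the base64 payload of a data: URL is passed through unchanged as an entry in images.

```python
import logging

logger = logging.getLogger("vision_sketch")
BASE64_MARKER = ";base64,"


def split_multimodal(message):
    """Hypothetical stand-in for splitting a content list into text + images."""
    content = message.get("content")
    if not isinstance(content, list):
        return dict(message)  # already a plain-string message; leave as-is

    texts, images = [], []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            texts.append(part.get("text", ""))
        elif kind == "image_url":
            url = part.get("image_url", {}).get("url", "")
            if url.startswith("data:image/") and BASE64_MARKER in url:
                # Keep only the base64 payload that follows the data: URL header.
                images.append(url.split(BASE64_MARKER, 1)[1])
            else:
                # Remote URLs are not fetched; the part is dropped with a warning.
                logger.warning("Skipping non-data image URL: %s", url)
        elif kind == "image":
            images.append(part.get("data", ""))  # raw base64 part

    result = {"role": message["role"], "content": "\n".join(texts)}  # assumed join
    if images:
        result["images"] = images
    return result
```

Calling it on a message with one text part and one data:image/png;base64 part yields a dict with a content string and a single-entry images list.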

HTTP image URLs are skipped

Anything that isn't a data: URL (e.g. https://example.com/cat.png) triggers a warning and is omitted from the request; NucleusIQ does not fetch remote images on your behalf. Encode the image client-side as data:image/...;base64,... before sending.
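
Client-side encoding needs only the standard library. The helpers below are an illustrative sketch with hypothetical names (`to_data_url`, `image_file_to_data_url`, `user_message`); they build a data: URL from image bytes and wrap it in the OpenAI-style content list shown earlier in this section.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data:<mime>;base64,<payload> URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_file_to_data_url(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from its name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    return to_data_url(Path(path).read_bytes(), mime)


def user_message(text: str, data_url: str) -> dict:
    """Build an OpenAI-style multimodal user message around a data: URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

For a remote image, download the bytes yourself first and pass them to `to_data_url`.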

Multimodal model required

Vision requires a multimodal Ollama model, for example llama3.2-vision, llava, or bakllava. Pull it first:

ollama pull llama3.2-vision

OllamaLLMParams

Extends LLMParams with extra="forbid".

| Field | Meaning |
| --- | --- |
| think | bool or "low" / "medium" / "high"; maps to Ollama think and streams THINKING events when enabled. |
| keep_alive | Ollama model keep-alive duration (float, str, or None). |

Framework fields such as temperature, max_output_tokens (→ Ollama num_predict), top_p, penalties, stop, seed are merged into Ollama options; see the design doc.

from nucleusiq.agents.config import AgentConfig
from nucleusiq_ollama import OllamaLLMParams

config = AgentConfig(
    llm_params=OllamaLLMParams(
        temperature=0.2,
        max_output_tokens=512,
        think="medium",
        keep_alive="5m",
    ),
)
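
As a rough picture of how such parameters could end up in an Ollama chat request, consider the sketch below. It is an assumption-laden illustration, not the adapter's code: `FIELD_TO_OPTION`, `to_ollama_options`, and `build_chat_kwargs` are hypothetical names, only the max_output_tokens → num_predict rename is stated in this page, penalties are left out because their option names are not given here, and placing think and keep_alive at the top level follows the Ollama chat API.

```python
# Hypothetical rename table. Only max_output_tokens -> num_predict is stated
# in this page; the other entries assume the names pass through unchanged.
FIELD_TO_OPTION = {
    "temperature": "temperature",
    "max_output_tokens": "num_predict",
    "top_p": "top_p",
    "stop": "stop",
    "seed": "seed",
}


def to_ollama_options(params: dict) -> dict:
    """Copy framework fields that are set into an Ollama options dict."""
    options = {}
    for field, option in FIELD_TO_OPTION.items():
        value = params.get(field)
        if value is not None:
            options[option] = value
    return options


def build_chat_kwargs(model: str, messages: list, params: dict) -> dict:
    """Assemble chat keyword arguments; think and keep_alive stay top-level."""
    kwargs = {
        "model": model,
        "messages": messages,
        "options": to_ollama_options(params),
    }
    for top_level in ("think", "keep_alive"):
        if params.get(top_level) is not None:
            kwargs[top_level] = params[top_level]
    return kwargs
```

With temperature=0.2 and max_output_tokens=512, the resulting options dict is {"temperature": 0.2, "num_predict": 512}.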

Structured output

With nucleusiq>=0.7.10, the core structured_output resolver recognizes BaseOllama so Agent(..., response_format=MyModel) uses the correct provider payload. Model and schema support still depend on your Ollama model and server version, so validate on your stack.

Tools + schema

If the agent has tools, the structured format is dropped for safety and a warning is logged. When you need strict JSON, prefer a tools-only pass followed by a separate execute without tools.
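
The rule can be pictured as a small guard applied while the request is built. This is a hypothetical sketch of the behavior described here, not the adapter's code; `resolve_format` and the logger name are made up for illustration.

```python
import logging

logger = logging.getLogger("ollama_sketch")


def resolve_format(tools, response_schema):
    """Return the schema to send as format, or None when tools are present."""
    if tools and response_schema is not None:
        # Tools win: the structured format is dropped and the drop is logged.
        logger.warning("Tools are configured; dropping structured output format")
        return None
    return response_schema


# With tools configured, the schema is not sent.
with_tools = resolve_format([{"name": "add"}], {"type": "object"})

# Without tools, the schema passes through unchanged.
without_tools = resolve_format([], {"type": "object"})
```

This is why the two-pass pattern works: the second pass has no tools, so the schema reaches the model.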

Streaming

Use agent.execute_stream(...) like other providers; the adapter emits StreamEvent tokens (and THINKING when think is enabled).

Runnable examples (monorepo)

From src/providers/inference/ollama after uv sync:

uv run python examples/agents/00_ollama_smoke.py
uv run python examples/agents/01_ollama_direct.py
uv run python examples/agents/02_ollama_stream_live.py
uv run python examples/agents/03_ollama_capabilities_matrix.py

03_ollama_capabilities_matrix.py: chat, stream, structured output, and thinking × DIRECT / STANDARD / AUTONOMOUS (filter with --only).

Package README: src/providers/inference/ollama/README.md.

See also