Running Iris Locally

Iris supports configurable providers for every subsystem, which means you can run it entirely on local hardware using Ollama — no external API keys required. This is great for privacy-conscious setups, offline use, or just avoiding API costs during development.

How It Works

Every Iris subsystem that calls an LLM (chat, memory extraction, summarization, consolidation, embeddings, etc.) has its own provider and model settings. By default these point to Anthropic and OpenAI, but you can override any or all of them in config/iris-custom.php to use Ollama or any other Prism provider.

Prerequisites

  • Ollama installed and running — Install Ollama for your platform
  • A chat model — You'll need a model that supports tool use and structured output, since Iris relies on both heavily
  • An embedding model — For semantic memory search
  • Sufficient hardware — Local models need memory. Check your model's requirements against your available GPU VRAM (or system RAM for CPU inference)

Step 1: Pull Your Models

You'll need a chat model and an embedding model. Pull whichever models you prefer from the Ollama library:

```bash
ollama pull <your-chat-model>
ollama pull <your-embedding-model>
```

The key requirements for the chat model are tool/function calling support and structured output reliability — Iris uses both extensively across all subsystems.
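Recent Ollama releases report a model's capabilities through the `/api/show` endpoint, so you can check for tool support before wiring a model into Iris. A quick sketch, assuming Ollama is running locally and `python3` is available for JSON parsing (the model name is a placeholder):

```shell
# Ask Ollama what a pulled model can do; recent versions include a
# "capabilities" array (e.g. ["completion", "tools"]) in the response.
curl -s http://localhost:11434/api/show \
  -d '{"model": "<your-chat-model>"}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin).get("capabilities", []))'
```

If `tools` is missing from the list, pick a different chat model; Iris's agentic features won't work without it.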

Verify Ollama is running:

```bash
ollama list
```

Step 2: Configure the Environment

Ollama's default URL is http://localhost:11434. If you're running it elsewhere, set the URL in your .env:

```bash
# .env (only needed if Ollama isn't on localhost:11434)
OLLAMA_URL=http://localhost:11434
```
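To confirm the API is reachable at whatever URL you configured (useful when Ollama runs on a different host), you can hit the version endpoint directly. A minimal sketch, assuming `curl`:

```shell
# Falls back to the default URL if OLLAMA_URL is unset.
# A healthy server answers with a small JSON body such as {"version":"0.6.2"}.
curl -s "${OLLAMA_URL:-http://localhost:11434}/api/version"
```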

Since you're not using Anthropic or OpenAI, you can remove those API keys or leave them blank:

```bash
# .env
# ANTHROPIC_API_KEY=     # Not needed for local-only setup
# OPENAI_API_KEY=        # Not needed for local-only setup
```

Step 3: Create Your Custom Config

Create config/iris-custom.php to point every subsystem at Ollama. Here's a complete example using qwen3.5:35b for chat and qwen3-embedding for embeddings — swap in your preferred models:

```php
<?php

declare(strict_types=1);

return [
    // Primary chat agent
    'agent' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Disable Anthropic-specific provider tools (web search/fetch)
    // These are Anthropic-only features and won't work with other providers
    'provider_tools' => [],

    // Conversation summarization
    'summarization' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Truth crystallization and promotion
    'truths' => [
        'crystallization_provider' => 'ollama',
        'crystallization_model' => 'qwen3.5:35b',
        'promotion_provider' => 'ollama',
        'promotion_model' => 'qwen3.5:35b',
    ],

    // Memory recall query generation
    'memory' => [
        'recall_provider' => 'ollama',
        'recall_model' => 'qwen3.5:35b',
    ],

    // Memory extraction from conversations
    'extraction' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Memory consolidation
    'consolidation' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Truth consolidation
    'truth_consolidation' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Embeddings for semantic search
    'embeddings' => [
        'provider' => 'ollama',
        'model' => 'qwen3-embedding',
        'provider_options' => [
            'dimensions' => 1536,
        ],
    ],

    // Proactive messages (heartbeat)
    'heartbeat' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Sub-agent for task delegation
    'subagent' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],
];
```

Embedding Dimensions

Iris uses pgvector with 1536-dimensional vectors by default (matching OpenAI's text-embedding-3-small). If your Ollama embedding model produces different dimensions, you have two options:

  1. Set dimensions in provider_options to 1536 (shown above) — this works if the model supports dimension configuration
  2. Match the model's native dimensions — this requires updating the database column size in a migration

For most setups, setting 'dimensions' => 1536 in provider_options is the simplest path.
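Before committing to option 1 or 2, it's worth checking what your model actually emits. One way, sketched here assuming a locally running Ollama and `python3` (the model name is a placeholder), is to request a single embedding through `/api/embed` and count its length:

```shell
# Embed a test string and print the model's native vector length;
# if it isn't 1536, you'll need one of the two options above.
curl -s http://localhost:11434/api/embed \
  -d '{"model": "<your-embedding-model>", "input": "dimension check"}' \
  | python3 -c 'import json, sys; print(len(json.load(sys.stdin)["embeddings"][0]))'
```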

WARNING

If you switch embedding models or providers after memories have been stored, the existing embeddings won't be compatible with the new model. You'll need to re-embed existing memories or start fresh.

Things to Know

Provider tools. Anthropic's built-in web search and web fetch tools are provider-specific and won't work with other providers. The example config above disables them with 'provider_tools' => [].

Structured output. Iris's background tasks (extraction, consolidation, truth crystallization) rely heavily on structured output. Make sure your chosen model handles JSON schema responses reliably.

Tool use. Iris relies on tools for agentic behavior. Make sure your chat model supports tool/function calling.
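A quick way to spot-check structured output outside Iris is Ollama's chat endpoint, which accepts a `format` field to constrain responses to JSON. A rough sketch, assuming a local server and `python3`; the model name is a placeholder:

```shell
# Ask for JSON-only output; a reliable model returns a parseable object.
curl -s http://localhost:11434/api/chat -d '{
  "model": "<your-chat-model>",
  "messages": [{"role": "user", "content": "Return a JSON object with a single key \"ok\" set to true."}],
  "format": "json",
  "stream": false
}' | python3 -c 'import json, sys; print(json.loads(json.load(sys.stdin)["message"]["content"]))'
```

If this errors out or the content isn't valid JSON, expect Iris's background tasks to misbehave with that model.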

Hybrid Configurations

You don't have to go fully local. Since every subsystem has its own provider, you can mix cloud and local based on what matters most to you:

```php
<?php

// config/iris-custom.php
return [
    // Use Claude for chat, Ollama for everything else
    'agent' => [
        'provider' => 'anthropic',
        'model' => 'claude-sonnet-4-5',
    ],

    // Background tasks run locally
    'extraction' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],
    'summarization' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],
    'consolidation' => [
        'provider' => 'ollama',
        'model' => 'qwen3.5:35b',
    ],

    // Local embeddings
    'embeddings' => [
        'provider' => 'ollama',
        'model' => 'qwen3-embedding',
        'provider_options' => [
            'dimensions' => 1536,
        ],
    ],
];
```

This gives you cloud-quality chat while offloading batch work to local models — a practical middle ground between quality and cost.