Ollama

The Ollama adapter provides access to local models running via Ollama, allowing you to run AI models on your own infrastructure with full privacy and no API costs.

Installation

npm install @tanstack/ai-ollama

npm install @tanstack/ai-ollama

Basic Usage

typescript

import { chat } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

const stream = chat({
  adapter: ollamaText("llama3"),
  messages: [{ role: "user", content: "Hello!" }],
});

import { chat } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

const stream = chat({
  adapter: ollamaText("llama3"),
  messages: [{ role: "user", content: "Hello!" }],
});

Basic Usage - Custom Host

typescript

import { chat } from "@tanstack/ai";
import { createOllamaChat } from "@tanstack/ai-ollama";

const adapter = createOllamaChat("llama3", "http://your-server:11434");

const stream = chat({
  adapter,
  messages: [{ role: "user", content: "Hello!" }],
});

import { chat } from "@tanstack/ai";
import { createOllamaChat } from "@tanstack/ai-ollama";

const adapter = createOllamaChat("llama3", "http://your-server:11434");

const stream = chat({
  adapter,
  messages: [{ role: "user", content: "Hello!" }],
});

Configuration

typescript

import { createOllamaChat } from "@tanstack/ai-ollama";

// Custom host (URL string)
const adapter = createOllamaChat("llama3", "http://your-server:11434");

// Custom client config (e.g., custom headers, fetch)
const adapter2 = createOllamaChat("llama3", {
  host: "http://your-server:11434",
  headers: { Authorization: "Bearer ..." },
});

import { createOllamaChat } from "@tanstack/ai-ollama";

// Custom host (URL string)
const adapter = createOllamaChat("llama3", "http://your-server:11434");

// Custom client config (e.g., custom headers, fetch)
const adapter2 = createOllamaChat("llama3", {
  host: "http://your-server:11434",
  headers: { Authorization: "Bearer ..." },
});

Available Models

To see available models on your Ollama instance:

ollama list

ollama list

Popular Models

llama3 / llama3.1 / llama3.2 - Meta's Llama models
mistral / mistral:7b - Mistral AI models
mixtral - Mixtral MoE model
codellama - Code-focused Llama
phi3 - Microsoft's Phi models
gemma / gemma2 - Google's Gemma models
qwen2 / qwen2.5 - Alibaba's Qwen models
deepseek-coder - DeepSeek coding model

Example: Chat Completion

typescript

import { chat, toServerSentEventsResponse } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

export async function POST(request: Request) {
  const { messages } = await request.json();

  const stream = chat({
    adapter: ollamaText("llama3"),
    messages,
  });

  return toServerSentEventsResponse(stream);
}

import { chat, toServerSentEventsResponse } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

export async function POST(request: Request) {
  const { messages } = await request.json();

  const stream = chat({
    adapter: ollamaText("llama3"),
    messages,
  });

  return toServerSentEventsResponse(stream);
}

Example: With Tools

typescript

import { chat, toServerSentEventsResponse, toolDefinition } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";
import { z } from "zod";

const getLocalDataDef = toolDefinition({
  name: "get_local_data",
  description: "Get data from local storage",
  inputSchema: z.object({
    key: z.string(),
  }),
});

const getLocalData = getLocalDataDef.server(async ({ key }) => {
  // Access local data
  return { data: "..." };
});

export async function POST(request: Request) {
  const { messages } = await request.json();

  const stream = chat({
    adapter: ollamaText("llama3"),
    messages,
    tools: [getLocalData],
  });

  return toServerSentEventsResponse(stream);
}

import { chat, toServerSentEventsResponse, toolDefinition } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";
import { z } from "zod";

const getLocalDataDef = toolDefinition({
  name: "get_local_data",
  description: "Get data from local storage",
  inputSchema: z.object({
    key: z.string(),
  }),
});

const getLocalData = getLocalDataDef.server(async ({ key }) => {
  // Access local data
  return { data: "..." };
});

export async function POST(request: Request) {
  const { messages } = await request.json();

  const stream = chat({
    adapter: ollamaText("llama3"),
    messages,
    tools: [getLocalData],
  });

  return toServerSentEventsResponse(stream);
}

Note: Tool support varies by model. Models like llama3, mistral, and qwen2 generally have good tool calling support.

Model Options

Ollama supports various provider-specific options. Unlike the other providers, Ollama nests its sampling and runner parameters inside an options object within modelOptions — temperature, top_p, and num_predict (the token-limit key) all live under modelOptions.options:

typescript

import { chat } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

const stream = chat({
  adapter: ollamaText("llama3:latest"),
  messages: [{ role: "user", content: "Hello!" }],
  modelOptions: {
    options: {
      temperature: 0.7,
      top_p: 0.9,
      top_k: 40,
      num_predict: 1000, // Max tokens to generate
      repeat_penalty: 1.1,
      num_ctx: 4096, // Context window size
      num_gpu: -1, // GPU layers (-1 = auto)
    },
  },
});

import { chat } from "@tanstack/ai";
import { ollamaText } from "@tanstack/ai-ollama";

const stream = chat({
  adapter: ollamaText("llama3:latest"),
  messages: [{ role: "user", content: "Hello!" }],
  modelOptions: {
    options: {
      temperature: 0.7,
      top_p: 0.9,
      top_k: 40,
      num_predict: 1000, // Max tokens to generate
      repeat_penalty: 1.1,
      num_ctx: 4096, // Context window size
      num_gpu: -1, // GPU layers (-1 = auto)
    },
  },
});

If you previously passed temperature / topP / maxTokens at the root of chat(), note that for Ollama they map to modelOptions.options.temperature, modelOptions.options.top_p, and modelOptions.options.num_predict. See Moving Sampling Options into modelOptions.

Advanced Options

All sampling and runner parameters are nested under modelOptions.options:

typescript

modelOptions: {
  options: {
    // Sampling
    temperature: 0.7,
    top_p: 0.9,
    top_k: 40,
    min_p: 0.05,
    typical_p: 1.0,

    // Generation
    num_predict: 1000,
    repeat_penalty: 1.1,
    repeat_last_n: 64,
    penalize_newline: false,

    // Performance
    num_ctx: 4096,
    num_batch: 512,
    num_gpu: -1,
    num_thread: 0, // 0 = auto

    // Memory
    use_mmap: true,
    use_mlock: false,

    // Mirostat sampling
    mirostat: 0, // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
    mirostat_tau: 5.0,
    mirostat_eta: 0.1,
  },
}

modelOptions: {
  options: {
    // Sampling
    temperature: 0.7,
    top_p: 0.9,
    top_k: 40,
    min_p: 0.05,
    typical_p: 1.0,

    // Generation
    num_predict: 1000,
    repeat_penalty: 1.1,
    repeat_last_n: 64,
    penalize_newline: false,

    // Performance
    num_ctx: 4096,
    num_batch: 512,
    num_gpu: -1,
    num_thread: 0, // 0 = auto

    // Memory
    use_mmap: true,
    use_mlock: false,

    // Mirostat sampling
    mirostat: 0, // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
    mirostat_tau: 5.0,
    mirostat_eta: 0.1,
  },
}

Summarization

Summarize long text content locally:

typescript

import { summarize } from "@tanstack/ai";
import { ollamaSummarize } from "@tanstack/ai-ollama";

const result = await summarize({
  adapter: ollamaSummarize("llama3"),
  text: "Your long text to summarize...",
  maxLength: 100,
  style: "concise", // "concise" | "bullet-points" | "paragraph"
});

console.log(result.summary);

import { summarize } from "@tanstack/ai";
import { ollamaSummarize } from "@tanstack/ai-ollama";

const result = await summarize({
  adapter: ollamaSummarize("llama3"),
  text: "Your long text to summarize...",
  maxLength: 100,
  style: "concise", // "concise" | "bullet-points" | "paragraph"
});

console.log(result.summary);

Setting Up Ollama

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com

2. Pull a Model

ollama pull llama3

ollama pull llama3

3. Start Ollama Server

ollama serve

ollama serve

The server runs on http://localhost:11434 by default.

Running on a Remote Server

typescript

import { createOllamaChat } from "@tanstack/ai-ollama";

const adapter = createOllamaChat("llama3", "http://your-server:11434");

import { createOllamaChat } from "@tanstack/ai-ollama";

const adapter = createOllamaChat("llama3", "http://your-server:11434");

To expose Ollama on a network interface:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Environment Variables

Optionally set the host in environment variables:

OLLAMA_HOST=http://localhost:11434

OLLAMA_HOST=http://localhost:11434

API Reference

ollamaText(model)

Creates an Ollama text/chat adapter using OLLAMA_HOST from the environment (defaults to http://localhost:11434).

Parameters:

model - Model name (e.g. "llama3", "mistral:7b")

createOllamaChat(model, hostOrConfig?)

Creates an Ollama text/chat adapter with an explicit host or client config.

Parameters:

model - Model name
hostOrConfig? - Either an OLLAMA_HOST-style URL string, or an OllamaClientConfig object (e.g. { host, headers, fetch }).

ollamaSummarize(model) / createOllamaSummarize(model, hostOrConfig?)

Creates an Ollama summarization adapter — same signature shape as the chat adapter.

Benefits of Ollama

✅ Privacy - Data stays on your infrastructure
✅ Cost - No API costs after hardware
✅ Customization - Use any compatible model
✅ Offline - Works without internet
✅ Speed - No network latency for local deployment

Limitations

Image Generation: Ollama does not support image generation. Use OpenAI or Gemini for image generation.
Performance: Depends on your hardware (GPU recommended for larger models)

Next Steps

Getting Started - Learn the basics
Tools Guide - Learn about tools
Other Adapters - Explore other providers

Installation#

Basic Usage#

Basic Usage - Custom Host#

Configuration#

Available Models#

Popular Models#

Example: Chat Completion#

Example: With Tools#

Model Options#

Advanced Options#

Summarization#

Setting Up Ollama#

1. Install Ollama#

2. Pull a Model#

3. Start Ollama Server#

Running on a Remote Server#

Environment Variables#

API Reference#

ollamaText(model)#

createOllamaChat(model, hostOrConfig?)#

ollamaSummarize(model) / createOllamaSummarize(model, hostOrConfig?)#

Benefits of Ollama#

Limitations#

Next Steps#

Installation

Basic Usage

Basic Usage - Custom Host

Configuration

Available Models

Popular Models

Example: Chat Completion

Example: With Tools

Model Options

Advanced Options

Summarization

Setting Up Ollama

1. Install Ollama

2. Pull a Model

3. Start Ollama Server

Running on a Remote Server

Environment Variables

API Reference

ollamaText(model)

createOllamaChat(model, hostOrConfig?)

ollamaSummarize(model) / createOllamaSummarize(model, hostOrConfig?)

Benefits of Ollama

Limitations

Next Steps