# GrepAI Embeddings with Ollama
This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.
## When to Use This Skill
- Setting up private, local embeddings
- Choosing the right Ollama model
- Optimizing Ollama performance
- Troubleshooting Ollama connection issues
## Why Ollama?
| Advantage | Description |
|---|---|
| 🔒 Privacy | Code never leaves your machine |
| 💰 Free | No API costs or usage limits |
| ⚡ Speed | No network latency |
| 🔌 Offline | Works without internet |
| 🔧 Control | Choose your model |
## Prerequisites
- Ollama installed and running
- An embedding model downloaded
```bash
# Install Ollama
brew install ollama                             # macOS
# or
curl -fsSL https://ollama.com/install.sh | sh   # Linux

# Start Ollama
ollama serve

# Download an embedding model
ollama pull nomic-embed-text
```
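Verify the CLI is on your PATH:

```bash
ollama --version
```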
## Configuration

### Basic Configuration
```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```
### With Custom Endpoint
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434  # Remote Ollama server
```
### With Explicit Dimensions
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768  # Usually auto-detected
```
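If you set `dimensions` explicitly, the value must match what the model actually returns. One way to check is to count the vector length from the embeddings API shown later in this skill (this sketch assumes `python3` is available):

```bash
# Ask Ollama for an embedding, then print the vector length
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "test"
}' | python3 -c "import json, sys; print(len(json.load(sys.stdin)['embedding']))"
# nomic-embed-text should print 768
```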
## Available Models

### Recommended: nomic-embed-text
```bash
ollama pull nomic-embed-text
```
| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |
Configuration:
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
```
### Multilingual: nomic-embed-text-v2-moe
```bash
ollama pull nomic-embed-text-v2-moe
```
| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |
Best for codebases with non-English comments/documentation.
Configuration:
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
```
### High Quality: bge-m3
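```bash
ollama pull bge-m3
```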
| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |
Best for large, complex codebases where accuracy is critical.
Configuration:
```yaml
embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024
```
### Maximum Quality: mxbai-embed-large
```bash
ollama pull mxbai-embed-large
```
| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |
Configuration:
```yaml
embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024
```
## Model Comparison
| Model | Dims | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| `nomic-embed-text` | 768 | ~274 MB | ⚡⚡⚡ | ⭐⭐⭐ | General use |
| `nomic-embed-text-v2-moe` | 768 | ~500 MB | ⚡⚡ | ⭐⭐⭐⭐ | Multilingual |
| `bge-m3` | 1024 | ~1.2 GB | ⚡ | ⭐⭐⭐⭐⭐ | Large codebases |
| `mxbai-embed-large` | 1024 | ~670 MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy |
## Performance Optimization

### Memory Management
Models load into RAM. Ensure sufficient memory:
| Model | RAM Required |
|---|---|
| `nomic-embed-text` | ~500 MB |
| `nomic-embed-text-v2-moe` | ~800 MB |
| `bge-m3` | ~1.5 GB |
| `mxbai-embed-large` | ~1 GB |
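Before pulling a larger model, check that you have the headroom (standard OS tools, nothing GrepAI-specific):

```bash
# Linux: free/available memory
free -h

# macOS: total physical memory in bytes
sysctl hw.memsize
```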
### GPU Acceleration
Ollama automatically uses:
- macOS: Metal (Apple Silicon)
- Linux/Windows: CUDA (NVIDIA GPUs)
Check GPU usage:
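```bash
# Lists loaded models; the PROCESSOR column shows GPU vs. CPU
ollama ps
```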
### Keeping the Model Loaded
By default, Ollama unloads models after 5 minutes of inactivity. To keep a model loaded:
```bash
# Keep the model loaded indefinitely
curl http://localhost:11434/api/generate -d '{
  "model": "nomic-embed-text",
  "keep_alive": -1
}'
```
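Alternatively, set a server-wide default via the `OLLAMA_KEEP_ALIVE` environment variable:

```bash
# -1 disables the idle unload entirely
OLLAMA_KEEP_ALIVE=-1 ollama serve
```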
## Verifying Connection

### Check Ollama is Running
```bash
curl http://localhost:11434/api/tags
```
### List Available Models
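```bash
ollama list
```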
### Test Embedding
```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function authenticate(user, password)"
}'
```
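A successful response is a JSON object containing an `embedding` array; its length should match the model's dimensions (768 for `nomic-embed-text`). An error here usually means the model has not been pulled yet.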
## Running Ollama as a Service

### macOS (launchd)
The Ollama app runs automatically on login; no additional setup is needed.

### Linux (systemd)
```bash
# Enable the service
sudo systemctl enable ollama

# Start the service
sudo systemctl start ollama

# Check status
sudo systemctl status ollama
```
### Manual Background Process
```bash
nohup ollama serve > /dev/null 2>&1 &
```
## Remote Ollama Server
Run Ollama on a powerful server and connect remotely:
### On the Server
```bash
# Allow remote connections
OLLAMA_HOST=0.0.0.0 ollama serve
```
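If the server runs Ollama under systemd (as in the service setup above), set the variable in a unit override instead of on the command line:

```bash
sudo systemctl edit ollama
# In the override editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
```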
### On the Client
```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434
```
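Before re-indexing, confirm the client can reach the server with the same health check used above:

```bash
curl http://server-ip:11434/api/tags
```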
## Common Issues
❌ Problem: Connection refused
✅ Solution:
```bash
# Start Ollama
ollama serve
```
❌ Problem: Model not found
✅ Solution:
```bash
# Pull the model
ollama pull nomic-embed-text
```
❌ Problem: Slow embedding generation
✅ Solutions:
- Use a smaller model (`nomic-embed-text`)
- Ensure the GPU is being used (check with `ollama ps`)
- Close memory-intensive applications
- Consider a remote server with better hardware
❌ Problem: Out of memory
✅ Solutions:
- Use a smaller model
- Close other applications
- Upgrade RAM
- Use remote Ollama server
❌ Problem: Embeddings differ after model update
✅ Solution: Re-index after model updates:
```bash
rm .grepai/index.gob
grepai watch
```
## Best Practices
- Start with `nomic-embed-text`: Best balance of speed and quality
- Keep Ollama running: Background service recommended
- Match dimensions: Don't mix models with different dimensions
- Re-index on model change: Delete index and re-run watch
- Monitor memory: Embedding models use significant RAM
## Output Format
Successful Ollama configuration:
```
✅ Ollama Embedding Provider Configured

Provider: Ollama
Model: nomic-embed-text
Endpoint: http://localhost:11434
Dimensions: 768 (auto-detected)
Status: Connected

Model Info:
- Size: 274 MB
- Loaded: Yes
- GPU: Apple Metal
```