
Model Setup Advisor

Get personalized model recommendations based on your hardware and preferences.

How do you want to run SourcePrep?

Your GPU

Usable VRAM: 12GB of 16GB · Peak model: 10GB
Speed: Fast, a few minutes per stage (~25-45 tok/s)

Recommended Configuration

Fast Model (Local)
qwen3:4b

3GB VRAM

Best small model. Rivals 72B models at this size.

Thinking Model (Local)
qwen3:14b

10GB VRAM

Strong reasoning. Best mid-range option.

Code Model (Local)
qwen2.5-coder:7b

5GB VRAM

Specialized for code. Good at edge detection.

Peak VRAM: 10GB (models swap, not cumulative)

A few minutes per stage (~25-45 tok/s)

Quick Setup

$ ollama pull qwen3:4b
$ ollama pull qwen3:14b
$ ollama pull qwen2.5-coder:7b

Reference: All Recommended Models

Local (Ollama)

| Model | Slot | Size | VRAM | Notes |
| --- | --- | --- | --- | --- |
| qwen3:4b | fast | 2.5GB | ~3GB | Best small model. Rivals 72B models at this size. |
| qwen3:1.7b | fast | 1.4GB | ~2GB | Ultra-light. Decent JSON output. |
| qwen3:0.6b | fast | 0.5GB | ~1GB | Smallest option. May produce unreliable JSON. |
| qwen3:30b | thinking | 19GB | ~20GB | MoE: 30B total, only 3B active. Outstanding reasoning. |
| qwen3:14b | thinking | 9.3GB | ~10GB | Strong reasoning. Best mid-range option. |
| qwen3:8b | thinking | 5.2GB | ~6GB | Good reasoning. Default recommendation. |
| qwen3:4b | thinking | 2.5GB | ~3GB | Usable for thinking when VRAM is very limited. |
| qwen3-coder:30b | code | 19GB | ~20GB | MoE: 30B total, 3.3B active. Best code model on Ollama. |
| qwen2.5-coder:7b | code | 4.7GB | ~5GB | Specialized for code. Good at edge detection. |

Cloud (BYOK)

| Model | Provider | Input /1M | Output /1M | Batch |
| --- | --- | --- | --- | --- |
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 | Standard |
| gpt-4.1-mini | OpenAI | $0.40 | $1.60 | Standard |
| gemini-2.5-flash-lite | Google (Gemini) | $0.07 | $0.30 | Compact |
| gemini-2.5-flash | Google (Gemini) | $0.15 | $0.60 | Compact |
| claude-haiku-3.5 | Anthropic (Claude) | $0.80 | $4.00 | Compact |
| claude-sonnet-4.5 | Anthropic (Claude) | $3.00 | $15.00 | Large |
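To see how the per-million-token prices above translate into a job cost, here is a minimal sketch. Only the prices come from the table; the function name and the token counts in the example are illustrative placeholders, and real usage would depend on your pipeline's actual token volumes.

```python
# Per-million-token prices from the table above: (input $/1M, output $/1M).
PRICES = {
    "gpt-4.1-nano": (0.10, 0.40),
    "gemini-2.5-flash-lite": (0.07, 0.30),
    "claude-haiku-3.5": (0.80, 4.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one job, given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical job: 2M input tokens, 250k output tokens.
cost = job_cost("gpt-4.1-nano", 2_000_000, 250_000)
# 2.0 * $0.10 + 0.25 * $0.40 = $0.30
```

Batch-tier pricing (the "Batch" column) is not modeled here; providers typically discount batched requests.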

VRAM Guide

| Usable VRAM | Fast | Thinking | Code |
| --- | --- | --- | --- |
| ≤ 2GB | qwen3:0.6b (single model for all) | | |
| 2–4GB | qwen3:1.7b or qwen3:4b (single model for all) | | |
| 5–6GB | qwen3:4b | qwen3:4b | (uses Fast) |
| 6–12GB | qwen3:4b | qwen3:8b | (uses Fast) |
| 12–20GB | qwen3:4b | qwen3:14b | qwen2.5-coder:7b |
| 20GB+ | qwen3:4b | qwen3:30b (MoE) | qwen3-coder:30b (MoE) |
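The table above can be read as a simple lookup. Here is a sketch of that mapping as a function; the thresholds and model names are taken directly from the table, the function name is illustrative, and the gap between the 2–4GB and 5–6GB rows is bridged at 5GB as an assumption.

```python
def recommend_models(usable_vram_gb: float) -> dict:
    """Map usable VRAM (GB) to fast/thinking/code slots, per the VRAM guide."""
    if usable_vram_gb <= 2:
        return dict.fromkeys(("fast", "thinking", "code"), "qwen3:0.6b")
    if usable_vram_gb < 5:
        # qwen3:4b needs ~3GB; below that, fall back to qwen3:1.7b.
        m = "qwen3:4b" if usable_vram_gb >= 3 else "qwen3:1.7b"
        return dict.fromkeys(("fast", "thinking", "code"), m)
    if usable_vram_gb < 6:
        # Code slot reuses the Fast model.
        return {"fast": "qwen3:4b", "thinking": "qwen3:4b", "code": "qwen3:4b"}
    if usable_vram_gb < 12:
        # Code slot reuses the Fast model.
        return {"fast": "qwen3:4b", "thinking": "qwen3:8b", "code": "qwen3:4b"}
    if usable_vram_gb < 20:
        return {"fast": "qwen3:4b", "thinking": "qwen3:14b",
                "code": "qwen2.5-coder:7b"}
    return {"fast": "qwen3:4b", "thinking": "qwen3:30b",
            "code": "qwen3-coder:30b"}
```

For example, 12GB of usable VRAM lands in the 12–20GB row, matching the recommended configuration shown earlier on this page.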

Why Ollama models swap, not stack

SourcePrep's pipeline runs one stage at a time. Ollama automatically loads and unloads models as needed, so you only need enough VRAM for the single largest model — not the total of all models combined.