Model Setup Advisor
Get personalized model recommendations based on your hardware and preferences.
Example inputs: run SourcePrep locally on a GPU with 12GB usable of 16GB total VRAM (peak model: 10GB), with the "Fast" speed preference (a few minutes per stage, ~25–45 tok/s).
Recommended Configuration
| Slot | Model | VRAM | Notes |
|---|---|---|---|
| Fast (local) | qwen3:4b | 3GB | Best small model. Rivals 72B models at this size. |
| Thinking (local) | qwen3:14b | 10GB | Strong reasoning. Best mid-range option. |
| Code (local) | qwen2.5-coder:7b | 5GB | Specialized for code. Good at edge detection. |
Peak VRAM: 10GB (models swap, not cumulative). Estimated speed: a few minutes per stage (~25–45 tok/s).
Quick Setup
$ ollama pull qwen3:4b
$ ollama pull qwen3:14b
$ ollama pull qwen2.5-coder:7b
Reference: All Recommended Models
Local (Ollama)
| Model | Slot | Size | VRAM | Notes |
|---|---|---|---|---|
| qwen3:4b | fast | 2.5GB | ~3GB | Best small model. Rivals 72B models at this size. |
| qwen3:1.7b | fast | 1.4GB | ~2GB | Ultra-light. Decent JSON output. |
| qwen3:0.6b | fast | 0.5GB | ~1GB | Smallest option. May produce unreliable JSON. |
| qwen3:30b | thinking | 19GB | ~20GB | MoE — 30B total, only 3B active. Outstanding reasoning. |
| qwen3:14b | thinking | 9.3GB | ~10GB | Strong reasoning. Best mid-range option. |
| qwen3:8b | thinking | 5.2GB | ~6GB | Good reasoning. Default recommendation. |
| qwen3:4b | thinking | 2.5GB | ~3GB | Usable for thinking when VRAM is very limited. |
| qwen3-coder:30b | code | 19GB | ~20GB | MoE — 30B total, 3.3B active. Best code model on Ollama. |
| qwen2.5-coder:7b | code | 4.7GB | ~5GB | Specialized for code. Good at edge detection. |
Cloud (BYOK)
| Model | Provider | Input/1M | Output/1M | Batch |
|---|---|---|---|---|
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 | Standard |
| gpt-4.1-mini | OpenAI | $0.40 | $1.60 | Standard |
| gemini-2.5-flash-lite | Google (Gemini) | $0.07 | $0.30 | Compact |
| gemini-2.5-flash | Google (Gemini) | $0.15 | $0.60 | Compact |
| claude-haiku-3.5 | Anthropic (Claude) | $0.80 | $4.00 | Compact |
| claude-sonnet-4.5 | Anthropic (Claude) | $3.00 | $15.00 | Large |
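Cloud cost scales linearly with token counts in each direction. A minimal sketch of reading the pricing table, using the gemini-2.5-flash-lite row (the token counts are hypothetical, for illustration only):

```python
# Estimate cloud cost from per-1M-token prices (gemini-2.5-flash-lite row).
# Token counts below are hypothetical, for illustration only.
input_price, output_price = 0.07, 0.30            # $ per 1M tokens
input_tokens, output_tokens = 2_000_000, 500_000  # example workload
cost = (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price
print(f"${cost:.2f}")  # → $0.29
```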
VRAM Guide
| Usable VRAM | Fast | Thinking | Code |
|---|---|---|---|
| ≤ 2GB | qwen3:0.6b (single model for all) | | |
| 2–4GB | qwen3:1.7b or qwen3:4b (single model for all) | | |
| 5–6GB | qwen3:4b | qwen3:4b | (uses Fast) |
| 6–12GB | qwen3:4b | qwen3:8b | (uses Fast) |
| 12–20GB | qwen3:4b | qwen3:14b | qwen2.5-coder:7b |
| 20GB+ | qwen3:4b | qwen3:30b (MoE) | qwen3-coder:30b (MoE) |
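The table above is a simple threshold lookup. A sketch, assuming the thresholds and model names from the table (`recommend` is a hypothetical helper, not part of SourcePrep; boundary handling at 12GB follows the example configuration above, which maps 12GB usable to qwen3:14b):

```python
def recommend(usable_gb: float) -> dict:
    """Map a usable-VRAM budget to model slots per the VRAM Guide table."""
    if usable_gb <= 2:
        m = "qwen3:0.6b"   # single model for all slots
        return {"fast": m, "thinking": m, "code": m}
    if usable_gb <= 4:
        m = "qwen3:1.7b"   # the table also allows qwen3:4b here
        return {"fast": m, "thinking": m, "code": m}
    if usable_gb <= 6:
        return {"fast": "qwen3:4b", "thinking": "qwen3:4b", "code": "qwen3:4b"}  # code uses Fast
    if usable_gb < 12:
        return {"fast": "qwen3:4b", "thinking": "qwen3:8b", "code": "qwen3:4b"}  # code uses Fast
    if usable_gb < 20:
        return {"fast": "qwen3:4b", "thinking": "qwen3:14b", "code": "qwen2.5-coder:7b"}
    return {"fast": "qwen3:4b", "thinking": "qwen3:30b", "code": "qwen3-coder:30b"}

print(recommend(12))
```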
Why Ollama models swap, not stack
SourcePrep's pipeline runs one stage at a time. Ollama automatically loads and unloads models as needed, so you only need enough VRAM for the single largest model — not the total of all models combined.
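The arithmetic behind the "Peak VRAM" figure can be sketched as follows (sizes are the approximate footprints of the recommended configuration above):

```python
# Because models swap one at a time, required VRAM is the max footprint,
# not the sum. Sizes (GB) are the recommended-config approximations above.
models = {"qwen3:4b": 3, "qwen3:14b": 10, "qwen2.5-coder:7b": 5}
peak = max(models.values())     # what you actually need
stacked = sum(models.values())  # what you'd need if models stayed loaded
print(peak, stacked)  # → 10 18
```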
