Model Setup Advisor
Get personalized model recommendations based on your hardware and preferences.
Example inputs: run SourcePrep locally on a GPU with 12GB usable of 16GB total VRAM (peak model: 10GB), with the "Fast" speed preference (a few minutes per stage, ~25–45 tok/s).
Recommended Configuration
| Slot | Model | VRAM | Notes |
|---|---|---|---|
| Fast (local) | qwen3:4b | 3GB | Best small model. Rivals 72B models at this size. |
| Thinking (local) | qwen3:14b | 10GB | Strong reasoning. Best mid-range option. |
| Code (local) | qwen2.5-coder:7b | 5GB | Specialized for code. Good at edge detection. |
Peak VRAM: 10GB (models swap, not cumulative). Estimated speed: a few minutes per stage (~25–45 tok/s).
Quick Setup
$ ollama pull qwen3:4b
$ ollama pull qwen3:14b
$ ollama pull qwen2.5-coder:7b
Reference: All Recommended Models
Local (Ollama)
| Model | Slot | Size | VRAM | Notes |
|---|---|---|---|---|
| qwen3:4b | fast | 2.5GB | ~3GB | Best small model. Rivals 72B models at this size. |
| qwen3:1.7b | fast | 1.4GB | ~2GB | Ultra-light. Decent JSON output. |
| qwen3:0.6b | fast | 0.5GB | ~1GB | Smallest option. May produce unreliable JSON. |
| qwen3:30b | thinking | 19GB | ~20GB | MoE — 30B total, only 3B active. Outstanding reasoning. |
| qwen3:14b | thinking | 9.3GB | ~10GB | Strong reasoning. Best mid-range option. |
| qwen3:8b | thinking | 5.2GB | ~6GB | Good reasoning. Default recommendation. |
| qwen3:4b | thinking | 2.5GB | ~3GB | Usable for thinking when VRAM is very limited. |
| qwen3-coder:30b | code | 19GB | ~20GB | MoE — 30B total, 3.3B active. Best code model on Ollama. |
| qwen2.5-coder:7b | code | 4.7GB | ~5GB | Specialized for code. Good at edge detection. |
Cloud (BYOK)
| Model | Provider | Input/1M | Output/1M | Batch |
|---|---|---|---|---|
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 | Standard |
| gpt-4.1-mini | OpenAI | $0.40 | $1.60 | Standard |
| gemini-2.5-flash-lite | Google (Gemini) | $0.07 | $0.30 | Compact |
| gemini-2.5-flash | Google (Gemini) | $0.15 | $0.60 | Compact |
| claude-haiku-3.5 | Anthropic (Claude) | $0.80 | $4.00 | Compact |
| claude-sonnet-4.5 | Anthropic (Claude) | $3.00 | $15.00 | Large |
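Cloud cost scales linearly with token counts in each direction. A minimal sketch of reading the pricing table, using the gemini-2.5-flash-lite row (the token counts are hypothetical, for illustration only):

```python
# Estimate cloud cost from per-1M-token prices (gemini-2.5-flash-lite row).
# Token counts below are hypothetical, for illustration only.
input_price, output_price = 0.07, 0.30            # $ per 1M tokens
input_tokens, output_tokens = 2_000_000, 500_000  # example workload
cost = (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price
print(f"${cost:.2f}")  # → $0.29
```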
VRAM Guide
| Usable VRAM | Fast | Thinking | Code |
|---|---|---|---|
| ≤ 2GB | qwen3:0.6b (single model for all) | | |
| 2–4GB | qwen3:1.7b or qwen3:4b (single model for all) | | |
| 5–6GB | qwen3:4b | qwen3:4b | (uses Fast) |
| 6–12GB | qwen3:4b | qwen3:8b | (uses Fast) |
| 12–20GB | qwen3:4b | qwen3:14b | qwen2.5-coder:7b |
| 20GB+ | qwen3:4b | qwen3:30b (MoE) | qwen3-coder:30b (MoE) |
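The table above is a simple threshold lookup. A sketch, assuming the thresholds and model names from the table (`recommend` is a hypothetical helper, not part of SourcePrep; boundary handling at 12GB follows the example configuration above, which maps 12GB usable to qwen3:14b):

```python
def recommend(usable_gb: float) -> dict:
    """Map a usable-VRAM budget to model slots per the VRAM Guide table."""
    if usable_gb <= 2:
        m = "qwen3:0.6b"   # single model for all slots
        return {"fast": m, "thinking": m, "code": m}
    if usable_gb <= 4:
        m = "qwen3:1.7b"   # the table also allows qwen3:4b here
        return {"fast": m, "thinking": m, "code": m}
    if usable_gb <= 6:
        return {"fast": "qwen3:4b", "thinking": "qwen3:4b", "code": "qwen3:4b"}  # code uses Fast
    if usable_gb < 12:
        return {"fast": "qwen3:4b", "thinking": "qwen3:8b", "code": "qwen3:4b"}  # code uses Fast
    if usable_gb < 20:
        return {"fast": "qwen3:4b", "thinking": "qwen3:14b", "code": "qwen2.5-coder:7b"}
    return {"fast": "qwen3:4b", "thinking": "qwen3:30b", "code": "qwen3-coder:30b"}

print(recommend(12))
```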
Why Ollama models swap, not stack
SourcePrep's pipeline runs one stage at a time. Ollama automatically loads and unloads models as needed, so you only need enough VRAM for the single largest model — not the total of all models combined.
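The arithmetic behind the "Peak VRAM" figure can be sketched as follows (sizes are the approximate footprints of the recommended configuration above):

```python
# Because models swap one at a time, required VRAM is the max footprint,
# not the sum. Sizes (GB) are the recommended-config approximations above.
models = {"qwen3:4b": 3, "qwen3:14b": 10, "qwen2.5-coder:7b": 5}
peak = max(models.values())     # what you actually need
stacked = sum(models.values())  # what you'd need if models stayed loaded
print(peak, stacked)  # → 10 18
```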
