Targeted RAG
Knowledge
Pick the files your agent should know about. SourcePrep embeds them, learns their structure, and serves a focused context window — not a dump of your entire repo.
Watching your knowledge
The Knowledge Status panel is where you confirm your knowledge is ready and watch it stay in sync as you work. It's the first panel to add to your dashboard: until your knowledge is built, nothing else has anything to serve.
The status badge tells you whether you can trust what the agent is reading right now:
- Fresh — every chunk on disk is in the index.
- Stale — files changed; the next save reindexes them (usually under a second).
- Building — the background worker is processing files now.
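As a rough mental model, the three badge states can be derived from two inputs: whether the background worker is busy, and whether the content on disk matches what is in the index. The sketch below is illustrative only; `knowledge_status` and its arguments are hypothetical names, not SourcePrep's API, and the digest-per-file comparison is an assumption about how "every chunk on disk is in the index" might be checked.

```python
from enum import Enum


class Status(Enum):
    FRESH = "Fresh"
    STALE = "Stale"
    BUILDING = "Building"


def knowledge_status(
    on_disk: dict[str, str],   # path -> content digest of the file on disk (assumed)
    indexed: dict[str, str],   # path -> content digest stored in the index (assumed)
    worker_busy: bool,
) -> Status:
    """Derive a badge state from the index and the working tree."""
    if worker_busy:
        # The background worker is processing files right now.
        return Status.BUILDING
    if on_disk == indexed:
        # Everything on disk is reflected in the index.
        return Status.FRESH
    # Something was added, edited, or removed since the last index update.
    return Status.STALE
```

Checking the worker first means an in-progress build is never reported as Stale, which matches how the badge is described above.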
Set the scope first
SourcePrep doesn't need your whole repo. It needs the slice of your repo that's relevant to what you're working on right now. The Knowledge Scope panel (file tree) is where you mark that slice — folders and files toggle in and out with a click.
By default SourcePrep respects your .gitignore and drops standard noise directories so you don't have to spell them out:
Excluded by default
node_modules/ · dist/ · build/ · .git/ · .next/ · target/ · venv/ · __pycache__/
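To make the default exclusions concrete, here is a minimal sketch of a filter that drops any file living under one of the directories listed above. It is an illustration, not SourcePrep's implementation: the function name `is_excluded_by_default` is hypothetical, and `.gitignore` handling is deliberately left out.

```python
from pathlib import PurePosixPath

# Directory names taken from the "Excluded by default" list above.
DEFAULT_EXCLUDED_DIRS = frozenset(
    {"node_modules", "dist", "build", ".git", ".next", "target", "venv", "__pycache__"}
)


def is_excluded_by_default(file_path: str) -> bool:
    """Return True if any parent directory of file_path is a default noise directory."""
    parts = PurePosixPath(file_path).parts
    # Only directory components are checked, so a file merely named "build" stays in scope.
    return any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1])


candidates = [
    "src/app/main.py",
    "web/node_modules/react/index.js",
    "src/app/__pycache__/main.cpython-312.pyc",
]
in_scope = [p for p in candidates if not is_excluded_by_default(p)]
```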
See Knowledge Scope for the full include/exclude pattern syntax and named scope presets.
Weighted retrieval
Within the scope you picked, not all folders deserve equal weight at query time. Path weights are query-time multipliers — boost the folders you want the agent to lean into, suppress the ones that are technically in scope but you don't want crowding results. No rebuild required.
Range 0.0 – 2.0
1.0 = neutral. 1.5 boosts a folder 50% at ranking time. 0.0 effectively hides it. Most-specific path wins.
Full details in the Path Weights guide.
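The "most-specific path wins" rule and the multiplier behaviour can be sketched as follows. This is a simplified model under stated assumptions: the function names, the `(path, score)` hit format, and the clamping to the 0.0 to 2.0 range are all hypothetical choices for illustration, and the real ranking may combine weights with scores differently.

```python
from pathlib import PurePosixPath


def resolve_weight(file_path: str, weights: dict[str, float], default: float = 1.0) -> float:
    """Pick the weight of the deepest configured path that contains file_path."""
    target = PurePosixPath(file_path)
    best_depth, best_weight = -1, default
    for configured, weight in weights.items():
        prefix = PurePosixPath(configured)
        if target == prefix or prefix in target.parents:
            depth = len(prefix.parts)
            if depth > best_depth:  # a deeper (more specific) path overrides a shallower one
                best_depth, best_weight = depth, weight
    return min(2.0, max(0.0, best_weight))  # keep within the documented range


def rerank(hits: list[tuple[str, float]], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Multiply each hit's similarity score by its path weight, then sort descending."""
    weighted = [(path, score * resolve_weight(path, weights)) for path, score in hits]
    return sorted(weighted, key=lambda hit: hit[1], reverse=True)


weights = {"src": 1.5, "src/vendor": 0.0}
hits = [("src/vendor/lib.py", 0.9), ("src/api/handlers.py", 0.5), ("docs/intro.md", 0.6)]
ranked = rerank(hits, weights)
```

Because the weights are applied to scores at query time, changing them in this model only changes the ordering of results; nothing stored in the index has to be recomputed.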
Best practices
A tighter scope almost always beats a bigger one at the same token budget. The agent isn't fighting noise to find what matters, and retrieval is faster too. Scope to what you're actually working on; widen only when you need to.
How big should your scope be?
| Scope | What it gets you |
|---|---|
| 2–5 files | Just paste the files. At this size an index is overkill. |
| 5–15 files | Sweet spot for focused work. Retrieval is essentially deterministic — the agent sees what you scoped. |
| 15–50 files | Selective retrieval, still excellent for focused tasks. |
| 50–200 files | Works fine; start using path weights to suppress noise. |
| 200+ files | Works for broad refactors; lean on path weights or sub-scopes. |
When each scope size fits
Working on a feature
Scope to that feature's folder plus its immediate dependencies. Two or three directories is often enough.
Refactor pass
Broader scope is fine — but lean on path weights to suppress test fixtures, vendored code, and anything you don't want the agent to mirror.
Quick Q&A
Narrowest scope wins. If you know which file the answer lives in, scope to just that directory. The agent will be ready in seconds.
Under the hood
Files in scope are parsed with Tree-sitter, chunked at function/class boundaries (or by Markdown headers for docs), embedded with the built-in ONNX CPU embedder (or your local Ollama if you prefer), and stored alongside your project metadata. A file watcher catches edits as you save and re-embeds just the changed file — usually under 200 ms.
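To illustrate what chunking at function/class boundaries means, the sketch below splits a Python file into one chunk per top-level function or class. Note the substitution: it uses Python's standard-library `ast` module as a stand-in for Tree-sitter, so it only handles Python source, and `chunk_python_source` is a hypothetical name rather than part of SourcePrep.

```python
import ast


def chunk_python_source(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact text spanned by the node.
            text = ast.get_source_segment(source, node) or ""
            chunks.append((node.name, text))
    return chunks


example = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "class Index:\n"
    "    def add(self, chunk):\n"
    "        pass\n"
)
chunks = chunk_python_source(example)
```

Each chunk is a complete definition, so an embedding describes one coherent unit of code instead of an arbitrary window of lines. In this sketch, module-level statements such as the import are not emitted as chunks; a real chunker would likely need a policy for them.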
15+ languages, CPU-only by default
Tree-sitter handles parsing for Python, TypeScript, JavaScript, Rust, Go, Java, and a dozen more. ONNX embeddings run on your CPU at acceptable speed; nothing about indexing requires a GPU.
For embedder options, see Embedding Models. Everything stays on your machine — there's no cloud component to indexing.
Initialising a project
Add a project via the dashboard + button (or prep add /path/to/project) and the first build kicks off automatically. You can watch it complete in the Knowledge Status panel above. The file watcher handles every change afterwards; you almost never need to trigger a rebuild by hand.
For the rare cases the watcher misses — a thousand-file branch switch, an exclusion-pattern change, or a search that feels suspiciously off — there's a manual Rebuild Knowledge control available in the dashboard's developer-mode panels.
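Whether a change is picked up by the watcher or by a manual rebuild, the underlying question is the same: which files differ from what the index last saw? One common way to answer it is a content-digest comparison, sketched below. This is an assumption for illustration, not a description of SourcePrep's internals; `files_to_reembed` and the digest map are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def files_to_reembed(
    indexed: dict[str, str],    # path -> digest recorded at last index time (assumed)
    on_disk: dict[str, bytes],  # path -> current file contents
) -> tuple[list[str], list[str]]:
    """Return (changed_or_new_paths, removed_paths), each sorted."""
    changed = sorted(
        path for path, data in on_disk.items() if indexed.get(path) != content_digest(data)
    )
    removed = sorted(path for path in indexed if path not in on_disk)
    return changed, removed


indexed = {
    "a.py": content_digest(b"print('a')"),
    "b.py": content_digest(b"print('b')"),
    "gone.py": content_digest(b"print('gone')"),
}
on_disk = {"a.py": b"print('a')", "b.py": b"print('b2')", "new.py": b"print('new')"}
changed, removed = files_to_reembed(indexed, on_disk)
```

Only the paths in `changed` would need re-embedding, and those in `removed` would be dropped from the index, which is why a large branch switch costs more than a single save.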
