Targeted RAG

Knowledge

Pick the files your agent should know about. SourcePrep embeds them, learns their structure, and serves a focused context window — not a dump of your entire repo.

Watching your knowledge

The Knowledge Status panel is where you confirm your knowledge is ready and watch it stay in sync as you work. It's the first panel to add to your dashboard: until your knowledge is built, nothing else has anything to serve.

The Knowledge Status panel — chunk count, embedding model, last build, and live freshness.

The status badge tells you whether you can trust what the agent is reading right now:

  • Fresh — every chunk on disk is in the index.
  • Stale — files changed; the next save reindexes them (usually under a second).
  • Building — the background worker is processing files now.
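
One way to picture the three badge states is as a comparison between what is on disk and what was last indexed. The sketch below is an illustration only, not SourcePrep's implementation: the names `Status`, `content_hash`, and `knowledge_status` are hypothetical, and it assumes the index stores one content hash per file.

```python
import hashlib
from enum import Enum


class Status(Enum):
    FRESH = "Fresh"
    STALE = "Stale"
    BUILDING = "Building"


def content_hash(text: str) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def knowledge_status(
    on_disk: dict[str, str],   # path -> current file text (assumed input shape)
    indexed: dict[str, str],   # path -> hash recorded when the file was indexed
    worker_busy: bool,         # True while a background build is running
) -> Status:
    if worker_busy:
        return Status.BUILDING
    current = {path: content_hash(text) for path, text in on_disk.items()}
    # Any added, removed, or edited file makes the two mappings differ.
    return Status.FRESH if current == indexed else Status.STALE
```

Under these assumptions, an edit to any in-scope file flips the result from Fresh to Stale until the file is reindexed.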

Set the scope first

SourcePrep doesn't need your whole repo. It needs the slice of your repo that's relevant to what you're working on right now. The Knowledge Scope panel (file tree) is where you mark that slice — folders and files toggle in and out with a click.

The Knowledge Scope panel — the file tree where you pick what your agent will know about.

By default SourcePrep respects your .gitignore and drops standard noise directories so you don't have to spell them out:

Excluded by default

node_modules/ · dist/ · build/ · .git/ · .next/ · target/ · venv/ · __pycache__/
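
The effect of the default exclusions can be sketched as a simple path filter. This is a minimal illustration, assuming forward-slash relative paths; `DEFAULT_EXCLUDED_DIRS` and `is_excluded` are hypothetical names, and .gitignore handling is not modeled here.

```python
from pathlib import PurePosixPath

# Directory names taken from the default exclusion list above.
DEFAULT_EXCLUDED_DIRS = frozenset(
    {"node_modules", "dist", "build", ".git", ".next", "target", "venv", "__pycache__"}
)


def is_excluded(path: str) -> bool:
    """Return True if any directory component of the path is a noise directory."""
    parts = PurePosixPath(path).parts
    # Only directory components are checked, so a file that merely
    # shares a name prefix (e.g. build_tools.py) stays in scope.
    return any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1])


paths = ["src/app/main.py", "web/node_modules/react/index.js", "src/build_tools.py"]
in_scope = [p for p in paths if not is_excluded(p)]
```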

See Knowledge Scope for the full include/exclude pattern syntax and named scope presets.

Weighted retrieval

Within the scope you picked, not all folders deserve equal weight at query time. Path weights are query-time multipliers — boost the folders you want the agent to lean into, suppress the ones that are technically in scope but you don't want crowding results. No rebuild required.

Range 0.0 – 2.0

1.0 = neutral. 1.5 boosts a folder 50% at ranking time. 0.0 effectively hides it. Most-specific path wins.
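
As a rough sketch of how query-time multipliers with a most-specific-path rule could behave, consider the following. The function names and the shape of the `weights` mapping are assumptions made for illustration, not SourcePrep's actual configuration format or ranking code.

```python
def weight_for(path: str, weights: dict[str, float]) -> float:
    """Pick the weight of the longest matching folder prefix; 1.0 if none match."""
    best_len, best = -1, 1.0
    for prefix, weight in weights.items():
        folder = prefix.rstrip("/")
        if path == folder or path.startswith(folder + "/"):
            if len(folder) > best_len:
                best_len, best = len(folder), weight
    # Clamp to the documented 0.0 to 2.0 range.
    return min(max(best, 0.0), 2.0)


def rerank(hits: list[tuple[str, float]], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Multiply each (path, score) hit by its path weight and sort best-first."""
    weighted = [(path, score * weight_for(path, weights)) for path, score in hits]
    return sorted(weighted, key=lambda hit: hit[1], reverse=True)


weights = {"src": 1.5, "src/legacy": 0.0, "tests": 0.5}
hits = [("tests/test_users.py", 0.9), ("src/api/users.py", 0.7), ("src/legacy/old.py", 0.95)]
ranked = rerank(hits, weights)
```

Because the weights are applied to scores after retrieval, changing them in this model only changes the ordering; nothing has to be re-embedded.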

Full details in the Path Weights guide.

Best practices

A tighter scope almost always beats a bigger one at the same token budget. The agent isn't fighting noise to find what matters, and retrieval is faster too. Scope to what you're actually working on; widen only when you need to.

How big should your scope be?

Scope            What it gets you
2–5 files        Just paste it. Below this an index is overkill.
5–15 files       Sweet spot for focused work. Retrieval is essentially deterministic: the agent sees what you scoped.
15–50 files      Selective retrieval, still excellent for focused tasks.
50–200 files     Works fine; start using path weights to suppress noise.
200+ files       Works for broad refactors; lean on path weights or sub-scopes.

When each scope size fits

Working on a feature

Scope to that feature's folder plus its immediate dependencies. Two or three directories is often enough.

Refactor pass

Broader scope is fine — but lean on path weights to suppress test fixtures, vendored code, and anything you don't want the agent to mirror.

Quick Q&A

Narrowest scope wins. If you know which file the answer lives in, scope to just that directory. The agent will be ready in seconds.

Under the hood

Files in scope are parsed with Tree-sitter, chunked at function/class boundaries (or by Markdown headers for docs), embedded with the built-in ONNX CPU embedder (or your local Ollama if you prefer), and stored alongside your project metadata. A file watcher catches edits as you save and re-embeds just the changed file — usually under 200 ms.
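
To make the header-based chunking of docs concrete, here is a minimal sketch in plain Python. It is a simplified stand-in, not SourcePrep's chunker: `chunk_markdown` is a hypothetical name, only ATX-style `#` headings are recognized, and code files (which the pipeline parses with Tree-sitter) are not covered.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+\S")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_markdown(text: str) -> list[str]:
    """Split a Markdown document into chunks that each start at a heading."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A heading outside a code fence closes the chunk collected so far.
        if not in_fence and HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


doc = "# Title\nintro\n\n## Setup\nstep one\n\n## Usage\nrun it\n"
chunks = chunk_markdown(doc)
```

Each chunk keeps its heading as the first line, so the embedded text carries its own section context.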

15+ languages, CPU-only by default

Tree-sitter handles parsing for Python, TypeScript, JavaScript, Rust, Go, Java, and a dozen more. ONNX embeddings run on your CPU at acceptable speed; nothing about indexing requires a GPU.

For embedder options, see Embedding Models. Everything stays on your machine — there's no cloud component to indexing.

Initialising a project

Add a project via the dashboard + button (or prep add /path/to/project) and the first build kicks off automatically. You can watch it complete in the Knowledge Status panel above. The file watcher handles every change afterwards; you almost never need to trigger a rebuild by hand.

For the rare cases the watcher misses — a thousand-file branch switch, an exclusion-pattern change, or a search that feels suspiciously off — there's a manual Rebuild Knowledge control available in the dashboard's developer-mode panels.