Vector Indexing
How SourcePrep finds the right code when you describe what you want instead of naming it exactly.
The Indexing Process
Vector Indexing is what lets SourcePrep answer "fuzzy" questions — the kind where you describe intent instead of typing exact names. The Knowledge Pipeline's structural map gives you the skeleton of your code; vector indexing adds the muscle, so searching for "how authentication works" can surface the right files even if none of them contain the word "authentication" verbatim.
Unlike cloud-based tools, indexing runs entirely on your local machine.
The prep-walker crate (Rust) scans your directory, respecting .gitignore and user-defined exclusions. It computes BLAKE3 hashes for change detection.
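To make hash-based change detection concrete, here is a minimal Python sketch. It is not the prep-walker implementation (which is written in Rust). It assumes `hashlib.blake2b` as a stand-in for BLAKE3, which is not in the Python standard library, and the helper names `hash_file`, `build_manifest`, and `diff_manifests` are hypothetical. Exclusion handling is left out for brevity.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return a hex digest of the file's bytes.

    Assumption: blake2b stands in for BLAKE3 here.
    """
    digest = hashlib.blake2b(digest_size=32)
    with path.open("rb") as handle:
        # Read in blocks so large files are not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content hash."""
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, changed, or removed between two scans."""
    return {
        "added": sorted(p for p in new if p not in old),
        "changed": sorted(p for p in new if p in old and old[p] != new[p]),
        "removed": sorted(p for p in old if p not in new),
    }
```

Comparing the manifest from the previous scan with a fresh one tells the indexer which files need work; anything whose hash is unchanged can be skipped.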
Files are parsed using Tree-sitter. Code is split into logical chunks (functions, classes) rather than arbitrary text windows. Markdown docs are split by headers.
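The idea of splitting on logical boundaries can be sketched in a few lines. This example is an illustration only: it uses Python's built-in `ast` module as a stand-in for Tree-sitter, handles only Python source, and treats any line starting with `#` as a Markdown header (so `#` lines inside fenced code would be misread). The function names are hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = node.decorator_list[0].lineno if node.decorator_list else node.lineno
            # Line numbers are 1-based and end_lineno is inclusive.
            text = "\n".join(lines[start - 1 : node.end_lineno])
            chunks.append(
                {"name": node.name, "start_line": start, "end_line": node.end_lineno, "text": text}
            )
    return chunks


def chunk_markdown(text: str) -> list[dict]:
    """Split Markdown into sections, starting a new section at each header line."""
    sections = []
    heading, body = None, []

    def close_section():
        # Skip the empty preamble before the first header.
        if heading is not None or any(line.strip() for line in body):
            sections.append({"heading": heading, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        if line.startswith("#"):
            close_section()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    close_section()
    return sections
```

Because each chunk is a whole function, class, or section, the text that gets embedded carries a complete thought rather than a window that cuts off mid-definition.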
Chunks are passed to the Native Embedder (ONNX/nomic-embed-text) or an optional Ollama instance. This converts text into 768-dimensional vectors.
Vectors and metadata are stored in a local LanceDB instance (or Qdrant/Chroma if configured). The raw text is never sent to the cloud.
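Once chunks are stored as vectors, a fuzzy query becomes a nearest-neighbor lookup. The sketch below shows the principle with brute-force cosine similarity in plain Python. It does not use LanceDB or a real embedder: the chunk IDs are made up, and the vectors are 3-dimensional toy values rather than the 768-dimensional embeddings described above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk IDs whose vectors are most similar to the query."""
    scored = [(chunk_id, cosine_similarity(query, vector)) for chunk_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: chunk ID -> embedding.
toy_index = {
    "auth/login.py::verify_token": [0.9, 0.1, 0.0],
    "ui/button.tsx::Button": [0.0, 0.2, 0.9],
    "auth/session.py::refresh": [0.8, 0.3, 0.1],
}

# Pretend this is the embedding of the query "how authentication works".
query_vector = [1.0, 0.2, 0.0]
results = top_k(query_vector, toy_index, k=2)
```

Here both results come from the `auth/` chunks, even though matching is done on vector direction and not on any shared keyword. A real vector store replaces the linear scan with an index built for large collections.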
Incremental Updates
SourcePrep includes a real-time file watcher (watchdog). When you save a file:
- The watcher detects the modify event.
- It debounces rapid changes (e.g. typing).
- It re-hashes the file content.
- If the hash changed, only that file is re-parsed and re-embedded.
This typically takes <200ms, ensuring your AI always sees the current state of your code.
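The debounce-then-rehash logic above can be sketched as follows. This is an assumption-laden illustration, not SourcePrep's watcher: the class name, the 0.25-second debounce window, and the `reindex` callback are hypothetical, `hashlib.blake2b` again stands in for BLAKE3, and timestamps are passed in explicitly so the example runs without the watchdog library or a real clock.

```python
import hashlib
from typing import Callable


class IncrementalIndexer:
    """Re-index a file only after edits settle and its content hash has changed."""

    def __init__(self, reindex: Callable[[str], None], debounce_seconds: float = 0.25):
        self._reindex = reindex
        self._debounce = debounce_seconds
        self._pending: dict[str, float] = {}  # path -> time of the latest modify event
        self._hashes: dict[str, str] = {}     # path -> hash of the last indexed content

    def on_modified(self, path: str, now: float) -> None:
        """Record a modify event; repeated events just push the deadline back."""
        self._pending[path] = now

    def flush(self, read_bytes: Callable[[str], bytes], now: float) -> list[str]:
        """Process files that have been quiet for the debounce window."""
        reindexed = []
        for path, last_event in list(self._pending.items()):
            if now - last_event < self._debounce:
                continue  # still being edited; check again on the next flush
            del self._pending[path]
            digest = hashlib.blake2b(read_bytes(path)).hexdigest()
            if self._hashes.get(path) != digest:
                self._hashes[path] = digest
                self._reindex(path)  # re-parse and re-embed this file only
                reindexed.append(path)
        return reindexed
```

In a real setup, `on_modified` would be called from the file watcher's event handler and `flush` from a timer. A save that leaves the bytes unchanged produces the same hash and is skipped.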
Exclusions
You can control what gets indexed via the Dashboard or .sourceprep/ignore. Common patterns like node_modules/, dist/, and .git/ are ignored by default.
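As a rough illustration of how exclusion patterns might be applied, the sketch below checks a relative path against a pattern list. The exact syntax of .sourceprep/ignore is not specified here, so this assumes a simplified gitignore-like convention: a pattern ending in `/` matches a directory anywhere in the path, and any other pattern is a glob tested against the full path or the file name. The function name `is_excluded` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The default patterns named above.
DEFAULT_EXCLUDES = ("node_modules/", "dist/", ".git/")


def is_excluded(rel_path: str, patterns=DEFAULT_EXCLUDES) -> bool:
    """Return True if rel_path (POSIX-style, relative to the project root) should be skipped."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against every directory component.
            directory = pattern.rstrip("/")
            if any(fnmatchcase(part, directory) for part in parts[:-1]):
                return True
        elif fnmatchcase(rel_path, pattern) or fnmatchcase(parts[-1], pattern):
            # File pattern: match the whole path or just the file name.
            return True
    return False
```

For example, `is_excluded("packages/web/dist/bundle.js")` is true because `dist` appears as a directory component, while `is_excluded("src/distance.py")` is false because directory patterns match whole components, not substrings.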
Dashboard Controls
The Knowledge Base column in the dashboard gives you visibility and control over this process.
Index Status Card
The top card provides a real-time health check. Watch for the status badge in the top right:
- Fresh: The index is perfectly synced with your disk.
- Stale: Files have changed, but the index hasn't updated yet (usually brief during debounce).
- Building: The background worker is actively processing files.
It also breaks down the index composition:
- Code: Source files (parsed into AST chunks).
- Instructions: Markdown, text, and documentation files (parsed by headers).
- Graph: Structural nodes (symbols, imports) used for graph traversal.
Manual Rebuild
While the watcher handles 99% of changes, you might need the Build Index card for:
- Branch Switching: If you switch git branches and thousands of files change instantly, a manual rebuild ensures everything is caught.
- Config Changes: If you change path_weights or exclusion patterns, a rebuild applies them to the entire codebase.
- Troubleshooting: If search feels "off", a full rebuild (using the --full flag via CLI or the dashboard button) clears the vector store and starts fresh.
