Code Graph
The structural backbone of SourcePrep.
Vector search is great for "fuzzy" questions ("how does auth work?"), but terrible at precision ("where is the User struct defined and what calls it?").
To solve this, SourcePrep maintains a parallel Code Graph — a directed graph of your codebase's structure.
Rust Engine
The Code Graph is built by a high-performance Rust engine (`prep-engine`) that runs alongside the Python daemon.
- Speed: Parses ~50k files in seconds.
- Accuracy: Uses Tree-sitter to generate concrete syntax trees (CSTs) for accurate symbol extraction.
- Multi-language: Supports Python, TypeScript, JavaScript, Go, Rust, Java, C, and C++.
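To make "symbol extraction" concrete, here is a minimal sketch of the idea for a single Python file. It is not how `prep-engine` works internally: it swaps Tree-sitter for Python's built-in `ast` module so that it runs without extra dependencies, and the function name `extract_symbols` and the `SAMPLE` source are hypothetical.

```python
import ast


def extract_symbols(source, filename="<memory>"):
    """Return (definitions, imports, calls) found in one Python source string.

    Illustrative stand-in for a Tree-sitter pass; uses the stdlib ast module.
    """
    tree = ast.parse(source, filename=filename)
    definitions, imports, calls = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Record where each function or class is defined.
            definitions.append((node.name, node.lineno))
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # A relative "from . import x" has module None.
            imports.append(node.module or "")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only plain-name calls; attribute calls like a.b() are skipped here.
            calls.append(node.func.id)
    definitions.sort(key=lambda item: item[1])
    return definitions, sorted(set(imports)), sorted(set(calls))


# Hypothetical input file used only for illustration.
SAMPLE = '''
import tax_rates
from decimal import Decimal

class Invoice:
    def total(self, amount):
        return apply_tax(Decimal(amount))

def apply_tax(amount):
    return amount * tax_rates.lookup("default")
'''

definitions, imports, calls = extract_symbols(SAMPLE)
```

Definitions with their line numbers, imports, and calls are the raw material a code graph would be built from; a real multi-language engine would produce the equivalent facts from each language's syntax tree.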
The Graph
The index maps three types of relationships:
Visualizing the Graph
The Code Graph panel in the dashboard provides an interactive way to explore these relationships.
- Interactive Map: Visualize your project's structure as a network of nodes (files/symbols) and edges (imports/calls).
- Neighborhood View: Click any file to see its immediate dependencies (upstream) and consumers (downstream).
- List View: Toggle to a detailed list to see exact import counts and symbol references.
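The neighborhood and list views above can be pictured as queries over a directed graph. The following is a rough sketch under stated assumptions: the `CodeGraph` class, its method names, and the edge kinds `"imports"` and `"calls"` are hypothetical and do not describe SourcePrep's actual storage format or API.

```python
from collections import defaultdict


class CodeGraph:
    """Toy directed graph: nodes are files or symbols, edges carry a kind."""

    def __init__(self):
        self._out = defaultdict(set)  # node -> {(target, kind)}
        self._in = defaultdict(set)   # node -> {(source, kind)}

    def add_edge(self, source, target, kind):
        self._out[source].add((target, kind))
        self._in[target].add((source, kind))

    def dependencies(self, node):
        """What this node relies on (the "upstream" side of the neighborhood view)."""
        return sorted({target for target, _ in self._out.get(node, ())})

    def consumers(self, node):
        """What relies on this node (the "downstream" side of the neighborhood view)."""
        return sorted({source for source, _ in self._in.get(node, ())})

    def import_count(self, node):
        """Number of outgoing import edges, as a list view might report."""
        return sum(1 for _, kind in self._out.get(node, ()) if kind == "imports")


# Hypothetical project used only for illustration.
graph = CodeGraph()
graph.add_edge("billing.py", "tax_rates.py", "imports")
graph.add_edge("billing.py", "models.py", "imports")
graph.add_edge("api.py", "billing.py", "imports")
graph.add_edge("billing.py::calculate", "tax_rates.py::lookup", "calls")

deps = graph.dependencies("billing.py")   # files billing.py imports
users = graph.consumers("billing.py")     # files that import billing.py
```

Keeping both an outgoing and an incoming index makes either direction of the neighborhood a direct lookup instead of a scan over every edge.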
Usage
You generally don't query the code graph directly. Instead, you enable Graph Expansion in your context request (or use the "Trace" keywords in your MCP editor).
When enabled, SourcePrep:
- Finds the primary chunks via vector search.
- Identifies key symbols in those chunks.
- Queries the Code Graph for their definition sites and usages.
- "Expands" the context to include those related files, even if they didn't match the search keywords.
Example: You ask "How is billing calculated?".
Vector search finds billing.py.
The Code Graph notices billing.py imports tax_rates.py.
SourcePrep includes tax_rates.py in the context automatically, preventing the AI from hallucinating tax logic.
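The billing example can be sketched in a few lines. This is a simplified illustration, not SourcePrep's implementation: it works at file level only, takes the vector search hits as a plain list, and the names `expand_context`, `import_graph`, and `max_extra` are hypothetical.

```python
def expand_context(primary_files, import_graph, max_extra=5):
    """Return the primary files plus files they import directly.

    primary_files: files found by vector search (assumed to be given).
    import_graph:  dict mapping a file to the files it imports.
    max_extra:     cap on added files so the context cannot grow unbounded.
    """
    context = list(primary_files)
    seen = set(primary_files)
    for path in primary_files:
        for related in import_graph.get(path, []):
            if len(context) - len(primary_files) >= max_extra:
                return context
            if related not in seen:
                seen.add(related)
                context.append(related)
    return context


# Hypothetical graph matching the example above.
import_graph = {
    "billing.py": ["tax_rates.py"],
    "tax_rates.py": [],
}

# Vector search matched only billing.py; expansion adds tax_rates.py.
context = expand_context(["billing.py"], import_graph)
```

The cap on added files is a design choice for the sketch: following every edge without a limit could pull in large parts of a codebase for a single question.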
The Knowledge Pipeline
The Code Graph is not a static artifact. It is the backbone of a living Knowledge Pipeline that keeps learning about your code — turning raw text into navigable knowledge your AI can actually reason over.
This pipeline ensures that SourcePrep understands not just where code is (Structure), but what it does (Enrichment) and how it relates conceptually (Embeddings).
