Code Graph

The structural backbone of SourcePrep.

Vector search is great for "fuzzy" questions ("how does auth work?"), but terrible at precision ("where is the User struct defined and what calls it?").

To solve this, SourcePrep maintains a parallel Code Graph — a directed graph of your codebase's structure.

Rust Engine

The Code Graph is built by a high-performance Rust engine (`prep-engine`) that runs alongside the Python daemon.

  • Speed: Parses ~50k files in seconds.
  • Accuracy: Uses Tree-sitter to generate concrete syntax trees (CSTs) for accurate symbol extraction.
  • Multi-language: Supports Python, TypeScript, JavaScript, Go, Rust, Java, C, and C++.
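As a rough illustration of what "symbol extraction from a syntax tree" means, here is a minimal sketch. It deliberately swaps in Python's standard-library `ast` module instead of Tree-sitter, handles Python source only, and is not how `prep-engine` is implemented. The function name `extract_symbols` and the sample source are hypothetical.

```python
import ast


def extract_symbols(source: str) -> list:
    """Return (kind, name, line) for every class and function definition.

    Stand-in for a Tree-sitter pass: parse the source into a tree,
    walk it, and record each declaration site.
    """
    tree = ast.parse(source)
    symbols = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
    # Sort by line number so the output follows the file top to bottom.
    return sorted(symbols, key=lambda s: s[2])


# Hypothetical sample file contents.
SAMPLE = '''\
class User:
    def save(self):
        pass

def load_user(user_id):
    return User()
'''

symbols = extract_symbols(SAMPLE)
for kind, name, line in symbols:
    print(f"{kind:8} {name:10} line {line}")
```

A real engine would run the equivalent pass per language grammar and store the results as graph nodes.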

The Graph

The index maps three types of relationships:

  • Definitions: "Where is X declared?"
  • References: "Where is X used?"
  • Imports: "What does file A depend on?"
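One way to picture these three relationship types is as three small lookup tables. The sketch below is an assumption for illustration only: the class `CodeGraph`, its method names, and the file paths are hypothetical and do not reflect SourcePrep's actual storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeGraph:
    # symbol -> file that declares it
    definitions: dict = field(default_factory=dict)
    # symbol -> files that use it
    references: dict = field(default_factory=lambda: defaultdict(set))
    # importing file -> files it imports
    imports: dict = field(default_factory=lambda: defaultdict(set))

    def add_definition(self, symbol: str, file: str) -> None:
        self.definitions[symbol] = file

    def add_reference(self, symbol: str, file: str) -> None:
        self.references[symbol].add(file)

    def add_import(self, importer: str, imported: str) -> None:
        self.imports[importer].add(imported)

    def where_declared(self, symbol: str) -> Optional[str]:
        """Answers "Where is X declared?"."""
        return self.definitions.get(symbol)

    def where_used(self, symbol: str) -> list:
        """Answers "Where is X used?"."""
        return sorted(self.references.get(symbol, ()))

    def dependencies_of(self, file: str) -> list:
        """Answers "What does file A depend on?"."""
        return sorted(self.imports.get(file, ()))


graph = CodeGraph()
graph.add_definition("User", "models/user.py")
graph.add_reference("User", "api/auth.py")
graph.add_reference("User", "api/profile.py")
graph.add_import("api/auth.py", "models/user.py")

print(graph.where_declared("User"))
print(graph.where_used("User"))
print(graph.dependencies_of("api/auth.py"))
```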

Visualizing the Graph

The Code Graph panel in the dashboard provides an interactive way to explore these relationships.

  • Interactive Map: Visualize your project's structure as a network of nodes (files/symbols) and edges (imports/calls).
  • Neighborhood View: Click any file to see its immediate dependencies (upstream) and consumers (downstream).
  • List View: Toggle to a detailed list to see exact import counts and symbol references.

Live preview: An interactive visualization of file-level dependencies and import relationships.
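The Neighborhood View can be thought of as a one-hop query over import edges. The following sketch assumes edges are stored as `(importer, imported)` pairs; the function `neighborhood` and the file names are hypothetical and only illustrate the upstream/downstream split described above.

```python
def neighborhood(import_edges, file):
    """Return the immediate dependencies and consumers of one file.

    import_edges: iterable of (importer, imported) pairs.
    upstream   = files that `file` imports (its dependencies).
    downstream = files that import `file` (its consumers).
    """
    upstream = sorted({dst for src, dst in import_edges if src == file})
    downstream = sorted({src for src, dst in import_edges if dst == file})
    return {"upstream": upstream, "downstream": downstream}


# Hypothetical import edges for a small project.
edges = [
    ("billing.py", "tax_rates.py"),
    ("billing.py", "currency.py"),
    ("invoice.py", "billing.py"),
    ("report.py", "billing.py"),
]

view = neighborhood(edges, "billing.py")
print(view)
```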

Usage

You generally don't query the Code Graph directly. Instead, you enable Graph Expansion in your context request (or use the "Trace" keywords in your MCP editor).

When enabled, SourcePrep:

  1. Finds the primary chunks via vector search.
  2. Identifies key symbols in those chunks.
  3. Queries the Code Graph for their definition sites and usages.
  4. "Expands" the context to include those related files, even if they didn't match the search keywords.

Example: You ask "How is billing calculated?".
Vector search finds billing.py.
The Code Graph notices billing.py imports tax_rates.py.
SourcePrep includes tax_rates.py in the context automatically, so the AI sees the real tax logic instead of guessing at it.
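The expansion steps above can be sketched as a small function. This is a minimal illustration under stated assumptions, not SourcePrep's implementation: vector search is replaced by a pre-supplied list of hits, the graph is passed in as plain dictionaries, and every name here (`expand_context`, `TaxRate`, `invoice.py`) is hypothetical.

```python
def expand_context(primary_files, symbols_by_file, definitions, references, imports):
    """Expand vector-search hits with graph neighbours.

    primary_files:   files returned by vector search (step 1, assumed given).
    symbols_by_file: key symbols found in each file (step 2, assumed given).
    definitions:     symbol -> defining file.
    references:      symbol -> set of files that use it.
    imports:         file -> set of files it imports.
    """
    expanded = list(primary_files)
    seen = set(primary_files)

    def add(file):
        # Keep first-seen order and avoid duplicates.
        if file not in seen:
            seen.add(file)
            expanded.append(file)

    for file in primary_files:
        # Pull in direct imports of each hit.
        for dep in sorted(imports.get(file, ())):
            add(dep)
        # Step 3: look up definition sites and usages of key symbols.
        for symbol in symbols_by_file.get(file, ()):
            if symbol in definitions:
                add(definitions[symbol])
            for user in sorted(references.get(symbol, ())):
                add(user)
    # Step 4: the expanded list may include files that never matched the query.
    return expanded


context = expand_context(
    primary_files=["billing.py"],
    symbols_by_file={"billing.py": ["TaxRate"]},
    definitions={"TaxRate": "tax_rates.py"},
    references={"TaxRate": {"billing.py", "invoice.py"}},
    imports={"billing.py": {"tax_rates.py"}},
)
print(context)  # ['billing.py', 'tax_rates.py', 'invoice.py']
```

A production version would likely cap the number of added files or rank them, since a heavily referenced symbol could otherwise pull in a large part of the codebase.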

The Knowledge Pipeline

The Code Graph is not a static artifact. It is the backbone of a living Knowledge Pipeline that keeps learning about your code — turning raw text into navigable knowledge your AI can actually reason over.

Pipeline Stages

  1. Structural Trace (Rust): High-speed parsing of file structure to build the initial skeleton.
  2. Vector Indexing: Generates search embeddings for source code chunks (searchability).
  3. Fast Catalogue (3b): Lightweight tagging and classification of symbols.
  4. Relationship Validation (Rust): Verifies imports and call graph edges against the filesystem.
  5. Deep Reasoning (14b): Epistemic enrichment. Deep analysis adds domain tags, architecture layers, and an understanding score (0.0–1.0).
  6. Module Synthesis: Cluster synthesis. Groups related files into functional subsystem modules with navigable summaries.
  7. Codebase Atlas: Pre-retrieval routing index built from the synthesized modules.
  8. Continuous Deepening: Convergence loop. Re-enriches nodes with decayed understanding scores until the graph stabilizes.
  9. Deep Knowledge Embedding: Final deep storage of synthesized knowledge and enriched connections.
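To make the Relationship Validation stage more concrete, here is a minimal sketch that checks import edges against files on disk. The real stage runs in Rust and also covers call graph edges; this Python version, the function `validate_import_edges`, and the file names are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def validate_import_edges(root, edges):
    """Split (importer, imported) edges into valid and broken lists.

    An edge counts as valid here only if both endpoints exist as
    files under `root`.
    """
    valid, broken = [], []
    for src, dst in edges:
        if (root / src).is_file() and (root / dst).is_file():
            valid.append((src, dst))
        else:
            broken.append((src, dst))
    return valid, broken


# Build a throwaway project so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "billing.py").write_text("import tax_rates\n")
    (root / "tax_rates.py").write_text("RATE = 0.2\n")

    valid, broken = validate_import_edges(
        root,
        [
            ("billing.py", "tax_rates.py"),
            ("billing.py", "legacy_fees.py"),  # target file does not exist
        ],
    )

print("valid:", valid)
print("broken:", broken)
```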

This pipeline ensures that SourcePrep understands not just where code is (Structure), but what it does (Enrichment) and how it relates conceptually (Embeddings).
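The Continuous Deepening stage can be pictured as a loop that keeps re-enriching low-scoring nodes until none remain. The sketch below is an assumption about the general shape of such a loop: the threshold, the round limit, and the `enrich` callback are all hypothetical, and "decayed" is approximated as "score below the threshold".

```python
def deepen_until_stable(scores, enrich, threshold=0.75, max_rounds=10):
    """Re-enrich nodes whose understanding score is below `threshold`.

    scores: node -> understanding score in the range 0.0 to 1.0.
    enrich: callback (node, old_score) -> new_score (hypothetical).
    Returns the final scores and the number of enrichment rounds run.
    """
    scores = dict(scores)  # work on a copy
    for rounds_done in range(max_rounds):
        stale = [node for node, score in scores.items() if score < threshold]
        if not stale:
            # Nothing left to re-enrich: the graph has stabilized.
            return scores, rounds_done
        for node in stale:
            scores[node] = enrich(node, scores[node])
    return scores, max_rounds


def toy_enrich(node, score):
    # Stand-in for a model call: each pass raises the score a little.
    return min(1.0, score + 0.25)


final_scores, rounds = deepen_until_stable(
    {"billing.py": 1.0, "tax_rates.py": 0.25, "invoice.py": 0.5},
    toy_enrich,
)
print(final_scores, rounds)
```

The `max_rounds` cap is a design choice in this sketch so the loop terminates even if enrichment never lifts a node above the threshold.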

Learn more about Graph Enrichment →