# Smart Context Compression
SourcePrep uses structural compression (LOD) to deliver code context at variable fidelity. Top results stay at full source, mid-relevance files show signatures and docstrings, peripheral files show just names and imports — achieving 3–20× compression with zero dependencies. The compression level adapts automatically to your AI tool's context window tier.
## How it works
LOD (Levels of Detail) uses SourcePrep's trace graph to understand your code's structure — functions, classes, imports, docstrings — and extracts at variable fidelity based on relevance score. No model inference, no GPU, no dependencies.
- Score-aware — high-relevance code stays full, low-relevance compresses more
- Tier-adaptive — Claude/Gemini (50K budget) gets more full-source files; local models (20K) get tighter compression
- Code-aware — understands functions, classes, imports (not just token probabilities)
- Instant — <10ms per file (no model inference)
- Universal — available on all tiers, including Free
## Levels of Detail
| LOD | What's kept | Measured ratio | When used |
|---|---|---|---|
| 0 | Full source | 1:1 | Top results (score ≥ threshold) |
| 1 | Source minus comments | 1.1–1.3× | Tier 1 neighbours |
| 2 | Signatures + docstrings + ... | 1.3–2.6× | Mid-relevance results, Tier 2 neighbours |
| 3 | Class skeletons only | ~2.6× | Class-heavy files |
| 4 | Imports + symbol names | 8–14× | Low-relevance or trace-expanded neighbours |
| 5 | File path + summary + exports | 50–140× | Peripheral files (score < 0.20) |
Ratios measured on real SourcePrep source files (7K–18K chars). Files with more code inside functions achieve higher ratios; files with heavy module-level constants compress less at LOD 2.
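The score-to-LOD mapping above can be sketched as a simple threshold function. This is illustrative only: apart from the documented 0.20 cutoff for LOD 5, the threshold values here are assumptions, not SourcePrep's actual tuning.

```python
def lod_for_score(score: float, full_source_threshold: float = 0.60) -> int:
    """Map a relevance score to an LOD level (illustrative thresholds)."""
    if score >= full_source_threshold:  # top results: full source
        return 0
    if score < 0.20:                    # peripheral files (documented cutoff)
        return 5
    if score >= 0.40:                   # mid-relevance: signatures + docstrings
        return 2
    return 4                            # low-relevance: imports + symbol names

print(lod_for_score(0.75))  # 0
print(lod_for_score(0.47))  # 2
```

In practice the full-source threshold would also shift with the client tier, since larger context windows admit more LOD 0 files.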
## Tier-Adaptive Compression
SourcePrep detects your AI tool from the MCP handshake and adapts compression to match the context window. Larger windows get more full-source files; smaller windows get tighter compression so the same structural information fits.
| Tier | Clients | Budget | Hub files | Neighbour LOD |
|---|---|---|---|---|
| Tier 1 | Claude Code, Gemini CLI | 50K chars | 10 at full source | LOD 1 (source minus comments) |
| Tier 2 | Cursor, Windsurf, Copilot | 24–30K chars | 6 at full source | LOD 2 (signatures) |
| Tier 2.5 | Cline, Roo, Continue | 20K chars | 4 at LOD 2 | LOD 4 (names + imports) |
Tier detection is automatic — no configuration needed. The first prep call in each session also gets a 50% orientation boost for richer initial context.
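Conceptually, tier detection reduces to a lookup from the handshake's client name to a character budget. A minimal sketch based on the table above: the client identifier strings and the default budget are assumptions, and the 24–30K Tier 2 range is collapsed to illustrative per-client values.

```python
# Client -> character budget, per the tier table above (identifier strings
# are illustrative; the actual MCP handshake names may differ).
TIER_BUDGETS = {
    "claude-code": 50_000, "gemini-cli": 50_000,              # Tier 1
    "cursor": 30_000, "windsurf": 30_000, "copilot": 24_000,  # Tier 2
    "cline": 20_000, "roo": 20_000, "continue": 20_000,       # Tier 2.5
}

def context_budget(client: str, first_prep_call: bool = False) -> int:
    budget = TIER_BUDGETS.get(client.lower(), 24_000)  # assumed default tier
    if first_prep_call:
        budget = int(budget * 1.5)  # documented 50% orientation boost
    return budget
```

For example, a first prep call from a Tier 2.5 client would get a 30K budget instead of 20K.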
## Compression flow
The extractor uses SourcePrep's pre-computed trace graph (symbol spans, class hierarchy, import edges) to know exactly where functions start and end — no re-parsing needed at query time.
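With symbol spans pre-computed, LOD 2 extraction is a line-selection pass rather than a parse. The sketch below assumes a hypothetical span shape (a signature line index plus an optional docstring range); SourcePrep's actual trace-graph schema is not documented here.

```python
def extract_lod2(lines: list[str], symbols: list[dict]) -> str:
    """Keep signatures and docstrings; elide function bodies.

    `symbols` holds pre-computed spans from the trace graph (hypothetical
    shape): "sig" is a line index, "doc" is an inclusive (start, end) range.
    """
    keep = set()
    for sym in symbols:
        keep.add(sym["sig"])
        if sym.get("doc"):
            keep.update(range(sym["doc"][0], sym["doc"][1] + 1))
    out = []
    for i, line in enumerate(lines):
        if i in keep:
            out.append(line)
        elif out and out[-1].strip() != "...":
            out.append("    ...")  # one ellipsis per elided body
    return "\n".join(out)
```

Because the spans are known ahead of time, this pass is linear in file length, which is what keeps per-file extraction under the stated 10ms.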
## Usage
### Via MCP (automatic)
LOD compression is always active for MCP tool calls. When you call `prep_search`, results are automatically LOD-compressed based on your client tier. No configuration needed.
### Via Dashboard
In the Context Options panel, select LOD (Structural) from the Compression dropdown. Click “Assemble” — each source citation will show an `LOD{n}` badge and compression ratio.
### Via API
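Programmatic consumers work with the fields documented under Response metadata. As a minimal sketch, here is how a client might read the compression stats from a response; the sample payload is trimmed to the documented fields, and only those field names are taken from the docs.

```python
import json

# Sample response trimmed to the documented compression metadata fields.
response = json.loads("""
{
  "chunks": [
    {"source_path": "src/auth.py", "score": 0.47, "lod": 2, "compression_ratio": 2.6}
  ],
  "compression": {
    "enabled": true,
    "mode": "lod",
    "input_chars": 8200,
    "output_chars": 1840,
    "lod_distribution": {"0": 1, "2": 3, "4": 2}
  }
}
""")

comp = response["compression"]
overall = comp["input_chars"] / comp["output_chars"]
print(f"overall compression: {overall:.1f}x")  # prints "overall compression: 4.5x"
```

The `lod_distribution` map gives a quick read on how aggressively a given result set was compressed: here one file at full source, three at signatures, and two at names + imports.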
### Response metadata
When LOD compression is active, the response includes a `compression` object and per-chunk `lod` / `compression_ratio` fields:
```json
{
  "context": "[src/auth.py | lod=2]\ndef login(...):\n    ...",
  "chunks": [
    {
      "source_path": "src/auth.py",
      "score": 0.47,
      "lod": 2,
      "compression_ratio": 2.6
    }
  ],
  "compression": {
    "enabled": true,
    "mode": "lod",
    "input_chars": 8200,
    "output_chars": 1840,
    "lod_distribution": { "0": 1, "2": 3, "4": 2 }
  }
}
```

## Supported languages
LOD extraction supports signature detection and import recognition for:
For unsupported languages, files gracefully fall back to LOD 0 (full source).
## Fallback behaviour
Compression is best-effort. If the trace graph doesn't have symbol data for a file (e.g., the file was added after the last build), SourcePrep falls back to LOD 0 and returns the full source. The `fallback` field in the chunk metadata will be `true`.
## Coming Soon: Language Compression for Documentation
A future Pro feature will add language-aware compression for markdown and documentation files using a lightweight BERT model. This will complement LOD's structural compression for code with token-level compression optimised for natural language.
