Smart Context Compression

SourcePrep uses structural compression (LOD) to deliver code context at variable fidelity. Top results stay at full source, mid-relevance files show signatures and docstrings, peripheral files show just names and imports — achieving 3–20× compression with zero dependencies. The compression level adapts automatically to your AI tool's context window tier.

How it works

LOD (Levels of Detail) uses SourcePrep's trace graph to understand your code's structure — functions, classes, imports, docstrings — and extracts at variable fidelity based on relevance score. No model inference, no GPU, no dependencies.

  • Score-aware — high-relevance code stays full, low-relevance compresses more
  • Tier-adaptive — Claude/Gemini (50K budget) gets more full-source files; local models (20K) get tighter compression
  • Code-aware — understands functions, classes, imports (not just token probabilities)
  • Instant — <10ms per file (no model inference)
  • Included everywhere — available on all plans, including Free

Levels of Detail

LOD   What's kept                      Measured ratio   When used
0     Full source                      1:1              Top results (score ≥ threshold)
1     Source minus comments            1.1–1.3×         Tier 1 neighbours
2     Signatures + docstrings + ...    1.3–2.6×         Mid-relevance results, Tier 2 neighbours
3     Class skeletons only             ~2.6×            Class-heavy files
4     Imports + symbol names           8–14×            Low-relevance or trace-expanded neighbours
5     File path + summary + exports    50–140×          Peripheral files (score < 0.20)

Ratios measured on real SourcePrep source files (7K–18K chars). Files with more code inside functions achieve higher ratios; files with heavy module-level constants compress less at LOD 2.
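To make the middle of the table concrete, here is a minimal sketch of what an LOD 2-style pass (signatures plus docstrings, bodies elided) could look like for Python files, using the standard `ast` module. This is an illustration only — it is not SourcePrep's actual extractor, and `extract_lod2` is a hypothetical name:

```python
import ast
import textwrap

def extract_lod2(source: str) -> str:
    """Keep function/class signatures and docstrings; elide bodies."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Reconstruct a bare signature from the parsed node
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}):")
        else:
            continue
        doc = ast.get_docstring(node)
        if doc:
            lines.append(f'    """{doc}"""')
        lines.append("    ...")
    return "\n".join(lines)

src = textwrap.dedent('''
    def login(user, password):
        """Authenticate a user."""
        token = check(user, password)
        return token
''')
print(extract_lod2(src))
```

The function body disappears; only the signature, the docstring, and a `...` placeholder survive — which is why function-heavy files compress better at LOD 2 than files dominated by module-level constants.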

Tier-Adaptive Compression

SourcePrep detects your AI tool from the MCP handshake and adapts compression to match the context window. Larger windows get more full-source files; smaller windows get tighter compression so the same structural information fits.

Tier       Clients                      Budget         Hub files            Neighbour LOD
Tier 1     Claude Code, Gemini CLI      50K chars      10 at full source    LOD 1 (source minus comments)
Tier 2     Cursor, Windsurf, Copilot    24–30K chars   6 at full source     LOD 2 (signatures)
Tier 2.5   Cline, Roo, Continue         20K chars      4 at LOD 2           LOD 4 (names + imports)

Tier detection is automatic — no configuration needed. The first prep call in each session also gets a 50% orientation boost for richer initial context.
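Conceptually, the detection step can be sketched as a lookup from the client name reported in the MCP handshake to a tier and budget. The identifier strings below are illustrative assumptions, not SourcePrep's actual detection keys:

```python
# Hypothetical mapping from MCP client name to (tier, char budget).
# Keys are illustrative; real client identifiers may differ.
TIERS = {
    "claude-code": ("1", 50_000),
    "gemini-cli":  ("1", 50_000),
    "cursor":      ("2", 30_000),
    "windsurf":    ("2", 30_000),
    "copilot":     ("2", 24_000),
    "cline":       ("2.5", 20_000),
    "roo":         ("2.5", 20_000),
    "continue":    ("2.5", 20_000),
}

def detect_tier(client_name: str) -> tuple[str, int]:
    """Unknown clients get the tightest tier, so the budget is never exceeded."""
    return TIERS.get(client_name.lower(), ("2.5", 20_000))

print(detect_tier("Cursor"))  # ('2', 30000)
```

Defaulting unknown clients to the smallest budget is the safe choice: over-compressing loses some fidelity, but overflowing a small context window loses content outright.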

Compression flow

Query arrives
  ↓
Semantic search → scored chunks
  ↓
Detect client tier (1 / 2 / 2.5)
  ↓
assign_lod(score, tier) per file:
  Tier 1: ≥0.40 → LOD 0, ≥0.25 → LOD 2, ≥0.15 → LOD 4
  Tier 2: ≥0.50 → LOD 0, ≥0.35 → LOD 2, ≥0.20 → LOD 4
  Tier 2.5: ≥0.60 → LOD 0, ≥0.40 → LOD 2, ≥0.25 → LOD 4
  ↓
LODExtractor.extract(file, lod, trace_nodes)
  ↓
Compressed context assembled within budget

The extractor uses SourcePrep's pre-computed trace graph (symbol spans, class hierarchy, import edges) to know exactly where functions start and end — no re-parsing needed at query time.
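The per-tier thresholds in the flow above translate directly into a small scoring function. A sketch (the LOD 5 fallback for scores below the lowest cutoff is an assumption based on the "Peripheral files" row of the LOD table):

```python
# (LOD 0 cutoff, LOD 2 cutoff, LOD 4 cutoff) per tier, from the flow above.
# Scores below the last cutoff are assumed to get LOD 5.
THRESHOLDS = {
    "1":   (0.40, 0.25, 0.15),
    "2":   (0.50, 0.35, 0.20),
    "2.5": (0.60, 0.40, 0.25),
}

def assign_lod(score: float, tier: str) -> int:
    """Map a relevance score to a LOD for the given client tier."""
    full, sigs, names = THRESHOLDS[tier]
    if score >= full:
        return 0   # full source
    if score >= sigs:
        return 2   # signatures + docstrings
    if score >= names:
        return 4   # imports + symbol names
    return 5       # file path + summary + exports

print(assign_lod(0.47, "2"))  # mid-relevance on Tier 2 → 2
```

Note how the same score lands on different LODs per tier: a 0.47 chunk is full source on Tier 1 but signatures-only on Tier 2 and Tier 2.5.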

Usage

Via MCP (automatic)

LOD compression is always active for MCP tool calls. When you call prep_search, results are automatically LOD-compressed based on your client tier. No configuration needed.

Via Dashboard

In the Context Options panel, select LOD (Structural) from the Compression dropdown. Click “Assemble” — each source citation will show an LOD{n} badge and compression ratio.

Via API

curl -X POST http://localhost:8400/projects/my-project/context \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does authentication work?",
    "k": 10,
    "max_chars": 12000,
    "structured": true,
    "compression": "lod"
  }'

Response metadata

When LOD compression is active, the response includes a compression object and per-chunk lod / compression_ratio fields:

{
  "context": "[src/auth.py | lod=2]\ndef login(...):\n    ...",
  "chunks": [
    {
      "source_path": "src/auth.py",
      "score": 0.47,
      "lod": 2,
      "compression_ratio": 2.6
    }
  ],
  "compression": {
    "enabled": true,
    "mode": "lod",
    "input_chars": 8200,
    "output_chars": 1840,
    "lod_distribution": { "0": 1, "2": 3, "4": 2 }
  }
}
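The aggregate ratio is easy to derive from the compression object. A small sketch, using the field names from the example response above:

```python
# Example response metadata, as shown above
response = {
    "compression": {
        "enabled": True,
        "mode": "lod",
        "input_chars": 8200,
        "output_chars": 1840,
        "lod_distribution": {"0": 1, "2": 3, "4": 2},
    }
}

comp = response["compression"]
# Overall ratio: characters before vs after compression
ratio = comp["input_chars"] / comp["output_chars"]
total_chunks = sum(comp["lod_distribution"].values())
print(f"overall compression: {ratio:.1f}x across {total_chunks} chunks")
```

Here 8200 / 1840 ≈ 4.5×, sitting between the per-level ratios because the result mixes one full-source chunk with heavily compressed neighbours.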

Supported languages

LOD extraction supports signature detection and import recognition for:

Python, TypeScript, JavaScript, Rust, Go, Java, Kotlin, Swift, C#, C++, C, PHP, Ruby, Dart, Scala

For unsupported languages, files fall back to LOD 0 (full source) gracefully.

Fallback behaviour

Compression is best-effort. If the trace graph doesn't have symbol data for a file (e.g., the file was added after the last build), SourcePrep falls back to LOD 0 and returns the full source. The fallback field in the chunk metadata will be true.
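A sketch of this fallback logic, assuming a trace-graph lookup keyed by file path (the helper names and the injected `extractor` callable are illustrative, not SourcePrep's internal API):

```python
def extract_with_fallback(path, source, lod, trace_graph, extractor):
    """Return (text, metadata); serve full source when no symbol data exists."""
    nodes = trace_graph.get(path)
    if not nodes:
        # File not in the trace graph (e.g. added after the last build):
        # degrade gracefully to LOD 0 and flag the fallback.
        return source, {"lod": 0, "fallback": True}
    return extractor(source, lod, nodes), {"lod": lod, "fallback": False}

# Usage with a stand-in extractor that just returns the source unchanged:
graph = {"src/auth.py": ["login", "logout"]}
text, meta = extract_with_fallback(
    "src/new_file.py", "print('hi')", 2, graph,
    extractor=lambda src, lod, nodes: src,
)
print(meta)  # {'lod': 0, 'fallback': True}
```

The caller always gets usable context; the `fallback` flag simply tells you the compression step was skipped for that file.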

Tip: Keep your index up to date (enable the file watcher or rebuild regularly) for the best compression results.

Coming Soon: Language Compression for Documentation

A future Pro feature will add language-aware compression for markdown and documentation files using a lightweight BERT model. This will complement LOD's structural compression for code with token-level compression optimised for natural language.

Roadmap: Language compression already appears in the settings panel, but it requires additional setup and is not yet generally available. Follow the release notes for availability updates.