
Codebase Audit

Autonomous codebase health analysis powered by the trace graph. Get architecture reports, gap analysis, tech debt summaries, and actionable recommendations — all generated from your existing enrichment data.

Overview

AutoAudit analyzes your trace graph (nodes, edges, augmentations, epistemic enrichment, modules, and atlas) to produce structured findings and optional LLM-generated reports. It runs as an independent tool — not a pipeline stage — so you trigger it when you want insights, not on every file change.

AutoAudit V2 transforms SourcePrep from a "passive observer" into an "active taskmaster". Findings are categorized into flat tabs (Architecture, Quality, Coverage, Tech Debt), prioritized, and include concrete actionable items. You can select findings and click "Copy AI Command" to instantly hand off the context assembly to your AI via MCP.


Live preview: AutoAudit findings organized by severity and category with AI handoff.

Two Tiers

Tier 1 — Analyzers

Pure graph queries. No LLM needed. Runs in <2 seconds. Produces structured findings (JSON).
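As an illustration of what "structured findings (JSON)" means in practice, a single Tier 1 finding might look like the sketch below. The field names (`id`, `analyzer`, `severity`, and so on) are assumptions for illustration, not the documented schema; only the analyzer name and the `ARCH-` id prefix appear elsewhere in this guide.

```python
# Hypothetical shape of a single Tier 1 finding. Field names are
# illustrative assumptions, not SourcePrep's documented schema.
finding = {
    "id": "ARCH-1",
    "analyzer": "circular_deps",
    "category": "architecture",
    "severity": "critical",
    "message": "Import cycle: a.py -> b.py -> a.py",
    "files": ["src/a.py", "src/b.py"],
}

def severity_counts(findings):
    """Tally findings by severity, as a summary view might."""
    counts = {}
    for f in findings:
        counts[f["severity"]] = counts.get(f["severity"], 0) + 1
    return counts

print(severity_counts([finding]))  # {'critical': 1}
```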

Tier 2 — Synthesis

LLM-generated markdown reports. Uses your configured large model. Produces 5 documents.

Quick Start

CLI

# Run Tier 1 analyzers only (fast, no LLM)
prep audit

# Run Tier 1 + Tier 2 synthesis (generates markdown reports)
prep audit --synthesize

# Filter to a specific category
prep audit --category architecture

MCP Tools

Four MCP tools are available in Cursor, Windsurf, and any MCP-compatible editor:

  • prep_audit — Run or retrieve audit findings. Returns severity counts and top findings. Set synthesize: true to also generate full reports.
  • prep_audit_report — Read a specific generated report by name. Available reports: AUDIT_SUMMARY, ARCHITECTURE_ANALYSIS, GAP_ANALYSIS, COMPONENT_INVENTORY, TECH_DEBT_REPORT.
  • prep_audit_refactor — (V2) Context Assembly Handoff. Takes an array of finding_ids (e.g. ["ARCH-1", "QUAL-2"]) and returns the full structural trace graph context for all affected files, priming the AI for an immediate refactor.
  • prep_audit_check — (V2) Validation. Takes an array of analyzers to re-run locally to verify that recent code changes actually resolved the finding.
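To make the V2 handoff-then-verify loop concrete, the payloads an MCP client might send could look like the following sketch. The `finding_ids` and `analyzers` arrays are documented above; the surrounding `tool`/`arguments` structure is an illustrative assumption about how an MCP client frames a tool call.

```python
# Sketch of an MCP refactor handoff followed by validation.
# Only the finding_ids / analyzers arrays come from the docs above;
# the payload framing is an illustrative assumption.
tool_call = {
    "tool": "prep_audit_refactor",
    "arguments": {"finding_ids": ["ARCH-1", "QUAL-2"]},
}

# After the AI applies the refactor, re-run the relevant analyzers
# locally to confirm the findings are actually resolved:
check_call = {
    "tool": "prep_audit_check",
    "arguments": {"analyzers": ["circular_deps", "duplicate_logic"]},
}
```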

REST API

Method  Endpoint                              Purpose
POST    /projects/{id}/audit                  Trigger audit (Tier 1 + optional Tier 2)
GET     /projects/{id}/audit/status           Check progress and last run metadata
GET     /projects/{id}/audit/findings         Raw structured findings (filterable)
GET     /projects/{id}/audit/reports          List generated report documents
GET     /projects/{id}/audit/report/{name}    Read a specific report
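The endpoints share a common `/projects/{id}/audit` prefix, so a thin client can compose them from the project id. A minimal sketch, where the base URL and helper function are assumptions (only the paths come from the table above):

```python
# Minimal sketch of composing the audit endpoints for one project.
# The base URL is an assumed local deployment; adjust to your server.
BASE = "http://localhost:8000"  # illustrative assumption

def audit_url(project_id, suffix=""):
    """Build an audit endpoint URL from the shared prefix."""
    return f"{BASE}/projects/{project_id}/audit{suffix}"

print(audit_url("demo"))                          # .../projects/demo/audit
print(audit_url("demo", "/report/GAP_ANALYSIS"))  # .../audit/report/GAP_ANALYSIS
```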

Built-in Analyzers

AutoAudit ships with 11 analyzers. Each reads the trace graph and produces structured findings — no LLM, no side effects.

Analyzer             Category      What It Finds
large_files          size          Files over configurable line thresholds
circular_deps        architecture  Import cycles (Tarjan's SCC algorithm)
misplaced_imports    architecture  Cross-module dependency bottlenecks
hub_bottlenecks      architecture  Files with disproportionate fan-in (z-score outliers)
dead_code            quality       Files with zero importers that aren't entry points
duplicate_logic      quality       Files with suspiciously similar summaries (Jaccard)
tech_debt            quality       Aggregated tech debt from epistemic enrichment
staleness            quality       Enrichments that are out of date
test_coverage        testing       Source files with no associated test file
naming_consistency   naming        Language-specific naming convention violations
api_surface          coverage      Public symbols missing docstrings
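The table credits `circular_deps` with Tarjan's strongly connected components algorithm: any SCC containing more than one node is an import cycle. A compact sketch on a toy import graph (the graph and function names are illustrative, not SourcePrep's internals):

```python
# Toy Tarjan's SCC over an import graph: file -> files it imports.
# Any SCC with more than one node is a dependency cycle.
def tarjan_sccs(graph):
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v roots an SCC: pop it off the stack
            scc = []
            while True:
                w = stack.pop(); on_stack.discard(w); scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

imports = {"a.py": ["b.py"], "b.py": ["a.py"], "c.py": ["a.py"]}
cycles = [s for s in tarjan_sccs(imports) if len(s) > 1]
print(cycles)  # [['b.py', 'a.py']] — a.py and b.py form an import cycle
```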

Generated Reports

When you run with --synthesize (or synthesize: true in the API), an LLM generates 5 markdown documents from the findings:

  • AUDIT_SUMMARY — Health grade (A–F), critical findings, top recommendations
  • ARCHITECTURE_ANALYSIS — Module dependency flow, bottlenecks, boundary violations
  • GAP_ANALYSIS — Misplaced concerns, duplicated logic, missing abstractions
  • COMPONENT_INVENTORY — Every file with purpose, module, summary, in-degree
  • TECH_DEBT_REPORT — Debt items by module with remediation roadmap

Output Location

All audit output is stored inside the project's index directory:

# Standalone mode (default):
~/.local/share/sourceprep/projects/{project-id}/audit/
  ├── findings.json            # Raw structured findings
  ├── audit_manifest.json      # Run metadata (timestamps, counts)
  ├── AUDIT_SUMMARY.md         # LLM-generated (if synthesized)
  ├── ARCHITECTURE_ANALYSIS.md
  ├── GAP_ANALYSIS.md
  ├── COMPONENT_INVENTORY.md
  └── TECH_DEBT_REPORT.md

# Embedded mode:
/path/to/project/.sourceprep/audit/
  └── (same files)

These files are also indexed by SourcePrep's search engine, so you can query them via prep_search (e.g., "what tech debt exists in the auth module?").

Settings

Audit behavior is configurable via the dashboard Settings panel or the audit_config section in ui_config.json:

Setting                     Default  Description
auto_run_after_deep         false    Auto-run Tier 1 after deep enrichment completes
auto_synthesize             false    Also generate LLM reports when auto-running
large_file_threshold_bytes  80,000   File size for "critical" severity (~2000 lines)
large_file_warning_bytes    40,000   File size for "warning" severity (~1000 lines)
hub_z_threshold             2.0      Z-score for hub bottleneck detection
similarity_threshold        0.65     Jaccard threshold for duplicate logic detection

Live preview: Opportunities panel consolidating all improvement items with filters and Pi Agent status.
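To see how `similarity_threshold` is applied, the `duplicate_logic` analyzer's Jaccard comparison can be sketched as follows. Whitespace tokenization and the helper names are assumptions; only the 0.65 default comes from the settings table above.

```python
# Sketch of Jaccard similarity between two file summaries, flagging
# pairs at or above the similarity_threshold default (0.65).
# Whitespace tokenization is an illustrative assumption.
SIMILARITY_THRESHOLD = 0.65

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

s1 = "parses config file and validates required keys"
s2 = "parses config file and validates optional keys"
score = jaccard(s1, s2)
print(round(score, 2), score >= SIMILARITY_THRESHOLD)  # 0.75 True
```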

Pipeline Connection

AutoAudit is not a pipeline stage. It's an independent tool that reads the same data the pipeline writes. The connection:

  1. The enrichment pipeline runs to completion and produces trace_nodes, trace_augmented, trace_epistemic, trace_modules, and atlas.json.
  2. You run prep audit when you want insights. The audit reads all that data and produces findings + reports.
  3. If auto_run_after_deep is enabled, Tier 1 analyzers run automatically when deep enrichment completes.
  4. Audit reports are indexed by SourcePrep's search engine and served via MCP, so your AI tools can access them.