# Codebase Audit
Autonomous codebase health analysis powered by the trace graph. Get architecture reports, gap analysis, tech debt summaries, and actionable recommendations — all generated from your existing enrichment data.
## Overview
AutoAudit analyzes your trace graph (nodes, edges, augmentations, epistemic enrichment, modules, and atlas) to produce structured findings and optional LLM-generated reports. It runs as an independent tool — not a pipeline stage — so you trigger it when you want insights, not on every file change.
AutoAudit V2 transforms SourcePrep from a "passive observer" into an "active taskmaster". Findings are categorized into flat tabs (Architecture, Quality, Coverage, Tech Debt), prioritized, and include concrete actionable items. You can select findings and click "Copy AI Command" to instantly hand off the context assembly to your AI via MCP.
## Two Tiers

- **Tier 1 (Analyzers)** — Pure graph queries. No LLM needed. Runs in under 2 seconds. Produces structured findings (JSON).
- **Tier 2 (Synthesis)** — LLM-generated markdown reports. Uses your configured large model. Produces 5 documents.
## Quick Start

### CLI
```shell
# Run Tier 1 analyzers only (fast, no LLM)
prep audit

# Run Tier 1 + Tier 2 synthesis (generates markdown reports)
prep audit --synthesize

# Filter to a specific category
prep audit --category architecture
```

### MCP Tools
Four MCP tools are available in Cursor, Windsurf, and any MCP-compatible editor:
- `prep_audit` — Run or retrieve audit findings. Returns severity counts and top findings. Set `synthesize: true` to also generate full reports.
- `prep_audit_report` — Read a specific generated report by name. Available reports: `AUDIT_SUMMARY`, `ARCHITECTURE_ANALYSIS`, `GAP_ANALYSIS`, `COMPONENT_INVENTORY`, `TECH_DEBT_REPORT`.
- `prep_audit_refactor` — (V2) Context Assembly Handoff. Takes an array of `finding_ids` (e.g. `["ARCH-1", "QUAL-2"]`) and returns the full structural trace graph context for all affected files, priming the AI for an immediate refactor.
- `prep_audit_check` — (V2) Validation. Takes an array of `analyzers` to re-run locally to verify that recent code changes actually resolved the finding.
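As a sketch of what a handoff call looks like on the wire, an MCP `tools/call` request for `prep_audit_refactor` carries arguments shaped roughly like this (the outer wrapper depends on your MCP client; the finding IDs are the illustrative ones from above):

```json
{
  "name": "prep_audit_refactor",
  "arguments": {
    "finding_ids": ["ARCH-1", "QUAL-2"]
  }
}
```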
### REST API
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /projects/{id}/audit | Trigger audit (Tier 1 + optional Tier 2) |
| GET | /projects/{id}/audit/status | Check progress and last run metadata |
| GET | /projects/{id}/audit/findings | Raw structured findings (filterable) |
| GET | /projects/{id}/audit/reports | List generated report documents |
| GET | /projects/{id}/audit/report/{name} | Read a specific report |
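A minimal sketch of driving these endpoints from Python — the base URL and project ID are assumptions for illustration; substitute your own deployment and issue the actual POST/GET with `urllib.request` or `requests`:

```python
from urllib.parse import quote

def audit_endpoint(base_url: str, project_id: str, action: str = "") -> str:
    """Build an AutoAudit REST endpoint URL.

    `action` is "" (trigger), "status", "findings", "reports",
    or "report/{name}", matching the table above.
    """
    url = f"{base_url}/projects/{quote(project_id)}/audit"
    return f"{url}/{action}" if action else url

# Hypothetical local server and project ID:
trigger = audit_endpoint("http://localhost:8080", "my-project")            # POST here
status = audit_endpoint("http://localhost:8080", "my-project", "status")   # GET here
print(trigger)
print(status)
```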
## Built-in Analyzers
AutoAudit ships with 11 analyzers. Each reads the trace graph and produces structured findings — no LLM, no side effects.
| Analyzer | Category | What It Finds |
|---|---|---|
| large_files | size | Files over configurable line thresholds |
| circular_deps | architecture | Import cycles (Tarjan's SCC algorithm) |
| misplaced_imports | architecture | Cross-module dependency bottlenecks |
| hub_bottlenecks | architecture | Files with disproportionate fan-in (z-score outliers) |
| dead_code | quality | Files with zero importers that aren't entry points |
| duplicate_logic | quality | Files with suspiciously similar summaries (Jaccard) |
| tech_debt | quality | Aggregated tech debt from epistemic enrichment |
| staleness | quality | Enrichments that are out of date |
| test_coverage | testing | Source files with no associated test file |
| naming_consistency | naming | Language-specific naming convention violations |
| api_surface | coverage | Public symbols missing docstrings |
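To make two of the techniques above concrete, here is a toy sketch of `duplicate_logic`-style Jaccard similarity on summary word sets and `hub_bottlenecks`-style z-score outlier detection on fan-in counts. This is not the actual implementation — the data and function names are illustrative; only the math mirrors the table:

```python
from statistics import mean, pstdev

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between two file summaries, on word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def hub_outliers(fan_in: dict[str, int], z_threshold: float = 2.0) -> list[str]:
    """Files whose fan-in is a z-score outlier (hub bottlenecks)."""
    mu, sigma = mean(fan_in.values()), pstdev(fan_in.values())
    if sigma == 0:
        return []
    return [f for f, n in fan_in.items() if (n - mu) / sigma >= z_threshold]

# Toy data: two near-identical summaries, and one file imported far
# more often than its peers.
sim = jaccard("parse config file into dict", "parse config file into mapping")
hubs = hub_outliers({"hub.py": 30, "a.py": 1, "b.py": 2, "c.py": 1, "d.py": 3,
                     "e.py": 2, "f.py": 1, "g.py": 2, "h.py": 1, "i.py": 2})
print(round(sim, 2), hubs)  # sim = 4/6 ≈ 0.67, above the 0.65 default threshold
```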
## Generated Reports

When you run with `--synthesize` (or `synthesize: true` in the API), an LLM generates 5 markdown documents from the findings:
- AUDIT_SUMMARY — Health grade (A–F), critical findings, top recommendations
- ARCHITECTURE_ANALYSIS — Module dependency flow, bottlenecks, boundary violations
- GAP_ANALYSIS — Misplaced concerns, duplicated logic, missing abstractions
- COMPONENT_INVENTORY — Every file with purpose, module, summary, in-degree
- TECH_DEBT_REPORT — Debt items by module with remediation roadmap
## Output Location
All audit output is stored inside the project's index directory:
```
# Standalone mode (default):
~/.local/share/sourceprep/projects/{project-id}/audit/
├── findings.json            # Raw structured findings
├── audit_manifest.json      # Run metadata (timestamps, counts)
├── AUDIT_SUMMARY.md         # LLM-generated (if synthesized)
├── ARCHITECTURE_ANALYSIS.md
├── GAP_ANALYSIS.md
├── COMPONENT_INVENTORY.md
└── TECH_DEBT_REPORT.md

# Embedded mode:
/path/to/project/.sourceprep/audit/
└── (same files)
```

These files are also indexed by SourcePrep's search engine, so you can query them via `prep_search` (e.g., "what tech debt exists in the auth module?").
## Settings

Audit behavior is configurable via the dashboard Settings panel or the `audit_config` section in `ui_config.json`:
| Setting | Default | Description |
|---|---|---|
| auto_run_after_deep | false | Auto-run Tier 1 after deep enrichment completes |
| auto_synthesize | false | Also generate LLM reports when auto-running |
| large_file_threshold_bytes | 80,000 | File size for "critical" severity (~2000 lines) |
| large_file_warning_bytes | 40,000 | File size for "warning" severity (~1000 lines) |
| hub_z_threshold | 2.0 | Z-score for hub bottleneck detection |
| similarity_threshold | 0.65 | Jaccard threshold for duplicate logic detection |
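Put together, an `audit_config` section restating the defaults from the table might look like this (the nesting inside `ui_config.json` is assumed; the keys and values come from the table above):

```json
{
  "audit_config": {
    "auto_run_after_deep": false,
    "auto_synthesize": false,
    "large_file_threshold_bytes": 80000,
    "large_file_warning_bytes": 40000,
    "hub_z_threshold": 2.0,
    "similarity_threshold": 0.65
  }
}
```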
## Pipeline Connection
AutoAudit is not a pipeline stage. It's an independent tool that reads the same data the pipeline writes. The connection:
- The enrichment pipeline runs to completion and produces `trace_nodes`, `trace_augmented`, `trace_epistemic`, `trace_modules`, and `atlas.json`.
- You run `prep audit` when you want insights. The audit reads all that data and produces findings + reports.
- If `auto_run_after_deep` is enabled, Tier 1 analyzers run automatically when deep enrichment completes.
- Audit reports are indexed by SourcePrep's search engine and served via MCP, so your AI tools can access them.
