← Guides

Audit Enrichment

A lint warning in a hub file that 200 other files depend on is not the same as the same warning in a one-off script. SourcePrep takes raw findings from your linter and annotates each one with structural context so you can triage by blast radius, not just by severity.

What it does

Feed prep_audit a list of findings from any tool — ruff, ESLint, semgrep, CodeQL, GitHub Code Scanning — and SourcePrep layers structural information on top of each one:

  • Dependent count — how many other files import the flagged file.
  • Hub status — critical, high, moderate, or low, derived from the file's position in the graph.
  • Module — which architectural module the file belongs to.
  • Related concepts — business rules or design decisions recorded for that file.
  • Risk score — a 0.0–1.0 number combining severity with structural impact.
  • Recommendation — a short plain-language hint about where this finding sits on the triage spectrum.
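The exact scoring formula isn't documented here, but as a purely illustrative sketch (not SourcePrep's actual implementation), a 0.0–1.0 risk score of this shape could blend a severity weight with a normalized dependent count. The names `sketch_risk_score` and the weights below are assumptions for illustration only:

```python
# Illustrative only — not SourcePrep's real risk_score formula.
SEVERITY_WEIGHT = {"error": 1.0, "warning": 0.6, "info": 0.3}

def sketch_risk_score(severity: str, dependents: int, max_dependents: int = 200) -> float:
    """Blend severity with structural impact into a 0.0-1.0 score."""
    sev = SEVERITY_WEIGHT.get(severity, 0.3)
    # Normalize dependent count so hub files saturate toward 1.0.
    structural = min(dependents / max_dependents, 1.0)
    return round(0.5 * sev + 0.5 * structural, 2)
```

The point of any such blend is that a warning in a heavily-imported file outranks the same warning in a leaf file.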

Simple format

The simplest way to use this is with a flat list of findings. Each finding needs a file path, a line, a message, a severity, and the tool that produced it.

prep_audit(findings=[
  {
    "file": "src/prep/core/indexer.py",
    "line": 142,
    "message": "Unused import: pathlib.PurePath",
    "severity": "warning",
    "tool": "ruff"
  }
])

SourcePrep returns each finding with a prep object attached:

{
  "file": "src/prep/core/indexer.py",
  "line": 142,
  "message": "Unused import: pathlib.PurePath",
  "severity": "warning",
  "tool": "ruff",
  "prep": {
    "dependents": 47,
    "hub_status": "high",
    "module": "core",
    "concepts": ["indexer-must-be-idempotent"],
    "risk_score": 0.62,
    "recommendation": "Hub file — verify no dynamic import relies on this symbol before removal."
  }
}

SARIF round-trip

If your pipeline already speaks SARIF (GitHub Code Scanning, semgrep, CodeQL), hand the SARIF document directly to prep_audit. SourcePrep detects the format, enriches every result, and returns a valid SARIF document back — the structural context rides along as properties.prep on each result.

# Your tool emits SARIF
semgrep --config auto --sarif --output findings.sarif

# Pipe it through prep_audit
prep_audit(findings=<SARIF dict>)

# Back comes enriched SARIF — every result now has properties.prep

Both SARIF 2.0.0 and 2.1.0 are accepted. The returned document round-trips cleanly into any SARIF viewer — the enrichment is additive, not destructive.
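One way to consume the round-tripped document downstream: assuming the enriched SARIF has been loaded as a Python dict, the per-result context lives under `properties.prep` on each result. The helper name `collect_prep` is hypothetical:

```python
def collect_prep(sarif: dict) -> list[dict]:
    """Pull the prep annotation out of every result in an enriched SARIF document."""
    annotations = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Enrichment rides along in the standard SARIF property bag.
            prep = result.get("properties", {}).get("prep")
            if prep is not None:
                annotations.append(prep)
    return annotations
```

Because the enrichment is additive, results without a `properties.prep` entry are simply skipped.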

Why this matters

Linters and scanners are structure-blind by design — they look at one file at a time. That produces long flat lists where a dead import in a leaf utility looks identical to a dead import in a file that forty modules depend on. SourcePrep fixes that triage gap without changing your scanner:

  • Sort by risk_score to get the blast-radius-aware queue.
  • Filter by hub_status to isolate findings in fragile areas.
  • Cross-reference concepts to see which findings touch codified business rules.
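Given a list of enriched findings in the simple format shown above, that triage might look like the following sketch (helper names are illustrative):

```python
def triage(findings: list[dict]) -> list[dict]:
    """Sort enriched findings into a blast-radius-aware queue, riskiest first."""
    return sorted(findings, key=lambda f: f["prep"]["risk_score"], reverse=True)

def in_fragile_areas(findings: list[dict], statuses=("critical", "high")) -> list[dict]:
    """Keep only findings whose file is a critical or high hub."""
    return [f for f in findings if f["prep"]["hub_status"] in statuses]
```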

Example workflow

A typical loop inside an AI agent or a CI job:

  1. Run your linter. Export JSON or SARIF.
  2. Call prep_audit(findings=...).
  3. Re-sort results by risk_score descending.
  4. Attack the top findings first — those are the ones where fixing one line benefits the largest share of the codebase.
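A minimal sketch of steps 1–2, assuming ruff's JSON output format (`filename`, `message`, and `location.row` keys) and defaulting severity to "warning" since ruff diagnostics carry no severity field:

```python
import json
import subprocess

def to_finding(diag: dict) -> dict:
    """Map one ruff JSON diagnostic to the simple finding shape."""
    return {
        "file": diag["filename"],
        "line": diag["location"]["row"],
        "message": diag["message"],
        "severity": "warning",   # ruff emits no severity of its own
        "tool": "ruff",
    }

def ruff_findings(paths: list[str]) -> list[dict]:
    """Run ruff and collect its diagnostics as simple-format findings."""
    proc = subprocess.run(
        ["ruff", "check", "--output-format", "json", *paths],
        capture_output=True, text=True,
    )
    return [to_finding(d) for d in json.loads(proc.stdout or "[]")]
```

From there, pass the list to `prep_audit(findings=...)` and sort the results by `prep.risk_score` descending.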

Good to know

  • Enrichment is read-only and runs locally against your existing index — nothing is sent to the cloud.
  • If a file isn't yet indexed, the finding is returned unchanged with prep.hub_status = "unknown".
  • The tool field is preserved end-to-end so downstream dashboards can still group by linter.
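When sorting, it's worth guarding against unindexed files, since their prep block may carry only `hub_status = "unknown"` and no score. A defensive lookup (the name `safe_risk` is illustrative) keeps them from breaking the queue:

```python
def safe_risk(finding: dict) -> float:
    """Risk score for sorting; unindexed files sink to the bottom of the queue."""
    prep = finding.get("prep", {})
    if prep.get("hub_status") == "unknown":
        return 0.0
    return prep.get("risk_score", 0.0)
```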