
Syncing Team Context (BYOC)

If you are on the Team or Enterprise tier, you can set up a headless indexing server so your developers never have to run local LLMs or burn CPU cycles to build the trace graph.

Your CI/CD pipeline builds the index once on every push to main. Every developer on your team downloads the pre-computed graph instantly. Their local SourcePrep only computes deltas for their uncommitted changes.


How it works

  1. Build once: A CI/CD job runs the SourcePrep headless image after every merge to main. It produces the full enriched trace graph — structural, semantic, and clustered — so every developer on the team starts from the same understanding.
  2. Store centrally: The index artifacts are uploaded to an S3-compatible bucket (Cloudflare R2, AWS S3, MinIO, etc.).
  3. Sync locally: Each developer's SourcePrep client checks the bucket on startup and downloads the latest index in seconds.
  4. Delta only: When a developer edits files locally, SourcePrep enriches only those files using their local LLM or BYOK API key. The rest of the graph comes from the shared index.

Live preview: The Sync Status Card showing a synced team index with commit hash and last-synced timestamp.

Two Docker image variants are available:

Image                                  Size      GPU  Use case
ghcr.io/ericbintner/prep-headless:cpu  ~2-3 GB   No   GitHub Actions + BYOK (OpenAI/Anthropic)
ghcr.io/ericbintner/prep-headless:gpu  ~8-10 GB  Yes  RunPod, Modal, AWS ECS + local Ollama

Quick Start: CPU + BYOK (Zero Infrastructure)

The fastest way to get started. Runs directly inside a free GitHub Actions runner using your existing OpenAI or Anthropic API key. No GPU rental, no RunPod, no Modal.

1. Create a Storage Bucket

We recommend Cloudflare R2 (zero egress fees). AWS S3 or MinIO also work.

Create a bucket (e.g., prep-team-indexes) and generate an Access Key pair with read/write permissions.

2. Add the GitHub Action

Copy the workflow template into your repository:

# .github/workflows/prep-sync.yml
name: "SourcePrep Team Sync"
on:
  push:
    branches: ["main"]  # Add other branches as needed

jobs:
  index:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/ericbintner/prep-headless:cpu
    steps:
      - uses: actions/checkout@v4
      - name: Build & Upload Index
        env:
          PREP_S3_ENDPOINT: ${{ secrets.PREP_S3_ENDPOINT }}
          PREP_S3_BUCKET: ${{ secrets.PREP_S3_BUCKET }}
          PREP_S3_ACCESS_KEY: ${{ secrets.PREP_S3_ACCESS_KEY }}
          PREP_S3_SECRET_KEY: ${{ secrets.PREP_S3_SECRET_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          prep sync-headless \
            --repo-path . \
            --branch "${{ github.ref_name }}" \
            --model-provider openai \
            --model-name gpt-4.1-mini \
            --embedder native

That's it. Every push to main will rebuild the index (incrementally — only changed files are re-processed) and upload it to your bucket.
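Incremental mode amounts to a manifest diff. A rough sketch, assuming the stored manifest maps file paths to content hashes (the actual on-disk format is internal to SourcePrep):

```python
import hashlib

def diff_manifest(old: dict[str, str], new_files: dict[str, bytes]):
    """Compare a stored path->hash manifest against current file contents.

    Returns (changed_or_added, deleted): only these need re-processing.
    """
    new = {path: hashlib.sha256(data).hexdigest() for path, data in new_files.items()}
    changed = [p for p, h in new.items() if old.get(p) != h]
    deleted = [p for p in old if p not in new]
    return changed, deleted

old_manifest = {"a.py": hashlib.sha256(b"v1").hexdigest(),
                "b.py": hashlib.sha256(b"same").hexdigest()}
current = {"a.py": b"v2", "b.py": b"same", "c.py": b"new"}
changed, deleted = diff_manifest(old_manifest, current)
print(changed)  # ['a.py', 'c.py'] -- only these are re-enriched
print(deleted)  # []
```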

3. Connect Your Team

Commit the sync configuration to your repository (credentials are never committed):

// .sourceprep/team_config.json
{
  "sync": {
    "enabled": true,
    "s3_endpoint": "https://<account-id>.r2.cloudflarestorage.com",
    "s3_bucket": "prep-team-indexes",
    "s3_prefix": "my-repo-name",
    "poll_interval_minutes": 30
  }
}

Each developer provides their S3 read credentials via one of:

  • Environment variables: PREP_S3_ACCESS_KEY / PREP_S3_SECRET_KEY
  • A gitignored file: .sourceprep/.secrets
  • The OS keychain (prompted on first run)

When a developer opens the project, SourcePrep downloads the latest shared index automatically.


Advanced: GPU + Local LLM (RunPod / Modal)

For teams with large codebases or strict privacy requirements. Runs the enrichment pipeline on a rented GPU using open-source models (Qwen3, DeepSeek), so your source code is never sent to a third-party model API.

Option A: Modal Serverless

Modal provides serverless GPU execution that scales to zero.

  1. Install the Modal CLI: pip install modal && modal setup
  2. Save your S3 credentials as a Modal Secret named prep-s3-creds.
  3. Deploy the adapter: modal deploy modal/modal_adapter.py
  4. Copy the webhook URL into your GitHub Action (replace the run step with a curl call to the webhook).
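For step 4, the replacement step might look like the following. This is a sketch only: the secret name and payload fields are assumptions, so check the adapter's documentation in prep-deploy for the exact contract.

```yaml
# Replaces the "Build & Upload Index" run step from the CPU workflow.
- name: Trigger Modal Index Build
  env:
    MODAL_WEBHOOK_URL: ${{ secrets.MODAL_WEBHOOK_URL }}
  run: |
    curl -X POST "$MODAL_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d '{"repo": "${{ github.repository }}", "branch": "${{ github.ref_name }}", "commit": "${{ github.sha }}"}'
```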

The adapter source is in the prep-deploy repository under modal/.

Option B: RunPod Serverless

RunPod provides cheap A4000/A100 GPUs on demand.

  1. Build and push the RunPod image:
    docker build -f runpod/Dockerfile.runpod -t my-org/prep-runpod .
    docker push my-org/prep-runpod
  2. Create a Serverless Endpoint in the RunPod dashboard using your image.
  3. Set your S3 credentials as endpoint environment variables.
  4. Add a webhook trigger in your GitHub Action.

Cost Comparison

Approximate costs for a ~5,000-file codebase with 5 merges/day. Actual costs vary by codebase size and model pricing.

Method                        Per run  Monthly (est.)
CPU + OpenAI (gpt-4.1-mini)   ~$8      ~$1,200
GPU + Qwen3 (RunPod A4000)    ~$0.60   ~$90
GPU incremental (typical PR)  ~$0.05   ~$8

Most teams start with CPU + BYOK for simplicity, then migrate to GPU as their codebase grows and API costs spike.
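The monthly estimates follow directly from runs-per-month arithmetic, using the per-run figures from the cost table:

```python
MERGES_PER_DAY = 5
DAYS_PER_MONTH = 30
runs = MERGES_PER_DAY * DAYS_PER_MONTH  # 150 index builds per month

for method, per_run in [("CPU + OpenAI", 8.00), ("GPU + Qwen3", 0.60)]:
    print(f"{method}: ${per_run * runs:,.0f}/month")
# CPU + OpenAI: $1,200/month
# GPU + Qwen3: $90/month
```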


Enterprise: AWS / Azure (VPC)

For organizations that require air-gapped deployment inside their own cloud infrastructure.

The ghcr.io/ericbintner/prep-headless:gpu image runs on any container orchestrator with GPU support:

# AWS ECS / Fargate (mount the cloned repo into the container)
docker run --gpus all \
  -v /path/to/repo:/mnt/repo \
  ghcr.io/ericbintner/prep-headless:gpu \
  sync-headless \
    --repo-path /mnt/repo \
    --model-provider local \
    --model-name qwen3:8b

  • AWS: Use ECS with GPU task definitions or SageMaker Processing Jobs. Storage: internal S3 with IAM role auth (no static keys).
  • Azure: Use Azure Container Apps with GPU profiles or Azure ML. Storage: Azure Blob.
  • GitLab / Jenkins: Use the same Docker image in your existing CI/CD pipelines.

A reference AWS ECS task definition is provided in the prep-deploy repository under aws/.


How Local Deltas Work

When a developer edits a file that exists in the shared index, SourcePrep automatically:

  1. Detects the change via the local file watcher.
  2. Enriches only that file using the developer's local LLM or BYOK key.
  3. At query time, the local delta takes priority over the shared version (the stale remote entry is masked).
  4. When the developer's changes are merged into main and the shared index is rebuilt, the local delta is automatically discarded.

This means every developer always has the most up-to-date context: the team's shared graph plus their own uncommitted work.


Security

  • Never commit S3 credentials to Git. Use GitHub Secrets, Modal Secrets, or environment variables.
  • The .sourceprep/team_config.json file (committed to your repo) contains only the bucket endpoint, name, and prefix — no secrets.
  • SourcePrep includes a built-in secrets leakage detector that warns if credential-like keys appear in team_config.json.
  • Each developer provides read credentials via env vars, a gitignored .sourceprep/.secrets file, or OS keychain.

CLI Reference

prep sync-headless \
  --repo-path .                     # Path to a pre-cloned repository
  --repo-url https://...            # Or: clone from URL (uses $GIT_TOKEN for auth)
  --branch main                     # Branch to index (default: main)
  --model-provider openai           # local | openai | anthropic | google
  --model-name gpt-4.1-mini         # Model name for enrichment pipeline
  --api-key sk-...                   # API key (or use env: OPENAI_API_KEY, etc.)
  --embedder native                 # native (ONNX, CPU) | ollama
  --full                            # Force full rebuild (skip incremental)
  --s3-endpoint https://...         # S3 endpoint (or PREP_S3_ENDPOINT env)
  --s3-bucket my-bucket             # S3 bucket (or PREP_S3_BUCKET env)
  --s3-prefix my-repo               # S3 prefix (or PREP_S3_PREFIX env)
  --s3-access-key AKIA...           # S3 access key (or PREP_S3_ACCESS_KEY env)
  --s3-secret-key ...               # S3 secret key (or PREP_S3_SECRET_KEY env)

FAQ

How often does the local client check for updates?

On daemon startup, on manual “Sync Now” button press, and on a configurable polling interval (default: every 30 minutes).

Does every push trigger a full rebuild?

No. By default, sync-headless runs in incremental mode. It compares the existing index manifest against the current repo state and only re-processes changed, added, or deleted files. Pass --full to force a complete rebuild.

Which branches can I sync?

Any branch. Configure the branches list in your GitHub Action. Each branch gets its own S3 prefix automatically.
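Per-branch isolation is just prefix namespacing. A sketch of the layout, assuming a key scheme of <s3_prefix>/<branch>/<artifact> (the exact scheme is an assumption):

```python
def index_key(s3_prefix: str, branch: str, artifact: str) -> str:
    """Compute the object key for one index artifact of one branch."""
    safe_branch = branch.replace("/", "--")  # e.g. feature/x -> feature--x
    return f"{s3_prefix}/{safe_branch}/{artifact}"

print(index_key("my-repo-name", "main", "manifest.json"))
# my-repo-name/main/manifest.json
print(index_key("my-repo-name", "feature/login", "manifest.json"))
# my-repo-name/feature--login/manifest.json
```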

What if my repo is private?

The headless container supports Git clone via $GIT_TOKEN (HTTPS) or $SSH_KEY (SSH). In GitHub Actions, the actions/checkout step handles this automatically.

Do I need a GPU?

No. The :cpu image uses SourcePrep's built-in ONNX embedder (runs on CPU) and sends LLM reasoning to a cloud API (OpenAI, Anthropic, etc.). A GPU is only needed if you want to run models locally for privacy or cost reasons.

Where are the deployment templates?

All Dockerfiles, platform adapters (Modal, RunPod), GitHub Actions workflows, and AWS ECS references are in the public prep-deploy repository.