Concurrency Discovery & Locking

Why your throughput may dip after a plan upgrade — and the one click that fixes it.

🛟 Quick fix

Just upgraded your Ollama Cloud plan and SourcePrep still feels slow? Open AI Gateway, click the Developer wrench under Pipeline Activity, and hit Reset next to your cloud endpoint. SourcePrep will re-discover the new limit on the next request.

Why this exists

When you point SourcePrep at a cloud LLM endpoint for the first time, the system has no idea how many parallel requests your provider will actually accept. Ollama Cloud, Together, OpenRouter, and most BYOK providers don’t advertise their concurrency cap up front, and the “real” cap depends on your plan tier, your account region, and provider load.

SourcePrep figures it out from real traffic. It starts conservatively, grows the parallel-request budget while latency stays healthy, and backs off the moment the provider returns a rate-limit error or latency spikes. The safe ceiling that emerges from this dance is the discovered ceiling.
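
The grow-then-back-off loop described above can be sketched in a few lines. This is only an illustration in the additive-increase, multiplicative-decrease style, not SourcePrep's actual implementation: the class name, the starting budget of 5, the 2-second latency cutoff, the one-step growth, and the halving on backoff are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CeilingProbe:
    """Hypothetical probe: grows while healthy, backs off on trouble."""

    limit: int = 5                    # assumed starting budget
    latency_threshold_s: float = 2.0  # assumed "latency spike" cutoff
    backoff_factor: float = 0.5       # assumed multiplicative backoff
    last_good: int = 0                # highest budget that completed cleanly

    def on_result(self, latency_s: float, rate_limited: bool) -> int:
        """Update the budget from one completed request and return it."""
        if rate_limited or latency_s > self.latency_threshold_s:
            # Back off right away, but never below one request in flight.
            self.limit = max(1, int(self.limit * self.backoff_factor))
        else:
            # Healthy: remember this level, then probe one step higher.
            self.last_good = max(self.last_good, self.limit)
            self.limit += 1
        return self.limit


probe = CeilingProbe()
for _ in range(3):
    probe.on_result(latency_s=0.4, rate_limited=False)  # budget: 6, 7, 8
probe.on_result(latency_s=0.4, rate_limited=True)       # rate limit: halve
print(probe.limit, probe.last_good)  # prints "4 7"
```

In this sketch, last_good plays the role of the discovered ceiling: the highest budget that was exercised without a rate-limit error or latency spike.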

Once SourcePrep finds that ceiling, it locks it for 24 hours so it doesn’t keep poking at the limit on every long-running build. That keeps your pipelines smooth and your provider happy. The trade-off: if the underlying capacity changes, the lock has to be invalidated for SourcePrep to find the new number.
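
To make the lock concrete, here is a minimal sketch of a ceiling with a 24-hour expiry. The CeilingLock and budget names are hypothetical, and the one-step probe after expiry assumes a step size of 1; SourcePrep's internal representation may look quite different.

```python
from dataclasses import dataclass

LOCK_TTL_S = 24 * 60 * 60  # the 24-hour lock window, in seconds


@dataclass
class CeilingLock:
    """Hypothetical record of a discovered ceiling and when it was locked."""

    ceiling: int
    locked_at: float  # Unix timestamp

    def is_active(self, now: float) -> bool:
        """True while the lock is younger than 24 hours."""
        return now - self.locked_at < LOCK_TTL_S


def budget(lock: CeilingLock, now: float) -> int:
    """Parallel-request budget implied by the lock at time `now`."""
    if lock.is_active(now):
        return lock.ceiling      # hold steady, no probing
    return lock.ceiling + 1      # lock expired: probe one step up (assumed step)


lock = CeilingLock(ceiling=52, locked_at=0.0)
print(budget(lock, now=3600.0))      # 52: one hour in, still locked
print(budget(lock, now=LOCK_TTL_S))  # 53: the 24 hours are up
```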

What you’ll see

In the dashboard’s pipeline queue (sidebar), each cloud endpoint shows its current concurrency state next to the in-flight count:

cloud:default_ollama 5/52 📈: five requests in flight, current ceiling is 52, and the system is still climbing (probing).
cloud:default_ollama 50/52 🔒: 50 requests in flight, ceiling locked at 52; the ceiling won’t grow until the lock expires.
cloud:default_ollama 8/12 🌧️: a recent backoff was detected; the system is recovering before trying to grow again.
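
If you copy these status lines into a script or a log, they are easy to split into fields. The parser below is a sketch that assumes the layout is exactly "node id, in-flight/ceiling, icon" as in the three examples above; the state names it returns are labels made up for this example, not values SourcePrep exposes.

```python
import re

# Assumed mapping from dashboard icons to illustrative state names.
STATES = {
    "\U0001F4C8": "probing",     # chart-increasing icon
    "\U0001F512": "locked",      # lock icon
    "\U0001F327": "recovering",  # rain-cloud icon
}

LINE = re.compile(
    r"^(?P<node>\S+)\s+(?P<in_flight>\d+)/(?P<ceiling>\d+)\s+(?P<icon>\S+)"
)


def parse_status(line: str) -> dict:
    """Split one status line into node id, counts, and state."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    # Some emoji carry an invisible variation selector; drop it before lookup.
    icon = m["icon"].replace("\ufe0f", "")
    return {
        "node": m["node"],
        "in_flight": int(m["in_flight"]),
        "ceiling": int(m["ceiling"]),
        "state": STATES.get(icon, "unknown"),
    }


print(parse_status("cloud:default_ollama 5/52 \U0001F4C8"))
```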

For the full timeline (every probe, every backoff, every recovery), open Settings → Diagnostics → Concurrency Health. Each cloud node has a live event feed showing why the current limit is what it is.

When the ceiling is wrong

The lock exists because re-probing every minute would itself waste capacity. But three things can make the locked ceiling stale:

You upgraded your plan. Ollama Cloud Free → Max, OpenRouter free tier → paid, and similar plan jumps usually come with higher concurrency limits. Until the lock is reset or expires, SourcePrep keeps using the old, lower ceiling.
Your provider changed limits. Rare, but providers occasionally raise (or lower) per-account quotas. The previous discovered ceiling no longer reflects reality.
Your network path changed. A new VPN, a different egress region, or a switch between residential and corporate uplink can shift effective throughput enough that the old ceiling becomes the wrong number.

How to reset it

You have three options, in order of convenience:

1. The developer button (recommended)

Open AI Gateway in the dashboard. Scroll to Pipeline Activity. Click the small Developer wrench in the section header. Hit Reset next to the cloud endpoint you want to re-discover. Confirm.

You’ll see the confirmation “Cleared. New probe on next acquire.” The next outbound request to that endpoint kicks off fresh discovery.

2. The HTTP API

For scripts, headless setups, or CI:

curl -X POST 'http://localhost:8400/compute/concurrency/clear?node_id=cloud:default_ollama'

Replace cloud:default_ollama with the node id of your endpoint. To list live nodes, send GET /compute/scheduler.
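
If you would rather call the endpoint from Python than from curl, a standard-library sketch might look like this. The host, port, path, and node_id parameter come from the curl command above; the function names and the 10-second timeout are assumptions, and since the response body format is not documented here, the sketch only returns the HTTP status code.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8400"  # same host and port as the curl example


def clear_url(node_id: str, base_url: str = BASE_URL) -> str:
    """Build the reset URL for one node, keeping the colon in the node id."""
    query = urllib.parse.urlencode({"node_id": node_id}, safe=":")
    return f"{base_url}/compute/concurrency/clear?{query}"


def clear_ceiling(node_id: str, base_url: str = BASE_URL) -> int:
    """POST the reset and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(clear_url(node_id, base_url), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Prints the URL only; call clear_ceiling(...) to actually send the reset.
    print(clear_url("cloud:default_ollama"))
```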

3. Wait it out

The lock expires automatically after 24 hours. After that, SourcePrep probes one step up on the next acquire and starts re-discovering. If you’re patient, doing nothing works.

What gets reset

A reset only invalidates the persisted ceiling for that one endpoint. Specifically:

The 24-hour lock for that cloud:* node is dropped.
The in-memory slot is reset back to probing state.
On the next request to that endpoint, discovery restarts from the system’s jumpstart seed (typically 5 parallel requests, or whatever the provider’s probe value suggests).

Nothing else is touched:

Other cloud endpoints keep their existing locks.
Local node concurrency (VRAM-bounded) is unaffected.
In-flight requests are not interrupted.
Your saved endpoint config, model selection, and pipeline state are unchanged.
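
The scope of a reset can be pictured with a small model. Everything here is hypothetical (the NodeSlot type, the two dictionaries, and setting the seed immediately rather than on the next request); it only illustrates that one node's lock and slot change while everything else stays put.

```python
from dataclasses import dataclass

JUMPSTART_SEED = 5  # "typically 5 parallel requests"


@dataclass
class NodeSlot:
    """Hypothetical in-memory state for one node."""

    limit: int
    state: str = "locked"  # illustrative values: "probing" or "locked"
    in_flight: int = 0


def reset_node(slots: dict, locks: dict, node_id: str) -> None:
    """Drop one node's persisted lock and put its slot back into probing."""
    locks.pop(node_id, None)         # only this node's lock is removed
    slot = slots.get(node_id)
    if slot is not None:
        slot.state = "probing"
        slot.limit = JUMPSTART_SEED  # in_flight is deliberately left alone


slots = {
    "cloud:default_ollama": NodeSlot(limit=12, in_flight=8),
    "cloud:other": NodeSlot(limit=30, in_flight=4),
}
locks = {"cloud:default_ollama": 12, "cloud:other": 30}

reset_node(slots, locks, "cloud:default_ollama")
print(locks)                                    # {'cloud:other': 30}
print(slots["cloud:default_ollama"].in_flight)  # 8
```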

What to expect after a reset

Throughput may drop for a few seconds to a minute while the system climbs back from the conservative jumpstart seed. This is expected: discovery has to start somewhere safe. If you upgraded your plan, the new ceiling should settle higher than before within a few minutes of normal traffic.

If you don’t see a higher ceiling settle in, your provider may still be capping you at the old limit (some plan upgrades take time to propagate). The Concurrency Health view in Settings → Diagnostics shows every backoff event, which is the quickest way to see what’s actually happening.

🤖 For AI agents

If you’re working with an AI assistant connected to SourcePrep via MCP, ask it to run prep_search for “concurrency ceiling”. The same explanation surfaces as a built-in concept the agent can read directly.