a litmus test for every prompt your agents run

Score every prompt before your agents run it.

Your agents run whatever you hand them, even a half-baked one-liner. Litmusify reads each prompt on-device in under a millisecond and pulls the shaky ones aside before they spiral into a thousand-call mess.

Leave your email and we'll start solving your burn.
sub-ms scoring · zero data retention · free for solo devs

it slips in front of the coding agents your team already runs

Cursor · Claude Code · Roo Code · Devin · Windsurf
// where the money goes

One lazy prompt. A $15 loop.

Nobody selling you tokens has a reason to help you use fewer. One vague instruction fans out: your agent ransacks the repo, guesses wrong, and tries again. And again.

prompt ▸ "fix the authentication bug" → unconstrained
[live counters: agent calls · tokens burned · cost · 20 min later]
with Litmusify: detoured before call #2 · $0.40, zero loop

Illustrative scenario.

// how it works

Score. Coach. Release.

No big model in the way. A small classifier on your machine reads each prompt in under a millisecond, weighing your instruction and never your code. Your best engineers never feel it.

01 score

Score, don't generate

Local classifiers read structure and specificity to score the prompt. No generative call, no waiting.

< 1 ms · on localhost
02 detour

Soft-detour the bad ones

Below the bar, the prompt is paused and coached right in your IDE, not thrown as an error.

native chat UI
03 release

Release the clean prompt

Tightened, it streams straight to the model. Litmusify trains the human as the work ships.

no perceptible added latency
// architecture

A local proxy on localhost.

A tiny daemon on localhost catches each prompt right before it leaves for the network. Clean ones sail through; shaky ones loop back for a quick fix. Your code never goes anywhere.

IDE / agent (cursor · roo) → litmusify · localhost:8080 (score · <1ms) → model provider (claude · gpt)
⚠ ambiguous → detour back to IDE
✓ clean → released to model
raw prompt stays on localhost · only anonymized telemetry leaves the perimeter
// results

The number a CFO understands.

Your VP won't act on an average score. So we tie it to your merged PRs: the real compute cost of every shipped pull request, before and after.

COMPUTE PER MERGED PR ▼ 73%
$45 before · $12 after Litmusify · same velocity
<1 ms · Added latency. No LLM in the path
0 bytes · Of your source code that ever leaves

Illustrative figures.

// developer incentive

Good prompts earn God Mode.

Nobody likes being audited, so we flipped it. Everyone starts on the cheap models. Write clean prompts and your litmus score climbs, unlocking higher limits and the best frontier models by default.

litmus   tier                      rate limit
0–40     haiku                     base
40–70    sonnet                    2×
70–90    opus                      4×
90+      all frontier · god mode   unlimited
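
The tier bands above can be read as a simple lookup. A minimal sketch, assuming each band includes its lower bound and excludes its upper bound (the page lists 40, 70, and 90 in two adjacent bands, so the boundary handling is a guess). `Tier` and `tier_for` are hypothetical names for illustration, not Litmusify's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    model: str
    rate_multiplier: Optional[float]  # None means unlimited


# (lower bound, tier), highest band first. Boundary handling is an assumption.
_TIERS = [
    (90, Tier("all frontier", None)),
    (70, Tier("opus", 4.0)),
    (40, Tier("sonnet", 2.0)),
    (0, Tier("haiku", 1.0)),
]


def tier_for(score: float) -> Tier:
    """Map a 0-100 litmus score to its tier."""
    if not 0 <= score <= 100:
        raise ValueError("litmus score must be between 0 and 100")
    for lower_bound, tier in _TIERS:
        if score >= lower_bound:
            return tier
    raise AssertionError("unreachable: the 0 band catches every valid score")


print(tier_for(94).model)  # all frontier
print(tier_for(39).model)  # haiku
```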
// your team

The leaderboard that rewards clarity.

It's not a race to burn the most tokens. We rank your team by how clearly they ask, tied straight to cost per PR. Clarity rises, waste sinks.

#   dev       litmus   tier       $/pr
1   @maya     94       god mode   $9
2   @ravi     88       opus       $11
3   @sam      73       sonnet     $16
4   @alex     54       haiku      $27
5   @jordan   39       haiku      $41

Sample team data.

// the moat

It gets sharper with every prompt.

A gateway can count your tokens. It can't see what you meant. We learn from the one signal nobody else has: which prompts ended in a clean merge, and which spiralled. Every prompt sharpens the next call.

edge cases in the ledger · ▲ 38% / quarter
01 synthetic bootstrap: tens of thousands of simulated lazy-vs-clean prompts.
02 open-source honeypot: a free local tool; solo devs opt in to anonymized telemetry.
03 enterprise ledger: intent to outcome across real codebases. The specialized model.

Infra gateways clone an API proxy in weeks. They can't clone the data.

Illustrative figures.

Your machine / VPC: classifiers · embeddings · raw prompt (stays here)
↓  anonymized telemetry only  ↓
Litmusify dashboard: scores · token counts · detour rate
// privacy by architecture

Your code never leaves the machine.

The classifiers, embeddings, and scoring all run on your machine or in your VPC. No one else reads a word, so there's nothing for your security team to sign off on.

  • Raw prompts are never logged centrally.
  • Only anonymized telemetry leaves the device.
  • Preserves your zero-retention vendor deal.
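
To make that boundary concrete, here is a sketch of a telemetry record limited to the three fields this page names: litmus score, token counts, and detour rate. The `TelemetryRecord` type, its field names, and the JSON shape are illustrative assumptions, not Litmusify's actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TelemetryRecord:
    """Aggregate numbers only. There is no field that could hold prompt text or code."""

    litmus_score: float  # 0-100
    token_count: int
    detour_rate: float  # fraction of prompts detoured, 0.0-1.0


def to_payload(record: TelemetryRecord) -> str:
    """Serialize the record to the JSON string that would leave the machine."""
    return json.dumps(asdict(record))


payload = to_payload(TelemetryRecord(litmus_score=74.0, token_count=1830, detour_rate=0.12))
print(payload)
```

The values passed in the last two lines are made up for the example.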
// faq

Questions, answered.

What engineers and their VPs ask before they let us near the agents.

Does my code or my raw prompt ever leave my machine?

No. Classifiers, embeddings, and scoring all run as a local proxy on your machine or in your VPC. Only anonymized telemetry ever syncs: the litmus score, token counts, and detour rate. Your raw prompts and source are never transmitted, so there's nothing for a security team to review.

How is a prompt scored?

Lightweight local models (think XGBoost and a small PyTorch classifier, not a frontier LLM) read structural signals: target specificity, action verbs, acceptance criteria, and vector distance from known-good prompts. Those features produce a probabilistic 0–100 score in well under a millisecond.
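
A toy illustration of the structural signals named above. It uses regular expressions and hand-picked weights as a stand-in for the trained models, and it leaves out the vector-distance feature entirely. The word list, patterns, weights, and function names are all assumptions made for this sketch, not Litmusify's implementation.

```python
import re

ACTION_VERBS = {"add", "fix", "remove", "rename", "refactor", "update", "replace", "write"}
# A source-file path or a call-like symbol such as login() counts as a specific target.
TARGET_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|ts|js|go|rs|java)\b|\b\w+\(\)")
CRITERIA_PATTERN = re.compile(
    r"\b(?:should|must|so that|expect|returns?|until|passes)\b", re.IGNORECASE
)

# Hand-picked weights standing in for a trained model; they sum to 100.
WEIGHTS = {"has_action_verb": 20.0, "has_target": 40.0, "has_acceptance_criteria": 40.0}


def extract_features(prompt: str) -> dict[str, float]:
    """Turn a prompt into three binary structural features."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return {
        "has_action_verb": float(any(word in ACTION_VERBS for word in words)),
        "has_target": float(bool(TARGET_PATTERN.search(prompt))),
        "has_acceptance_criteria": float(bool(CRITERIA_PATTERN.search(prompt))),
    }


def litmus_score(prompt: str) -> float:
    """Weighted sum of the features, on a 0-100 scale."""
    features = extract_features(prompt)
    return sum(WEIGHTS[name] * value for name, value in features.items())


print(litmus_score("fix the authentication bug"))  # 20.0
print(litmus_score(
    "Fix the null check in auth/session.ts so that login() returns 401 on an expired token"
))  # 100.0
```

The vague prompt earns credit only for its verb; the second one names a file, a function, and an expected result.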

Will it slow my agents down?

No. There's no LLM in the hot path. Local classifiers score the prompt in under a millisecond, so a well-formed prompt passes straight through with no perceptible latency.

Will it flag senior engineers who write in shorthand?

Rarely. The score is an exponentially-weighted moving average with session inertia, so early exploratory turns barely move it and one-off shorthand on a senior's profile sails through. Only sustained ambiguity trips a detour, and even then it coaches rather than blocks.
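
One way to picture the moving average described above: each new prompt score is blended into the running value, so a single low score moves it only part of the way. The smoothing factor `alpha=0.2`, the starting value of 85, and the `SessionScore` name are assumptions for illustration. The page does not give the real parameters.

```python
class SessionScore:
    """Exponentially weighted moving average of per-prompt litmus scores."""

    def __init__(self, initial: float, alpha: float = 0.2) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.value = initial
        self.alpha = alpha  # smaller alpha means more inertia

    def update(self, prompt_score: float) -> float:
        # new = alpha * latest + (1 - alpha) * previous
        self.value = self.alpha * prompt_score + (1 - self.alpha) * self.value
        return self.value


session = SessionScore(initial=85.0)
session.update(20.0)  # one terse prompt on a high-scoring profile
print(round(session.value, 1))  # 72.0
```

With these assumed numbers, one shorthand prompt scoring 20 leaves the session at 72. It takes a run of low scores to pull the average down much further.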

Does it ever block a prompt outright?

Never a hard block. Litmusify uses a soft detour: it pauses an ambiguous prompt and returns a coaching note rendered natively in your IDE. Tighten the prompt and it's released instantly. No error codes, no broken flow.
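
The soft detour described above amounts to a two-way routing decision. A minimal sketch, assuming a detour bar of 40 (the page does not state the threshold) and a hypothetical `Decision` type; the coaching text is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

DETOUR_BAR = 40.0  # assumed threshold; the page does not state one


@dataclass(frozen=True)
class Decision:
    release: bool
    coaching_note: Optional[str] = None


def route(session_score: float) -> Decision:
    """Release clean prompts; pause shaky ones with a coaching note instead of an error."""
    if session_score >= DETOUR_BAR:
        return Decision(release=True)
    return Decision(
        release=False,
        coaching_note="Name the file or function, the change you want, and how to tell it worked.",
    )


print(route(72.0).release)  # True
print(route(20.0).coaching_note)
```

Returning a note rather than raising keeps the caller on the normal response path, which is what lets an IDE render the coaching as an ordinary chat message.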

Which agents and models does it work with?

Anything that speaks MCP or an OpenAI-compatible API: Cursor, Claude Code, Roo Code, Devin, Windsurf, and more, in front of any frontier model. It runs as a local proxy, so it sits in front of the agents without changing your setup.

How long does setup take?

About five minutes. Drop in the local daemon, point your IDE's base URL (or MCP server) at localhost, and it starts scoring. A file-watcher keeps the config pinned so nobody drifts off the proxy by accident.

How do you show the savings?

The dashboard cross-references the anonymized telemetry with your version control to calculate compute cost per merged PR, the one metric leadership can act on. Not a subjective "litmus score 74", just dollars per shipped PR, before and after.
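
The arithmetic behind that metric is simple. A sketch, assuming cost per merged PR means total compute spend in a period divided by the number of PRs merged in that period. The spend totals and PR counts below are made-up inputs chosen to reproduce this page's illustrative $45 and $12 figures.

```python
def cost_per_merged_pr(total_compute_usd: float, merged_prs: int) -> float:
    """Total compute spend in a period divided by the PRs merged in that period."""
    if merged_prs <= 0:
        raise ValueError("need at least one merged PR")
    return total_compute_usd / merged_prs


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from before to after, as a percentage."""
    return (before - after) / before * 100


before = cost_per_merged_pr(total_compute_usd=4500.0, merged_prs=100)  # 45.0
after = cost_per_merged_pr(total_compute_usd=1200.0, merged_prs=100)  # 12.0
print(round(percent_reduction(before, after)))  # 73
```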

Can it run inside our VPC?

Yes. The whole stack runs inside your VPC with no raw data leaving the perimeter, which is exactly what InfoSec wants to hear. Team and enterprise plans add the shared dashboard, SSO, and PR-level reporting. Free for solo developers.

// the team · women-founded

Built by people who've felt the burn.

Two women building the guardrails for the agentic era. We sat beside the engineers, watched the spend spiral from a single lazy prompt, and built the litmus test we wished we'd had.

Aditi
CEO & Co-founder

Product leader with 8+ years building 0→1 products across AI, edtech, and consumer. As a founding PM she shipped AI features to large user bases and rebuilt core data systems around large language models. That's where she watched unmanaged prompt quality turn into runaway compute spend that nobody owned. She started Litmusify to put a litmus test in front of every prompt, and leads its product and strategy.

Cassandra Mackin
CTO & Co-founder

CS master's candidate in AI at Georgia Tech, with a background spanning bioinformatics, linguistics, and UX design. Her work ranges from reasoning systems and ARC-style puzzles to state estimation with Kalman filters and path planning with A* search. Earlier she led UX across logistics, onboarding, and accessibility-focused products. At Litmusify she owns the engine: the local classifiers, embeddings, and loopback proxy that keep scoring fast and private.


Stop paying for prompts that were never going to work.

Tell us where it's burning and we'll help you put it out. Leave your email below, or just reach a founder directly at aditi@litmusify.com.

Free for solo developers. We read every email.