Annotation Scoring Calculator

Enter your task complexity and annotator accuracy to see how binary pass/fail scoring compares to per-decision scoring — and what it means for your program.

Example — 8 decisions per job, 95.0% per-step accuracy:

- Binary pass rate: 66.3%
- Per-decision score: 95.0%
- Gap: −28.7pp

Your scoring system is the problem, not your annotators.

At 95% per-step accuracy, binary scoring predicts a 66.3% pass rate (0.95^8 ≈ 0.663) — well below any reasonable threshold. Annotators doing excellent work are being flagged as failing. This gap is structural: it comes from the scoring math, not from annotator performance.

To reach 90% under binary scoring with 8 decisions per job, annotators would need 98.7% per-step accuracy (0.90^(1/8) ≈ 0.987).
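The model behind these numbers is simple: under binary scoring a job passes only if every one of its n decisions is correct, so the pass rate is accuracy^n, and the accuracy needed to hit a target pass rate is the nth root of that target. A minimal sketch (function names here are illustrative, not part of any calculator API):

```python
def binary_pass_rate(per_step_accuracy: float, decisions_per_job: int) -> float:
    """A job passes only if all decisions are correct: accuracy ** n."""
    return per_step_accuracy ** decisions_per_job

def required_accuracy(target_pass_rate: float, decisions_per_job: int) -> float:
    """Per-step accuracy needed to hit a target binary pass rate: target ** (1/n)."""
    return target_pass_rate ** (1 / decisions_per_job)

print(f"{binary_pass_rate(0.95, 8):.1%}")   # 66.3% — the pass rate shown above
print(f"{required_accuracy(0.90, 8):.1%}")  # 98.7% — accuracy needed for a 90% pass rate
```

Because the pass rate decays exponentially in the number of decisions, the gap between per-decision quality and binary pass rate widens quickly as jobs get more complex.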

Want the full breakdown? Get the binary scoring explainer — including the math, the operational fallout, and how to fix it.

No spam. Unsubscribe anytime.

If your annotation program is showing scores like this, the fix usually starts with a diagnostic.

Talk to Justin →