Annotation is built on a simple premise: give models feedback, and they improve. Repeat. It's the engine behind every generative AI program worth building.
So when a team came to me with annotation quality stuck at 60% - well below the 90% threshold stakeholders needed - the first question wasn't about the annotators. It was about the system around them.
What we found was straightforward, and a little ironic: the annotators had no feedback loop of their own. They were producing work, and that work was being evaluated, but the results never made it back to the individuals doing the labeling. No individualized scores. No guidance on where they were drifting. No mechanism to improve.
They were being asked to do precision work in the dark.
The fix wasn't complicated. Working with the team, I proposed and implemented an individualized QA feedback process - one that closed the loop for each annotator, giving them visibility into their own performance and clear direction on where to focus.
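To make the mechanics concrete, here's a minimal sketch of what "closing the loop" can look like in code - per-annotator accuracy against QA-audited labels, broken down by category so each person sees where to focus. The record shape, category names, and 90% threshold are illustrative assumptions, not the team's actual tooling.

```python
from collections import defaultdict

def annotator_report(records, threshold=0.90):
    """Build per-annotator quality reports from QA-audited records.

    Each record is a dict with 'annotator', 'category', and 'correct'
    (whether the label matched the QA reviewer's judgment).
    These field names are assumptions for illustration.
    """
    by_annotator = defaultdict(list)
    for r in records:
        by_annotator[r["annotator"]].append(r)

    reports = {}
    for name, recs in by_annotator.items():
        overall = sum(r["correct"] for r in recs) / len(recs)

        # Break accuracy down by category so feedback points at
        # specific label types, not just one aggregate number.
        by_category = defaultdict(list)
        for r in recs:
            by_category[r["category"]].append(r["correct"])

        focus_areas = [
            cat for cat, hits in by_category.items()
            if sum(hits) / len(hits) < threshold
        ]
        reports[name] = {
            "overall_accuracy": round(overall, 2),
            "meets_threshold": overall >= threshold,
            "focus_areas": focus_areas,
        }
    return reports

# Hypothetical audited records for two annotators.
records = [
    {"annotator": "a1", "category": "toxicity", "correct": True},
    {"annotator": "a1", "category": "toxicity", "correct": False},
    {"annotator": "a1", "category": "relevance", "correct": True},
    {"annotator": "a2", "category": "relevance", "correct": True},
    {"annotator": "a2", "category": "toxicity", "correct": True},
]
print(annotator_report(records))
```

The report is the easy part; what mattered was delivering it to each annotator individually, on a regular cadence, with clear direction attached.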
Within days, quality crossed 90%.
The lesson isn't that annotators were the problem. It's that even well-resourced programs can build sophisticated model feedback infrastructure while leaving the human layer completely unmanaged. Human-in-the-loop only works if the humans in the loop are actually in the loop.
That's the kind of structural problem I look for. If your annotation quality isn't where it needs to be, it's usually a process problem - not a people problem.