Welcome back, Amara 👋
You're 35% through your AAP certification path. Your next lesson covers the 7 Core ADBOK® Governing Principles — a foundational pillar of the entire framework.
Reliability is defined in the ADBOK® Guide as the core requirement that similar data, under comparable circumstances and identical guideline versions, must yield the same label, decision, or classification when processed by any qualified personnel.
This consistency requirement applies across four critical dimensions: different annotators on the same project, different time periods within the same project, different geographic locations, and different review cycles. Without reliability, AI training data becomes contradictory — and a model trained on contradictory data will produce unstable, unpredictable behavior.
The ADBOK® Guide is explicit about the consequences of poor consistency:
- Training Signal Stability: Inconsistent labels teach models contradictory patterns that lead to unstable behavior in real-world applications.
- Evaluation Validity: Model evaluation results are meaningless if labeling standards shift during testing — you can't measure what's changing.
- Fairness Assurance: Inconsistent application of standards may create unfair treatment of different demographic groups or contexts.
- Scientific Reproducibility: Research and development depend on stable measurement that enables replication and external validation.
ADBOK® identifies four organizational practices that create and sustain reliability:
This is not an aspirational target — it is the minimum professional standard. If your team's Kappa falls below 0.80, production should pause until the source of inconsistency is identified and corrected through guideline clarification, calibration sessions, or additional training.
- Participate actively in calibration sessions — these are professional development, not tests
- Apply guidelines consistently even when it seems repetitive or counterintuitive
- Escalate when you are uncertain rather than guessing — uncertainty is valuable signal
- Review your personal consistency scores and respond constructively to feedback
- Never modify your interpretation of a guideline without approval — guideline drift is a quality risk