TL;DR
A four-agent candidate evaluation pipeline (resume, LinkedIn profile, GitHub, voice screen) is a production ML system whose decisions affect hiring outcomes. Bias drift in any of the four agents is not just a quality issue — it is a legal one. EEOC scrutiny on AI hiring tools is active; New York City, Illinois, and the EU AI Act have specific requirements. The monitoring layer is part of the product.
What the monitor watches
Per protected attribute, per evaluation stage, per pipeline agent: the selection rate, i.e. the proportion of candidates passing each stage. The four-fifths rule says that if a protected group's selection rate is less than 80% of the rate for the group with the highest selection rate, that is presumptive disparate impact (a minimal sketch of the check follows the examples below).
- Per agent: did the resume agent select female candidates at 78% of the rate for male candidates? Alert.
- End-to-end: did the pipeline overall select candidates aged 40 and over at 82% of the rate for candidates under 40? Within tolerance.
- Per stage: at the GitHub-analysis stage, did the selection rate drop disproportionately for one group? Investigate that stage.
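Here is a minimal sketch of that check, assuming candidate records that carry a self-identified group label and the set of stages passed; the field names and record shape are illustrative, not the production schema.

```python
from collections import defaultdict

def selection_rates(records, attribute, stage):
    """Selection rate per group: candidates passing `stage` / candidates evaluated.

    Each record is assumed to look like
    {"gender": "female", "stages_passed": {"resume", "github"}} -- illustrative only.
    """
    evaluated = defaultdict(int)
    passed = defaultdict(int)
    for record in records:
        group = record.get(attribute)
        if group is None:  # no self-ID: excluded from the audit population
            continue
        evaluated[group] += 1
        if stage in record["stages_passed"]:
            passed[group] += 1
    return {group: passed[group] / n for group, n in evaluated.items()}

def four_fifths_violations(rates, threshold=0.8):
    """Return groups whose selection rate falls below `threshold` of the highest rate."""
    if not rates:
        return {}
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items() if rate < threshold * best}
```

For the resume-agent example above, rates of 0.39 for female and 0.50 for male candidates give a ratio of 0.78, below the 0.8 line, so the check flags it.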
Where demographic data comes from
Voluntary self-identification at application time. Stored in a separate table with access scoped to the audit pipeline only — never visible to the evaluation models. Candidates who decline self-ID are excluded from the audit population, not penalized; their evaluation runs identically.
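A minimal sketch of how the audit population is assembled, assuming a self-ID store keyed by candidate ID that only the audit job can read; the structure and field names are assumptions, not the production schema.

```python
def build_audit_population(evaluations, self_id_records):
    """Join evaluation outcomes to voluntary self-ID records at audit time.

    `evaluations`: {candidate_id: {"stages_passed": {...}}} from the pipeline.
    `self_id_records`: {candidate_id: {"gender": ..., "age_band": ...}}, stored in a
    separate table readable only by the audit pipeline's role; the evaluation
    agents never receive these fields.
    """
    population = []
    for candidate_id, outcome in evaluations.items():
        demographics = self_id_records.get(candidate_id)
        if demographics is None:
            # Declined self-ID: excluded from monitoring, evaluation unchanged.
            continue
        population.append({"candidate_id": candidate_id, **outcome, **demographics})
    return population
```

The join happens only inside the audit job; nothing in the evaluation path ever sees the demographic fields.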
What happens when an alert fires
- The affected agent goes into hold-and-review. New evaluations queue.
- Recent decisions on candidates from the affected group get human re-review (recent = past 30 days).
- The agent's recent change-set (prompt updates, model-version bumps, training-data refreshes) is reviewed against the alert window. Correlation triggers rollback.
- A bias-audit report goes to the customer's HR + legal contacts within 24 hours: what the alert was, what action was taken, and what the post-action selection rates look like. (A sketch of this workflow follows the list.)
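A sketch of the alert-handling orchestration, assuming hypothetical interfaces for the agent registry, decision log, change log, and notifier; none of these names come from the production system.

```python
from datetime import timedelta

REVIEW_WINDOW = timedelta(days=30)

def handle_bias_alert(alert, agent_registry, decision_log, change_log, notifier):
    """Walk through the four response steps for a fired bias alert."""
    # 1. Hold the affected agent; new evaluations queue instead of running.
    agent_registry.set_status(alert.agent, "hold_and_review")

    # 2. Flag recent decisions on candidates from the affected group for human re-review.
    cutoff = alert.fired_at - REVIEW_WINDOW
    for decision in decision_log.query(agent=alert.agent, group=alert.group, since=cutoff):
        decision_log.flag_for_review(decision.id, reason=alert.id)

    # 3. Review the agent's recent change-set against the alert window;
    #    a correlated change triggers rollback.
    suspect_changes = [c for c in change_log.since(cutoff, agent=alert.agent)
                       if c.applied_at <= alert.fired_at]
    if suspect_changes:
        agent_registry.rollback(alert.agent,
                                to_before=min(c.applied_at for c in suspect_changes))

    # 4. Send the bias-audit report to the customer's HR and legal contacts.
    notifier.send_bias_audit_report(
        alert,
        actions={"agent_held": True, "rolled_back": bool(suspect_changes)},
        due_by=alert.fired_at + timedelta(hours=24),
    )
```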
Why this is necessary, not optional
Two reasons. Legal: AI hiring tools without active bias monitoring are a regulatory target; documented monitoring with documented thresholds + external audits is the protective posture. Quality: the alert is also a model-quality signal. A bias drift in the GitHub agent often traces back to a feature regression: the agent started weighting commit frequency more heavily, and commit frequency correlates with employment status in a way that disadvantaged one group. Fix the feature, fix the bias, fix the quality.
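To make that concrete, here is a hedged sketch of a feature-weight drift check that could accompany a bias alert; the feature names and the idea of per-feature weights are assumptions about how the agent is introspected, not a description of the actual model.

```python
def flag_feature_drift(old_weights, new_weights, rel_tolerance=0.15):
    """Flag features whose weight shifted by more than `rel_tolerance` between
    two agent versions. Weight dicts like {"commit_frequency": 0.12, ...} are
    illustrative; how they are extracted (coefficients, attribution scores)
    is deployment-specific.
    """
    drifted = {}
    for feature in set(old_weights) | set(new_weights):
        old = old_weights.get(feature, 0.0)
        new = new_weights.get(feature, 0.0)
        baseline = max(abs(old), abs(new), 1e-12)
        if abs(new - old) / baseline > rel_tolerance:
            drifted[feature] = {"old": old, "new": new}
    return drifted
```

If a bias alert and a flagged shift in something like commit frequency land in the same window, the feature regression is the first suspect.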
What we measured
- Alert rate steady-state: 0.5-1 alerts per quarter across customer cohort
- False-positive rate: ~30%. The alert fires, the investigation finds no actual drift, and the fix is adjusting the threshold or the minimum sample size (see the sketch after this list)
- Time from drift to alert: median 7 days; threshold tunable by customer based on volume
- 92% candidate satisfaction with the pipeline's perceived fairness (post-process survey)
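One way to keep that false-positive rate down is to gate the four-fifths check on sample size and statistical significance before alerting. A minimal sketch using a two-proportion z-test; the threshold, alpha, and minimum-sample values are illustrative, not the production defaults.

```python
import math

def two_proportion_p_value(pass_a, n_a, pass_b, n_b):
    """Two-sided p-value for the difference in selection rates between two groups."""
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (pass_a / n_a - pass_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def should_alert(pass_a, n_a, pass_b, n_b, ratio_threshold=0.8, alpha=0.05, min_n=30):
    """Fire only when the four-fifths ratio is breached, both groups have enough
    volume, and the gap is unlikely to be sampling noise."""
    if min(n_a, n_b) < min_n:
        return False
    rate_a, rate_b = pass_a / n_a, pass_b / n_b
    low, high = min(rate_a, rate_b), max(rate_a, rate_b)
    if high == 0 or low / high >= ratio_threshold:
        return False
    return two_proportion_p_value(pass_a, n_a, pass_b, n_b) < alpha
```

Tightening alpha or raising min_n trades alert latency (the median 7 days above) against fewer false positives, which is why the threshold is tunable per customer based on volume.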
What this does not handle
Disparate impact on attributes you do not monitor. If candidates do not self-identify a relevant attribute and you do not have demographic data, you cannot monitor. Best practice is to encourage self-ID, expand the attribute set as data allows, and document your monitoring scope clearly so blind spots are known and not hidden.
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years experience
Mudassir Marwat is the Founder & CEO of Cognilium AI. He has shipped 100+ production AI systems acro...
