
Simpson's Paradox in Hiring Data: The Hidden Trap

December 18, 2025 · 9 min read

Tyler Horan, Ph.D.

Principal Auditor & Founder

Imagine you audit an AI hiring tool and find no evidence of adverse impact. Impact ratios are above 0.80 for all demographic groups. The tool passes the four-fifths rule. Case closed, right?

Not necessarily. There's a statistical phenomenon that can hide discrimination in plain sight: Simpson's Paradox.

What Is Simpson's Paradox?

Simpson's Paradox occurs when a trend present in several groups of data reverses when the groups are combined. In hiring data, this means aggregate results can show no disparity even when significant disparity exists within individual departments or job categories.

A Real-World Example

Let's say a company uses an AI tool across two departments: Engineering and Sales. Here's the hiring data:

Engineering Department

| Group | Applicants | Hired | Selection Rate |
| --- | --- | --- | --- |
| White | 400 | 80 | 20% |
| Black | 100 | 10 | 10% |

Engineering impact ratio for Black applicants: 10% / 20% = 0.50 (FLAG)

Sales Department

| Group | Applicants | Hired | Selection Rate |
| --- | --- | --- | --- |
| White | 100 | 60 | 60% |
| Black | 400 | 200 | 50% |

Sales impact ratio for Black applicants: 50% / 60% = 0.83 (MONITOR)

Combined (Aggregate)

| Group | Applicants | Hired | Selection Rate |
| --- | --- | --- | --- |
| White | 500 | 140 | 28% |
| Black | 500 | 210 | 42% |

Aggregate impact ratio for Black applicants: 42% / 28% = 1.50 (PASS—Black applicants appear to be favored)
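The three tables above can be reproduced in a few lines of Python. This is a minimal sketch; the dictionary layout and helper name are illustrative, not part of any audit tool:

```python
# Counts from the tables above: {department: {group: (applicants, hired)}}.
data = {
    "Engineering": {"White": (400, 80), "Black": (100, 10)},
    "Sales":       {"White": (100, 60), "Black": (400, 200)},
}

def impact_ratio(counts):
    """Black selection rate divided by White selection rate."""
    w_apps, w_hired = counts["White"]
    b_apps, b_hired = counts["Black"]
    return (b_hired / b_apps) / (w_hired / w_apps)

# Per-department ratios: 0.50 for Engineering, 0.83 for Sales.
by_dept = {d: round(impact_ratio(c), 2) for d, c in data.items()}

# Aggregate: pool counts across departments, then compute the ratio (1.5).
pooled = {
    g: (sum(data[d][g][0] for d in data), sum(data[d][g][1] for d in data))
    for g in ("White", "Black")
}
aggregate = round(impact_ratio(pooled), 2)
```

Note the order of operations in the aggregate step: counts are summed first, and the ratio is computed from the pooled totals. That pooling is exactly where the reversal sneaks in.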

The Paradox Revealed

In the aggregate data, Black applicants actually have a higher selection rate than White applicants. The tool appears to favor Black candidates!

But within each department, Black applicants have a lower selection rate. In Engineering, the impact ratio is 0.50—severe adverse impact.

What happened? Simpson's Paradox. Two confounding factors combined:

  1. Black applicants disproportionately applied to Sales (400 of 500) while White applicants disproportionately applied to Engineering (400 of 500)
  2. Sales has a much higher overall selection rate than Engineering

The higher baseline selection rate in Sales "masks" the within-department disparity when data is combined.
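The masking is just weighted-average arithmetic: each group's aggregate selection rate is its department rates weighted by where that group applied. Using the numbers from the tables above:

```python
# Aggregate rate = application-share-weighted average of department rates.
# White applicants: 400 of 500 in Engineering (20%), 100 of 500 in Sales (60%).
white_agg = (400 / 500) * 0.20 + (100 / 500) * 0.60  # dominated by Engineering
# Black applicants: 100 of 500 in Engineering (10%), 400 of 500 in Sales (50%).
black_agg = (100 / 500) * 0.10 + (400 / 500) * 0.50  # dominated by Sales
```

White applicants' aggregate rate (28%) is dragged down by Engineering's low baseline; Black applicants' aggregate rate (42%) is pulled up by Sales' high baseline, even though Black applicants trail within both departments.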

Why This Matters for Bias Audits

An audit that only looks at aggregate data would conclude this tool has no adverse impact. But within Engineering, Black applicants are being selected at half the rate of White applicants. That's a serious problem.

This is why NYC LL144's requirement for intersectional analysis is important—but even that may not catch Simpson's Paradox if the stratification is by demographics only, not by department.

How Paritas Handles It

Every Paritas audit includes Simpson's Paradox detection:

  • We stratify results by job category and department (when data is available)
  • We compare aggregate impact ratios to stratified impact ratios
  • We flag cases where results reverse upon stratification
  • We recommend deeper investigation when paradoxical patterns appear
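A reversal check of this kind can be sketched in a few lines. This is an illustrative sketch only; the function names and data shapes are hypothetical, not Paritas's actual implementation:

```python
def impact_ratio(protected, reference):
    """Each argument is an (applicants, hired) tuple."""
    return (protected[1] / protected[0]) / (reference[1] / reference[0])

def flag_reversals(strata, threshold=0.80):
    """Compare the aggregate impact ratio to each stratum's ratio and
    flag strata failing the four-fifths rule while the aggregate passes."""
    pooled_prot = tuple(map(sum, zip(*(s["protected"] for s in strata.values()))))
    pooled_ref = tuple(map(sum, zip(*(s["reference"] for s in strata.values()))))
    aggregate = impact_ratio(pooled_prot, pooled_ref)
    flags = [
        (name, round(impact_ratio(s["protected"], s["reference"]), 2))
        for name, s in strata.items()
        if aggregate >= threshold
        and impact_ratio(s["protected"], s["reference"]) < threshold
    ]
    return round(aggregate, 2), flags

# The example from this article: the aggregate passes at 1.5,
# but Engineering is flagged at 0.50.
strata = {
    "Engineering": {"protected": (100, 10), "reference": (400, 80)},
    "Sales":       {"protected": (400, 200), "reference": (100, 60)},
}
```

Running `flag_reversals(strata)` on the article's numbers returns an aggregate ratio of 1.5 alongside a flag for Engineering at 0.50, which is precisely the pattern an aggregate-only audit would miss.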

The aggregate numbers might look fine. The department-level numbers tell the real story.

Need an audit?

Get started with independent, PhD-led AEDT bias auditing.
