“The field has amplified the potential for harm by codifying the 4/5 rule into popular AI fairness software toolkits,” researchers Jiahao Chen, Michael McKenna, and Elizabeth Anne Watkins wrote in an academic paper published earlier this year. “The harmful erasure of legal nuances is a wake-up call for computer scientists to self-critically reassess the abstractions they create and use, especially in the interdisciplinary field of AI ethics.”
The rule has been used by federal agencies, including the Departments of Justice and Labor, the Equal Employment Opportunity Commission, and others, as a way to compare the hiring rates of protected groups with those of white people and to determine whether hiring practices have had discriminatory effects.
The goal of the rule is to encourage companies to hire protected groups at a rate that is at least 80% of that of white males. For example, if the hiring rate for white males is 60% but only 45% for Black applicants, the ratio of the two hiring rates would be 45:60, or 75%, which would not meet the rule's 80% threshold. Federal guidelines on using the rule for employment purposes have been updated over the years to incorporate other factors.
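The hiring-rate example above can be worked through in a few lines of Python. This is only an illustrative sketch: the function name `impact_ratio` and the constant `FOUR_FIFTHS_THRESHOLD` are hypothetical and do not come from any agency guidance or fairness toolkit, and the rates are the ones used in the example.

```python
# Illustrative sketch of the four-fifths arithmetic described above.
# All names here are hypothetical, not taken from any toolkit or regulation.
FOUR_FIFTHS_THRESHOLD = 0.8


def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's hiring rate to the reference group's hiring rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


# Rates from the example: 45% for the protected group, 60% for the reference group.
ratio = impact_ratio(0.45, 0.60)
meets_threshold = ratio >= FOUR_FIFTHS_THRESHOLD
print(f"ratio = {ratio:.0%}, meets 80% threshold: {meets_threshold}")
# prints: ratio = 75%, meets 80% threshold: False
```

A single ratio like this says nothing about why the gap exists, which is the nuance the researchers argue is lost when the rule is reduced to code.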
The use of the rule in fairness tools arose when computer engineers sought a way to abstract the technique used by social scientists as a fundamental approach to measuring disparate impact into numbers and code, said Watkins, a social scientist and postdoctoral research associate at Princeton University's Center for Information Technology Policy and the Human-Computer Interaction Group.
“In computing, there is a way to abstract everything. It all comes down to numbers,” Watkins told Protocol. But important nuances got lost in translation when the rule was digitized and codified into easy-to-use bias-removal tools.
In real-world scenarios, the rule is typically applied as the first step in a longer process to understand why a disparate impact has occurred and how to address it. However, engineers often use fairness tools at the end of a development process, as the last checkbox before delivering a product or machine learning model.
“It’s actually become the reverse, where it’s at the end of a process,” said Watkins, who studies how computer scientists and engineers do their AI work. “It’s completely reversed from what it was supposed to do… The human element of decision-making is lost.”
The simplistic application of the rule also misses other important factors considered in traditional assessments. For example, researchers typically want to inspect which subsections of applicant groups should be measured using the rule.
Other researchers have also inspected AI ethics toolkits to examine their connection to actual ethical work.
The rule used on its own is a crude instrument and not sophisticated enough to meet today’s standards, said Danny Shayman, AI and machine learning product manager at InRule, a company that sells automated intelligence software to customers in employment, insurance and financial services.
“To have a 19% disparate impact and say it’s legally safe when you can confidently measure a 1% or 2% disparate impact is deeply unethical,” Shayman said, adding that AI-based systems can confidently measure impact in a much more nuanced way.
Model drift of another kind
But the rule is making its way into the tools AI developers are using in hopes of removing disparate impacts on vulnerable groups and detecting biases.
“The 80% threshold is the widely used standard for detecting disparate impact,” Salesforce notes in its description of its bias detection methodology, which incorporates the rule to flag data for potential bias issues. “Einstein Discovery triggers this data alert when, for a sensitive variable, the selection data for a group is less than 80% of the group with the highest selection rate.”
H2O.ai also references the rule in documentation on how disparate impact analysis and mitigation work in its software.
Neither Salesforce nor H2O.ai responded to requests for comment on this story.
The researchers also argued that translating a rule used in federal employment law into AI fairness tools could carry it into terrain outside the normal context of hiring decisions, such as banking and housing. They said it amounts to epistemic trespassing, or the practice of making judgments in arenas outside of an area of expertise.
“In reality, no evidence exists for its adoption in other areas,” they wrote of the rule. “In contrast, many toolkits [encourage] this epistemic trespassing, creating a self-fulfilling prophecy of relevance spillover, not only into other US regulatory contexts, but even into non-US jurisdictions!”
Watkins’ research collaborators work for Parity, an algorithmic auditing firm that could benefit from deterring the use of off-the-shelf fairness tools. Chen, Parity’s chief technology officer, and McKenna, the company’s chief data science officer, are currently embroiled in a legal dispute with Parity’s CEO.
While applying the rule in AI fairness tools can create unforeseen problems, Watkins said she doesn’t want to demonize computer engineers for using it.
“The reason this metric is implemented is because developers want to do better,” she said. “They are not incentivized in [software] development cycles to do this slow, deep work. They need to collaborate with people who are trained to abstract and trained to understand the spaces being abstracted.”