One popular topic in psychology when discussing test performance is the idea of stereotype threat. First proposed by Claude Steele and Joshua Aronson in 1995, stereotype threat is a phenomenon in which a person who belongs to a stereotyped demographic group performs in accordance with the stereotype after being reminded of it. It is usually suggested as a partial or total explanation for the average score differences among demographic groups on academic tests (e.g., Nisbett, 2009).

From its first discussion in 1995, the concept has been widely embraced and researched in psychology. The Google Ngram chart below shows how the concept grew as a topic of discussion in books from 1995 to 2013, at which point it leveled off.

Google Ngram showing the increasing popularity of stereotype threat as a topic of discussion in books published in English, as demonstrated in the blue line. “Stereotype threat” since 2013 has been mentioned in books about as often as “general intelligence.”

However, cracks are starting to appear in the stereotype threat literature. A large-scale, pre-registered study of stereotype threat on math tests in females failed to show evidence for the phenomenon (Flore et al., 2018), and there is strong evidence of publication bias in research on the topic. Schimmack (2019) estimates that publication bias and other questionable research practices have inflated the percentage of studies supporting gender stereotype threat from 14% to 84%. Two recent meta-analyses on gender stereotype threat show strong evidence of publication bias (Flore & Wicherts, 2015; Ganley et al., 2013). These results make it clear that the strength of the evidence supporting the stereotype threat phenomenon is inflated.

Most problematic is the finding that stereotype threat is strongest in highly artificial, laboratory studies. As the research is conducted in more realistic settings, the phenomenon weakens and disappears completely under the most relevant, real-world scenarios (Shewach et al., 2019).

When I was preparing my upcoming book, In the Know: Debunking 35 Myths About Human Intelligence, I revisited Steele and Aronson’s (1995) article. I found many of the hallmarks of research that psychologists now know is unlikely to replicate: a high degree of flexibility in methods, shifting procedures from study to study within an article, small sample sizes, etc. [Update: The book has been released.]

What was most damning, though, were the statistical power calculations I did. The average sample size of African Americans in Steele and Aronson’s (1995) four studies was 37.75. This means that the statistical power was too low to detect any but the strongest effects of stereotype threat. Assuming an effect of d = .50 (Cohen’s [1988] “medium” effect size and a reasonable default at the time for new research topics), the power to detect stereotype threat in Steele and Aronson’s (1995) four studies ranged from .300 to .459. The joint probability of detecting stereotype threat in all four studies was just .014. Conversely, the probability of not detecting stereotype threat in any of the four studies was .173 (Warne, in press, p. 279).

In layman’s terms, this means that each of the four studies was individually more likely to fail to detect stereotype threat than to detect it. Moreover, all four studies failing to demonstrate stereotype threat was over ten times more likely (17.3%) than all four studies identifying the phenomenon (1.4%). Based on these probabilities, the most likely result of a collection of four small-sample studies on stereotype threat was a mix, with some studies supporting the existence of the phenomenon and others not. Steele and Aronson (1995) either got extremely lucky . . . or they engaged in questionable research practices that inflated the strength of the evidence for stereotype threat.
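For readers who want to check the arithmetic, here is a minimal Python sketch of how probabilities like these combine. It is not the calculation from the book: the function names are my own, the power formula is a normal approximation to a two-sided, two-group comparison (an assumption about the test used), and because the per-study power values are not listed in this post, the sketch only brackets the joint probabilities using the reported endpoints of .300 and .459. It also treats the four studies as independent.

```python
from math import prod, sqrt
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation (an assumption made for this sketch),
    so values differ slightly from an exact noncentral-t calculation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def all_detect(powers):
    """Probability that every independent study finds the effect."""
    return prod(powers)


def none_detect(powers):
    """Probability that no independent study finds the effect."""
    return prod(1 - p for p in powers)


# Bracket the joint probabilities with the reported power range.
low, high = [0.300] * 4, [0.459] * 4
print(all_detect(low), all_detect(high))    # the reported .014 falls between
print(none_detect(high), none_detect(low))  # the reported .173 falls between

# Probability of a mixed result, taken from the two reported values.
mixed = 1 - 0.014 - 0.173
print(round(mixed, 3))  # 0.813
```

The last line makes the point in the paragraph above concrete: under these assumptions, a mixed set of results had a probability of roughly 81%, far higher than either a clean sweep or a clean failure. The `approx_power` helper is included so that readers can plug in group sizes of their own; any sample sizes passed to it are hypothetical unless taken directly from the original article.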

Such an improbable result for four consecutive studies with low power is not firm evidence for a psychological finding (Schimmack, 2012). Moreover, the methodological characteristics of studies on stereotype threat that undermine my confidence in the original Steele and Aronson (1995) study are still typical for research on the topic. This is why I do not believe there is strong evidence that stereotype threat effects are real. The evidence in their favor is flimsy and contaminated by shoddy research practices.

Wasted Treasure and Time

If this were an academic spat without any relevance outside of the ivory tower, then I would not care much. But Steele and Aronson’s (1995) original article spawned a flood of research on stereotype threat.

I conducted a basic search of funded grants related to stereotype threat in the databases of the National Science Foundation (NSF) and the U.S. Department of Education’s Institute of Education Sciences (IES). I confined my results to grants that used the exact phrase “stereotype threat” as a keyword and had the phrase in either the grant title or abstract. This ensured that all search results were grants for projects that had stereotype threat as a central topic.

The search produced a total of 89 grants (80 at NSF, 9 at IES) to study stereotype threat at a cost of $67,556,712 ($49,431,331 at NSF and $18,125,381 at IES). All of that money was spent chasing after a phenomenon that never had strong evidence supporting it in the first place.

A more effective use of money than spending it on stereotype threat research. (Source)

Moreover, this is a conservative estimate based only on grants at two federal agencies in a single country. It also doesn’t consider the indirectly funded research through taxes and tuition that supported researchers working at universities, or the costs of awarding jobs and tenure to people publishing articles in this area. The true cost of the research on stereotype threat is surely much greater. Imagine all the research on real phenomena that could have happened if NSF and IES weren’t wasting money on stereotype threat research!

There is also the cost of time that needs to be considered. All of these grants were for at least a year; some of them lasted 5 years. What discoveries about human nature could have been made if scientists weren’t wasting their time on a scientific dead-end?

This is why the general public needs to demand better research from scientists. While much shoddy research falls by the wayside, some of it catches fire and leads to decades of scientists spending money and time chasing after ethereal findings that are probably not real. This represents a grave violation of the public’s trust, and the scientific community should ask itself some difficult questions about why it rewarded flimsy research on this topic with so much money for so long.

References

Flore, P. C., Mulder, J., & Wicherts, J. M. (2018). The influence of gender stereotype threat on mathematics test scores of Dutch high school students: a registered report. Comprehensive Results in Social Psychology, 3(2), 140-174. https://doi.org/10.1080/23743603.2018.1559647

Flore, P. C., & Wicherts, J. M. (2015). Does stereotype threat influence performance of girls in stereotyped domains? A meta-analysis. Journal of School Psychology, 53(1), 25-44. https://doi.org/10.1016/j.jsp.2014.10.002

Ganley, C. M., Mingle, L. A., Ryan, A. M., Ryan, K., Vasilyeva, M., & Perry, M. (2013). An examination of stereotype threat effects on girls’ mathematics performance. Developmental Psychology, 49(10), 1886-1897. https://doi.org/10.1037/a0031412

Nisbett, R. E. (2009). Intelligence and how to get it: Why schools and cultures count. W. W. Norton & Company.

Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17(4), 551-566. https://doi.org/10.1037/a0029487

Schimmack, U. (2019, January 2). Social psychology textbook audit: Stereotype threat [Blog post]. https://replicationindex.com/2019/01/02/social-psychology-textbook-audit-stereotype-threat/

Shewach, O. R., Sackett, P. R., & Quint, S. (2019). Stereotype threat effects in settings with features likely versus unlikely in operational test settings: A meta-analysis. Journal of Applied Psychology, 104(12), 1514-1534. https://doi.org/10.1037/apl0000420

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811. https://doi.org/10.1037/0022-3514.69.5.797

Warne, R. T. (in press). In the know: Debunking 35 myths about human intelligence. Cambridge University Press.
