For over one thousand years, the Ptolemaic model of the universe dominated Western and Islamic astronomy. All that most people remember about the theory (and often the only characteristic they ever learned) is that it placed the earth at the center of the universe, with the planets, sun, and moon all orbiting the earth. This is understandable because the theory is often called the “geocentric model,” which indicates that the earth is at its center. Additionally, most visual depictions of the model show the earth at the center of a concentric series of circles, each representing the orbit of a planet, the moon, or the sun.
In reality, the Ptolemaic model was much more complex, because it did not take long for ancient astronomers to observe discrepancies between the actual motion of the planets in the sky and the motion predicted for planets that move in a simple circle around the earth. In particular, the planets sometimes appear to move backwards across the background of the stars (a phenomenon called apparent retrograde motion), something that should never happen if a planet orbited the earth in a regular circle at a constant speed.
To handle these discrepancies, ancient Greek astronomers theorized that the planets were on the perimeter of a circle that rotated around a point which, in turn, orbited the earth. This circle that rotated while orbiting the earth was called an epicycle and is illustrated below.
From the perspective of an observer on the earth (in the center of the diagram), the planet (symbolized by the solid red dot) does not directly orbit the earth. Instead, it lies on the edge of the smaller circle, which is the epicycle. As the epicycle rotates while orbiting the earth, the planet appears to follow the dark curved line, arriving at position 2 before later making a backwards loop to position 3 and then advancing to position 4 in a regular fashion.
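The backwards loop can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a reconstruction of any historical calculation: the radii and rotation rates are made-up illustrative values, and the function names are hypothetical. It places the earth at the origin, moves the center of the epicycle around a large circle (conventionally called the deferent), and checks whether the planet's direction in the sky is advancing or retreating.

```python
import math

def angular_velocity(t, R=1.0, r=0.4, w_deferent=1.0, w_epicycle=5.0):
    """Rate of change of the planet's direction as seen from the earth (origin).

    Positive means forward motion against the stars; negative means retrograde.
    R and r are the radii of the large circle and the epicycle (assumed values).
    """
    # Position = center of the epicycle on the large circle + offset on the epicycle
    x = R * math.cos(w_deferent * t) + r * math.cos(w_epicycle * t)
    y = R * math.sin(w_deferent * t) + r * math.sin(w_epicycle * t)
    # Velocity = time derivative of the position
    vx = -R * w_deferent * math.sin(w_deferent * t) - r * w_epicycle * math.sin(w_epicycle * t)
    vy = R * w_deferent * math.cos(w_deferent * t) + r * w_epicycle * math.cos(w_epicycle * t)
    # Angular velocity about the origin: (x*vy - y*vx) / distance squared
    return (x * vy - y * vx) / (x * x + y * y)

def retrograde_fraction(samples=1000, **orbit):
    """Share of sampled times in [0, 2*pi) at which the planet moves backwards."""
    times = [2 * math.pi * i / samples for i in range(samples)]
    backwards = sum(1 for t in times if angular_velocity(t, **orbit) < 0)
    return backwards / samples

print(retrograde_fraction())       # some backwards motion with an epicycle
print(retrograde_fraction(r=0.0))  # none when the planet sits on a plain circle
```

With these assumed values the planet moves forward most of the time and backwards near its closest approach to the earth, which is where the loop in the diagram appears. Setting the epicycle radius to zero removes the retrograde motion entirely.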
Epicycles greatly complicated the geocentric model, but including them in calculations produced far more accurate predictions than a simple geocentric model without them could produce. Modern mathematicians now understand that adding an epicycle would always improve the predictions of the planets’ motions, and that given a sufficient number of epicycles, errors in prediction could be minimized. Indeed, with an infinite number of epicycles, the predictions would be perfectly accurate. Although medieval and Renaissance astronomers did not really add epicycle upon epicycle, doing so would have improved their models and (apparently) strengthened the evidence in favor of the geocentric theory of the universe.
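The modern version of this claim is a statement about Fourier series: any closed path can be written as a sum of uniformly rotating circles, and keeping more of them never makes the fit worse. The sketch below illustrates that idea under stated assumptions. The "observed" path is a hypothetical off-center ellipse traced at uneven speed, the function names are invented for this example, and the discrete Fourier transform stands in for the hand methods that historical astronomers actually used.

```python
import cmath
import math

def fourier_coefficients(points):
    """Discrete Fourier coefficients of a closed path given as complex numbers."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * math.pi * k * t / n) for t, p in enumerate(points)) / n
        for k in range(n)
    ]

def rms_error(points, coeffs, n_circles):
    """RMS gap between the path and its rebuild from the n_circles largest terms.

    Each kept term is one circle turning at a constant rate (the k = 0 term is
    a fixed offset rather than a rotation).
    """
    n = len(points)
    largest = sorted(range(n), key=lambda k: abs(coeffs[k]), reverse=True)[:n_circles]
    squared = 0.0
    for t, p in enumerate(points):
        approx = sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in largest)
        squared += abs(p - approx) ** 2
    return math.sqrt(squared / n)

def example_path(n=64):
    """Hypothetical 'observed' path: an off-center ellipse traced at uneven speed."""
    path = []
    for t in range(n):
        theta = 2 * math.pi * t / n
        phase = theta + 0.3 * math.sin(theta)
        path.append(complex(1.5 * math.cos(phase) - 0.5, math.sin(phase)))
    return path

path = example_path()
coeffs = fourier_coefficients(path)
errors = {k: rms_error(path, coeffs, k) for k in (1, 2, 4, 8, len(path))}
for k, err in errors.items():
    print(f"{k:3d} circles: RMS error {err:.6f}")
```

The error shrinks as circles are added and vanishes (up to rounding) once there are as many circles as sampled points. Note what this does not show: a perfect fit says nothing about whether circles are the right physical description of the path.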
Epicycles in the Social Sciences
Social scientists have their own methods for making models increasingly complex in order to fit their observations, and these modifications often function the same way that epicycles functioned for the geocentric model. They increase the complexity of a theory and appear to strengthen it by making it approximate observations better. These “epicycles” are moderators and mediators.
In a relationship between X and Y, a moderator is a third variable Z that can change the strength (and sometimes direction) of the relationship between X and Y. When a moderator is present, there is still a direct relationship between X and Y, but the moderator can alter that relationship; including that moderator will make a statistical model more accurate.
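A small simulation makes the definition concrete. This is a sketch with invented numbers, not data from any study: Z is a hypothetical two-group variable, and the slope of Y on X is set to 0.5 in one group and 1.5 in the other. Fitting the X-Y relationship separately within each group recovers the two different slopes, which is what it means for Z to moderate the relationship.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_moderation(n=2000, base_slope=0.5, moderation=1.0, seed=0):
    """Simulate Y = (base_slope + moderation * Z) * X + noise for Z in {0, 1}.

    Returns the estimated X-Y slope within each level of the moderator Z.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    slopes = {}
    for z in (0, 1):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [(base_slope + moderation * z) * x + rng.gauss(0, 1) for x in xs]
        slopes[z] = ols_slope(xs, ys)
    return slopes

slopes = simulate_moderation()
print(slopes)  # the X-Y slope is clearly steeper when Z = 1
```

A model that ignores Z would report a single slope somewhere between the two, which is why including a genuine moderator improves the fit.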
When the third variable Z is a mediator, the causal relationship between X and Y is not direct. Rather, X has a causal influence on Z, which has, in turn, a causal impact on Y. Therefore, the relationship between X and Y is indirect, and building a theory for how X impacts Y requires understanding the mediating influence of Z. Again, this complicates the model but also makes it more accurate.
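The same kind of simulation works for mediation. In this sketch the causal chain is built in by construction (X affects Z, and only Z affects Y), and all coefficients are invented for illustration. It estimates the path from X to Z (a), the path from Z to Y holding X constant (b), the direct effect of X on Y holding Z constant, and the total effect of X on Y. In linear least squares the total effect equals the direct effect plus the product a × b.

```python
import random

def mediation_estimates(n=5000, a=0.8, b=0.5, direct=0.0, seed=1):
    """Simulate X -> Z -> Y and return least squares estimates of each path.

    a, b, and direct are the true coefficients used to generate the data
    (illustrative assumptions); the returned values are estimates from the sample.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    zs = [a * x + rng.gauss(0, 1) for x in xs]
    ys = [direct * x + b * z + rng.gauss(0, 1) for x, z in zip(xs, zs)]

    def cov(u, v):
        mean_u, mean_v = sum(u) / n, sum(v) / n
        return sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v)) / n

    sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
    sxy, szy = cov(xs, ys), cov(zs, ys)
    det = sxx * szz - sxz ** 2  # determinant of the two-predictor normal equations
    return {
        "a": sxz / sxx,                            # slope of Z on X
        "b": (sxx * szy - sxz * sxy) / det,        # slope of Y on Z, holding X constant
        "direct": (szz * sxy - sxz * szy) / det,   # slope of Y on X, holding Z constant
        "total": sxy / sxx,                        # slope of Y on X alone
    }

est = mediation_estimates()
print(est)
print("indirect (a*b):", est["a"] * est["b"])
```

Because the simulated data contain no direct path, the estimated direct effect is close to zero and nearly all of the total effect runs through Z. The decomposition is only arithmetic on the sample, though: the simulation knows the causal order because it was built that way, and the same numbers computed on observational data would not by themselves establish it.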
The Epicycles of Stereotype Threat Theory
Just as epicycles can be added to the geocentric model to make it fit scientists’ observations better (at the cost of greater complexity), moderators and mediators can be added to theories of behavior to better fit psychologists’ observations and provide the illusion of scientific progress. Indeed, I believe that this is what is happening to stereotype threat theory.
Stereotype threat is the theorized phenomenon that occurs when a person who belongs to a stereotyped group is reminded of the stereotype that applies to their group, which then causes the person to conform to that stereotype. This phenomenon was first reported in a highly cited article by Steele and Aronson (1995), in which they claimed that African American college students who were reminded of negative stereotypes about their racial group had lower performance on cognitive tests than African American students who did not receive similar reminders.
Stereotype threat theory quickly found adherents in psychology. By 2015, there had been 300 studies conducted to show the effect (Pennington et al., 2016), and its advocates touted stereotype threat as a “robust” phenomenon (e.g., Spencer et al., 2016, p. 418; Smittick, 2019, p. 7). Many authors claim that it is an explanation for underperformance of African Americans on cognitive tests (e.g., Kaufman, 2013; Nisbett, 2009) or females taking mathematics tests (e.g., Picho-Kiroga et al., 2021; Steele, 1997).
However, psychologists soon recognized that a simple, direct relationship of threat → underperformance was inadequate. Most problematically, the stereotype threat phenomenon appeared in only 30-55% of replication studies (Stoet & Geary, 2012). As was typical at the time, Stoet and Geary used a very broad definition of “replication study.” In 2021, when I searched for close replication studies of stereotype threat studies, I only found four. The results? Three of the replications failed to demonstrate the phenomenon, and the fourth only did so partially. That is a very low batting average, though typical of social psychology (Open Science Collaboration, 2015).
Thus, there was a paradox: according to theory, stereotype threat is a real phenomenon and is robust enough to explain (at least partially) lower academic performance for some demographic groups. Yet, in studies designed to detect the phenomenon, it often did not appear. To handle this paradox, psychologists almost immediately started proposing mediator and moderator variables that could explain why reminders of a stereotype did not always result in changes in performance (e.g., Steele, 1997). These mediator and moderator variables functioned like epicycles in the geocentric theory: they added considerable complexity to the theory, and they fit observations from studies of the phenomenon that the theory did not predict before the mediators and moderators were suggested.
For over 25 years, stereotype threat theorists have continued to propose more mediators and moderators. In one literature review, Pennington et al. (2016) found 18 mediator variables that psychologists had proposed for stereotype threat theory. In a different literature review, Spencer et al. (2016, p. 423) described 11 moderators for stereotype threat, not including moderators that others have proposed, such as awareness of the stereotype, task difficulty (Steele, 1997), identity strength, age, strength of the stereotype threat (Appel et al., 2015), and face validity of a test (Hollis-Sawyer & Sawyer, 2008). This brings the total to at least 33 variables (18 mediators and at least 15 moderators), though that number is likely an underestimate; I have not attempted an exhaustive search of proposed mediators and moderators in stereotype threat studies.
Often these mediators and moderators can explain the results of an isolated study in which the data diverge from the predictions of stereotype threat theory. But when a theory requires 18 mediators and at least 15 moderators, then there is the serious possibility that these variables are functioning as epicycles. Moderators and mediators increase the complexity of stereotype threat theory, and any increase in explanatory power that they give may be illusionary support for the theory.
Like adherents to the geocentric theory, stereotype threat proponents are remarkably incapable of recognizing that their own data disprove their theory. For example, when Sunny et al. (2017) conducted a study that consistently failed to find evidence for stereotype threat in chemistry students, the researchers suggested a variety of ad hoc explanations (pp. 170-171), including the possibilities that (1) stereotype threat would appear later in the students’ education, (2) it would appear earlier in the students’ education, (3) stereotype threat does not occur for women in chemistry, (4) there was not sufficient domain identification in the sample, and (5) motivation may not be a mediator variable in their specific sample. Similarly, in their meta-analysis, Picho-Kiroga et al. (2021) found that studies including moderators that Steele (1997) stated were “essential” for stereotype threat were actually less likely to show the phenomenon (Warne, 2022), and yet they still advocated for the theory. In neither article did the authors ever consider that stereotype threat theory might be wrong. Instead, they attempted to salvage the theory, including via proposed mediators and/or moderators.
These researchers were just adding more epicycles to their theory — not developing an accurate understanding of human behavior. When a researcher’s goal is to salvage a theory, instead of generating new and falsifiable predictions, then they are adding epicycles.
Conclusion
I hope that this blog post does not lead readers to think that mediators and moderators are always useless variables and a sign of shoddy science. That is not the case; some mediators and moderators are real and important. For example, sex is a common moderator variable in the biological and social sciences; some interventions are more effective in men than women (or vice-versa). However, the interaction effects that provide evidence for a moderator variable have a low level of replication (Pallesen, 2018), likely because most interactions have much lower statistical power than main effects (Gelman, 2018).
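The power problem for interactions follows from simple arithmetic, which is the argument summarized in the title of the Gelman (2018) post. The sketch below assumes a balanced two-by-two design with the same outcome standard deviation in every cell; the function names are hypothetical. A main effect compares two halves of the sample, while an interaction is a difference of differences across four quarters, so its standard error is twice as large. If the interaction is also half the size of the main effect, matching the main effect's precision takes 16 times the sample.

```python
import math

def se_main_effect(sigma, n):
    """Standard error of a difference between two groups of n/2 each."""
    return math.sqrt(sigma ** 2 / (n / 2) + sigma ** 2 / (n / 2))

def se_interaction(sigma, n):
    """Standard error of a difference of differences across four cells of n/4 each."""
    return math.sqrt(4 * sigma ** 2 / (n / 4))

def sample_size_multiplier(effect_ratio):
    """Factor by which n must grow for the interaction to match the main effect's
    ratio of estimate to standard error.

    effect_ratio is the assumed size of the interaction relative to the main effect.
    """
    se_ratio = se_interaction(1.0, 1.0) / se_main_effect(1.0, 1.0)  # 2, whatever sigma and n are
    return (se_ratio / effect_ratio) ** 2

print(sample_size_multiplier(0.5))  # interaction half as large as the main effect
print(sample_size_multiplier(1.0))  # interaction as large as the main effect
```

Even an interaction as large as the main effect needs four times the sample under these assumptions, so a literature of modestly sized studies will tend to produce moderator findings that fail to replicate.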
While mediator variables are real, the analysis procedures to test for mediators have their problems, too. When the mediator variable is not manipulated (as is often the case in stereotype threat studies), then mediation analysis can give misleading results and false positives (Bullock et al., 2010). It takes a carefully designed study and newly developed statistical procedures to produce high confidence in the relevance of a mediator variable (Bullock & Green, 2021).
It is extraordinarily unlikely that all 33 (or more) mediators and moderators proposed for stereotype threat theory are real, even if some help the theory fit the data from an experiment. Like Ptolemaic astronomy and its epicycles, a theory with a large number of mediators and moderators is unnecessarily complex and provides illusionary understanding of reality.
Finally, adherents to stereotype threat are not the only theorists who add epicycles to salvage their untenable theory. Mindset theory has the same characteristics, as do many theories from sociology (e.g., structural racism as an explanation for differences in group outcomes). Mediators and moderators are real, but when a theory has more proposed mediators and moderators than evidence to support them, then the safest heuristic is that all of them are just epicycles added to a flawed theory.
References
Appel, M., Weber, S., & Kronberger, N. (2015). The influence of stereotype threat on immigrants: Review and meta-analysis. Frontiers in Psychology, 6, Article 900. https://doi.org/10.3389/fpsyg.2015.00900
Bullock, J. G., & Green, D. P. (2021). The failings of conventional mediation analysis and a design-based alternative. Advances in Methods and Practices in Psychological Science, 4, Article 4. https://doi.org/10.1177/25152459211047227
Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98(4), 550-558. https://doi.org/10.1037/a0018933
Gelman, A. (2018, March 15). You need 16 times the sample size to estimate an interaction than to estimate a main effect. Statistical modeling, causal inference, and social science. https://statmodeling.stat.columbia.edu/2018/03/15/need-16-times-sample-size-estimate-interaction-estimate-main-effect/
Hollis-Sawyer, L. A., & Sawyer, T. P., Jr. (2008). Potential stereotype threat and face validity effects on cognitive-based test performance in the classroom. Educational Psychology, 28(3), 291-304. https://doi.org/10.1080/01443410701532313
Kaufman, S. B. (2013). Ungifted: Intelligence redefined. Basic Books.
Nisbett, R. E. (2009). Intelligence and how to get it: Why schools and cultures count. W. W. Norton & Company.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716. https://doi.org/10.1126/science.aac4716
Pallesen, J. (2018, December 27). Interaction effects do not replicate. https://rpubs.com/Jonatan/interactions
Pennington, C. R., Heim, D., Levy, A. R., & Larkin, D. T. (2016). Twenty years of stereotype threat research: A review of psychological mediators. PLoS ONE, 11(1), Article e0146487. https://doi.org/10.1371/journal.pone.0146487
Picho-Kiroga, K., Turnbull, A., & Rodriguez-Leahy, A. (2021). Stereotype threat and its problems: Theory misspecification in research, consequences, and remedies. Journal of Advanced Academics, 32(2), 231-264. https://doi.org/10.1177/1932202X20986161
Smittick, A. L. (2019). The in-between: A meta-analytic investigation of stereotype threat effects on mediators of the stereotype threat-performance relationship. Unpublished doctoral dissertation (Texas A&M University, College Station, TX).
Spencer, S. J., Logel, C., & Davies, P. G. (2016). Stereotype threat. Annual Review of Psychology, 67, 415-437. https://doi.org/10.1146/annurev-psych-073115-103235
Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52(6), 613-629. https://doi.org/10.1037/0003-066X.52.6.613
Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811. https://doi.org/10.1037/0022-3514.69.5.797
Stoet, G., & Geary, D. C. (2012). Can stereotype threat explain the gender gap in mathematics performance and achievement? Review of General Psychology, 16(1), 93-102. https://doi.org/10.1037/a0026617
Sunny, C. E., Taasoobshirazi, G., Clark, L., & Marchand, G. (2017). Stereotype threat and gender differences in chemistry. Instructional Science, 45(2), 157-175. https://doi.org/10.1007/s11251-016-9395-8
Warne, R. T. (2022). No strong evidence of stereotype threat in females: A reassessment of the Picho-Kiroga et al. (2021) meta-analysis. Journal of Advanced Academics, 33(2), 171-186. https://doi.org/10.1177/1932202X211061517