Creativity is an important topic for scientists to study because it can contribute to problem solving, advance culture, and bolster invention and economic growth. Beyond practical applications, research on creativity can provide insight into human psychology because humans are — by far — the most creative species on earth. Understanding creativity may also shed light on how and why humans differ from other mammals.

The scientific study of creativity predates psychology itself. In his book Hereditary Genius, Sir Francis Galton (1869) contrasted the characteristics of eminent men in creative fields (e.g., poetry, music, and painting) with individuals who were eminent in other areas (such as law and science). Decades later, Guilford (1950) used his presidential address to the American Psychological Association to urge psychologists to take the study of creativity seriously.

Today, the study of creativity flourishes, and the field has its own scholarly journals and professional community. Researchers have also developed a variety of psychometric instruments to measure the construct of creativity quantitatively.

The most popular tool for measuring creativity is the Torrance Tests of Creative Thinking (TTCT), which was published in 1972. The TTCT has a verbal version (TTCT-V) and a figural, or non-verbal, version (TTCT-F). The TTCT-V requires examinees to write responses to a series of prompts, while the TTCT-F requires examinees to draw pictures in response to visual stimuli.

The TTCT was a breakthrough in creativity research. Responding to the test stimuli requires no special training, and the scores are numbers that can be analyzed quantitatively, like any other test score. Many populations can take the TTCT. For example, the test requires no special version or adaptation for children, and the TTCT-F can be given to people who speak other languages. Some examinees even find taking the TTCT to be fun. That kind of response is rare for a psychometric test!

Both the TTCT-F and TTCT-V produce an overall score and at least three subscores. For the TTCT-V these are fluency, originality, and flexibility. Fluency refers to the number of valid responses an examinee produces. Originality refers to how unusual or unique the responses are. Flexibility refers to the number of different categories represented in an examinee's responses. The TTCT-F has six subscores: fluency, originality, elaboration, abstract titles, resistance to closure, and a creative strengths checklist. Elaboration refers to the amount of detail and/or complexity in responses. The other three TTCT-F subscores refer to characteristics of the drawings themselves rather than to general aspects of creativity. Providing different subscores for different aspects of creativity is a strength of the TTCT.
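
To make these definitions concrete, here is a minimal sketch of how subscores of this kind could be computed from a list of responses. This is my own illustration, not the publisher's scoring rubric: the example prompt, responses, category labels, and rarity flags are all hypothetical, and elaboration (amount of detail) is omitted because it would need its own rubric.

```python
# Hypothetical illustration of fluency, flexibility, and originality scoring.
# This is NOT the official TTCT rubric; the responses, categories, and rarity
# flags are invented for the example (imagined prompt: "unusual uses for a brick").

responses = [
    {"text": "paperweight",   "category": "weight",     "rare": False},
    {"text": "doorstop",      "category": "weight",     "rare": False},
    {"text": "garden border", "category": "decoration", "rare": False},
    {"text": "heat battery",  "category": "tool",       "rare": True},
]

fluency = len(responses)                               # number of valid responses
flexibility = len({r["category"] for r in responses})  # number of distinct categories
originality = sum(r["rare"] for r in responses)        # number of unusual responses

print(fluency, flexibility, originality)  # 4 3 1
```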

[Image: The TTCT-V (in green) and TTCT-F (in red).]

But there are some drawbacks to the TTCT. One of the biggest is that it is a nightmare to score. Most psychometric tests can be scored easily because there are a limited number of possible responses people can give. On intelligence tests, for example, there is often only one correct answer per item. But a good creativity test should prompt a wide variety of responses for each item, which makes scoring complicated because it is often not possible to foresee every response examinees might give.

Scoring the TTCT is so difficult that, when I was in graduate school, one of my professors told my class to avoid scoring the test whenever possible. One recent group of authors called the scoring process “quite time-consuming” and “daunting” (Acar et al., 2023, pp. 5, 13). Indeed, scoring is burdensome enough that the test publisher offers the option of having the test scored by trained professional scorers; this extra cost is greater than the cost of the test itself.

Aside from the cumbersome nature of scoring the TTCT, there are statistical problems with its scoring system. Most important is the confounding influence of fluency. The other scores are confounded with fluency because counts of unusual responses (originality) and of detailed responses (elaboration) can only accumulate across the valid responses an examinee produces (fluency). This artificially inflates correlations among TTCT subscores.
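
To see the mechanism, consider a toy simulation (my own illustration with made-up distributions, not TTCT data): even when the per-response chance of producing an original idea is generated independently of how many responses a person gives, the raw originality count still tracks fluency.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical examinees: fluency (number of valid responses) and the
# per-response probability of an "original" response are drawn independently,
# so the two underlying traits are unrelated by construction.
fluency = rng.integers(3, 30, size=n)
p_original = rng.uniform(0.10, 0.30, size=n)

# The raw originality score is a count accumulated across valid responses,
# so it is mechanically tied to fluency.
originality_raw = rng.binomial(fluency, p_original)

print(np.corrcoef(fluency, originality_raw)[0, 1])  # clearly positive: the count inherits fluency
print(np.corrcoef(fluency, p_original)[0, 1])       # near zero: the underlying rate does not
```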

This confounding effect of fluency on TTCT scores is well known (going back at least to Hocevar, 1979), but I do not believe that its importance is fully understood outside of the circle of creativity researchers.

One consequence of the confounding effect of fluency on other TTCT scores is that the inflated correlations among scores make “creativity” (as measured by the TTCT) appear to be a more unitary concept than it really is. In a factor analysis, the inflated correlations among TTCT subscores make it much easier than it should be to identify a common factor among the scores, thus giving the illusion of internal validity to the test and providing spurious evidence that the overall total score is valid.
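
A back-of-the-envelope way to see this (a schematic with made-up correlation values, not TTCT data) is to check how much of the total variance the first eigenvalue of a correlation matrix absorbs: when subscore intercorrelations are inflated, the first factor looks dominant, which is exactly the pattern that gets read as evidence for a single “creativity” construct.

```python
import numpy as np

def first_factor_share(r, k=3):
    """Share of total variance captured by the largest eigenvalue of a
    k x k correlation matrix with a common off-diagonal correlation r."""
    corr = np.full((k, k), float(r))
    np.fill_diagonal(corr, 1.0)
    eigvals = np.linalg.eigvalsh(corr)
    return eigvals.max() / eigvals.sum()

print(first_factor_share(0.70))  # ~0.80: looks like one strong general factor
print(first_factor_share(0.15))  # ~0.43: little support for a unitary factor
```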

In my recent study using the TTCT (Warne et al., 2022), my coauthors and I controlled for fluency by dividing the other scores by the fluency score. This is a common method of controlling for fluency, and it turns the originality and elaboration scores into proportions instead of counts. For example, if 2 of an examinee's 8 valid responses qualify as original, then the new originality score would be 2 / 8 = .25, because 25% of the valid responses were unusual enough to be counted as original.
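
In code, the adjustment amounts to an element-wise division (a sketch of the idea with hypothetical numbers; the variable names are mine, not the scoring manual's):

```python
import numpy as np

# Hypothetical raw subscores for five examinees.
fluency     = np.array([8, 12, 5, 10, 7])  # valid responses
originality = np.array([2,  3, 1,  5, 2])  # unusual responses (raw count)
elaboration = np.array([4,  6, 2,  7, 3])  # detailed responses (raw count)

# Dividing by fluency converts the counts into proportions of valid responses,
# removing the mechanical dependence on how many responses were produced.
originality_adj = originality / fluency  # first examinee: 2 / 8 = 0.25
elaboration_adj = elaboration / fluency

print(originality_adj)  # approximately 0.25, 0.25, 0.20, 0.50, 0.29
```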

Raw scores on the TTCT-F were highly correlated, but the adjusted scores were weakly correlated, and often the correlations flipped from positive to negative (Warne et al., 2022). This is a bad characteristic for a test: relatively minor changes in scoring should not massively alter the statistical results. Such sensitivity indicates that the results of any study using the TTCT depend more on the choice of scoring system than on examinees’ actual level of creativity. In my study, controlling for the confounding influence of fluency made it impossible for the TTCT-F scores to form a coherent factor in a confirmatory factor analysis.

What does this mean? In plain English, the TTCT-F (and, likely, the TTCT-V) does not measure a unitary psychological trait or construct like “creativity.” Instead, it measures a collection of behaviors that don’t share much variance with one another. It also means that the overall creativity scores on the TTCT-F (and probably the TTCT-V) are likely meaningless, because there is not one single trait or construct that all the subscores contribute to.

TTCT-V subscore correlations changed in the same way after the confounding effect of fluency was removed, though a confirmatory factor analysis was not possible (because the model would have had zero degrees of freedom). Still, there is no reason to suspect that the TTCT-V measures a coherent overall trait, either.

The implications of removing the fluency confound can be daunting. The TTCT has been around for 50 years. These results cast doubt on the validity of much of the research that uses overall TTCT scores because the results may just be an artifact of the scoring system.

Even more broadly, this result also raises the question of whether “creativity” as a unitary construct is even real — or whether it is an example of the jingle fallacy (a possibility I have explored before). For decades, psychologists, artists, educators, and others have assumed that “creativity” was a single “thing” because there is one word to describe different behaviors (e.g., painting a picture, finding a novel way to solve a scientific problem, brainstorming names for a product). But this may be an example of where language does not map onto reality.

Coming back to more practical implications: I think that the scoring system of the TTCT is in dire need of revision. The new scoring system should be streamlined and should not allow fluency to contribute to other subscores. If an overall score is produced, then it should not be the product of a scoring artifact. Fifty years after the TTCT was first published, it is time to make some serious revisions to its scoring system.

References

Acar, S., Berthiaume, K., Grajzel, K., Dumas, D., Flemister, C. T., & Organisciak, P. (2023). Applying automated originality scoring to the verbal form of Torrance Tests of Creative Thinking. Gifted Child Quarterly, 67(1), 3-17. https://doi.org/10.1177/00169862211061874

Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. Macmillan and Co.

Guilford, J. P. (1950). Creativity. American Psychologist, 5(9), 444-454. https://doi.org/10.1037/h0063487

Hocevar, D. (1979). Ideational fluency as a confounding factor in the measurement of originality. Journal of Educational Psychology, 71(2), 191-196. https://doi.org/10.1037/0022-0663.71.2.191

Warne, R. T., Golightly, S., & Black, M. (2022). Factor structure of intelligence and divergent thinking subtests: A registered report. PLoS ONE, 17(9), Article e0274921. https://doi.org/10.1371/journal.pone.0274921
