Recently, the field of behavioral economics (a hybrid of psychology, business, and economics) has been rocked by two accusations of fraud: Dan Ariely (of Duke University) and Francesca Gino (of Harvard University) are both accused of fabricating data in their studies.

Ironically, both are experts on dishonesty, and Gino even wrote a book called Rebel Talent: Why It Pays to Break the Rules at Work and in Life (Gino, 2018). After these accusations, one wonders how much of the book is based on her personal experience. And in a huge twist of fate, Ariely and Gino are both independently accused of fabricating data in the same article (Shu et al., 2012). This raises the question of how common data fabrication is in the social sciences.

Dan Ariely and Francesca Gino.

Both Ariely and Gino deny the accusations, and yesterday Gino sued her accusers and Harvard University for $25 million. In her lawsuit, Gino states,

Data retention policies indicate that records older than six years do not need to be retained so unavailability of this data should be expected given the year the study was conducted (2012).

Gino v. Harvard University et al. (2023, p. 22)

Ariely has made a similar argument on an Israeli television program about the questions surrounding his research:

Researchers are required to maintain data on research for five years. I have no data on research that was conducted more than 15 years ago, and I have no way to find it.

Dan Ariely on The Source (2022)

In these quotes, Gino and Ariely are referring to specific articles (Gino & Wiltermuth, 2014; Mazar et al., 2008). When those articles were published, the guidelines from the American Psychological Association (APA) indeed stated that researchers should retain data for at least five years. The guideline that applied to Ariely’s article appeared in the fifth edition of the APA publication manual:

. . . authors are expected to retain raw data for a minimum of 5 years after publication of the research.

APA, 2001, p. 354

The relevant guideline was the same when Gino published her article, as stated in the sixth edition of the APA publication manual:

Authors are expected to retain raw data for a minimum of five years after publication of the research.

APA, 2010, p. 12


Comparing Gino’s and Ariely’s explanations with the ethical guidelines in place at the time reveals two relevant points. First, Gino wants the clock for data retention to start when the study was conducted, but the APA (2010) guidelines in force at the time clearly stated that researchers should retain data for five years “after publication.” Because her article was published in 2014, she should have retained her data through at least 2019.

Second, data are to be retained for a minimum of five years; there is no expectation for researchers to throw away their data after five years. It is completely reasonable to expect researchers to keep their data for longer than five years. My earliest study was published in 2009, and I still have the data from it. I don’t understand why Ariely doesn’t have the data available for a study published just one year earlier or why Gino doesn’t have the raw data from a study published in 2014. If a person like me (who worked at a teaching university and is now outside of academia) can retain their data for every study they have ever conducted, then famous researchers at elite universities can be expected to do the same.

I am not alone. The insurance company that provided Ariely with his data for the Shu et al. (2012) study has found the original data that it provided to him. The insurance company didn’t need to retain the data, and the study was not a priority for the company. Yet 14 years after it provided the data to Ariely, and 11 years after the study was published, the insurance company could dig up the data. Why couldn’t Ariely?

Reform Needed

Unfortunately, APA’s latest guidelines on data retention (APA, 2020, pp. 14-16) are much less clear-cut than prior versions. Gone is the minimum 5-year window for data retention. Instead,

Authors must make their data available after publication, subject to conditions and exceptions, within the period of retention specified by their institution, journal, funder, or other supporting organization. . . . If it emerges that authors are unwilling or unable to share data for verification within the retention period, the journal’s current editor may retract the article or issue an Expression of Concern about its findings according to the policy of the publisher.

APA (2020, pp. 14, 15)

I have written before about this guideline, and I like the enforcement mechanism in it. However, removing the blanket minimum results in a patchwork of mandates, and I worry that some studies will not be subject to any data retention requirement. For example, my latest article (Warne, 2023) was unfunded, and I published it outside of a university in a journal that has no data retention policy. Based on APA’s current guidelines, I do not need to retain the data at all.

Ariely and Gino used the previous guideline’s minimum as a loophole to attempt to deflect accusations of data fabrication. I believe that their reactions show why APA needs to reform its data retention policy. There needs to be a consistent minimum time period of data retention, and it needs an enforcement mechanism.

I propose that APA mandate that researchers retain data for at least 50 years after the publication of a study. After this guideline is announced, a study should be retracted if the original data are not retained.

Keeping data for half a century may seem daunting. Senior researchers likely recognize that they won’t be alive in 50 years. Even younger researchers who will likely be alive in 50 years may be worried about computer crashes, obsolete file formats, or natural disasters. The easiest way to retain data for that long is to upload it immediately to a repository or deposit it in an archive (which may require an access embargo if some information is confidential).

Once is a fluke. Twice is a trend. Two different researchers have independently used the minimum data retention time period to deflect accusations of fraud without actually disproving them. This loophole in APA’s guidelines needs to be closed as soon as possible.


[Everything in this section was published September 13, 2023.] Shortly after publishing this blog post, I wrote to the APA Style office asking “. . . that the organization re-introduce a blanket minimum time period for data retention [and] that it be much longer than 5 years.” Stefanie Lazer, APA Style Expert, replied:

Thank you for your feedback, which is greatly appreciated. We document all feedback and consider it seriously when working on new guidelines and future APA Style products. The data retention date was removed from the Publication Manual in an effort to avoid contradicting the data retention guidelines of institutions researchers were directly working or publishing with and thereby confusing the issue. As you note, what to do when those institutions do not have data retention standards in place is something that needs to be considered.

Stephanie Lazer (personal communication, August 9, 2023)

I replied the same day:

That is helpful information. Perhaps a more helpful revision would say that the default data retention time is XX years, though the requirements of a funder, institution, or journal’s guidelines should overrule that default.

My recommendation is that the default should be 50 years. While that sounds like a lot, there are articles that arouse suspicion decades after their publication. A short-term data retention policy (like 5 years) makes it difficult to investigate articles that people have raised legitimate concerns about. . . . Even if a default policy is much shorter (say, 15 or 20 years), it would be an improvement. The current situation makes it too easy for people accused of fraud to “run out the clock” and hide behind data retention policies.

Russell Warne (personal communication, August 9, 2023)


