Elisabeth Bik did not start out her career as a vigilante. In fact, for many years, she was a microbiologist, studying human microbiomes. But then, one evening in the early 2010s, she was reading some papers and noticed something odd about the images in them.
“Somebody had used the same photo twice to represent two different experiments,” she remembers.
For some fields of science, photographs are important pieces of evidence. They can be photos of cells or tissues, for example, or of “western blots” — a protein-identification technique that produces ink-like blobs. These images support conclusions that are influential in everything from drug development to basic research on diseases.
In this case, Bik had noticed that the same photo of a western blot had been used, but in the second instance, it had been flipped upside down. This suggested that the evidence in this paper had been manipulated, calling into question the study’s conclusions.
This one small discovery kicked off what would eventually become Bik’s new career: spotting manipulated images in scientific papers. It started as a hobby, but it ballooned into a freelance consultancy, kept afloat in part through crowdfunding. Her analysis is sometimes subjective — relying on her eyes to spot manipulations, and her own criteria for what counts. But she has provided key insights on a variety of cases, including her scrutiny of a flawed and high-profile Alzheimer’s study, which prompted a formal investigation.
Bik is just one of many data sleuths who examine data in scientific papers, looking for patterns and problems. And there are a lot of problems to potentially find. No one knows exactly how much scientific misconduct is out there in published literature, but Retraction Watch, a blog that tracks papers pulled for a variety of issues, has documented an uptick in retracted papers in the last two decades. A recent survey of clinical trials found flawed data in more than 40 percent of the studies analyzed (not necessarily evidence of outright misconduct, but enough to raise suspicions).
Some of the sleuths who look for these flaws do their work anonymously, on forums like PubPeer, potentially to avoid legal or professional repercussions. Just this year, a group of data sleuths known as Data Colada were sued for defamation when they posted a series of investigations into strange inconsistencies in some behavioral science data.
Though some journals employ in-house statistics screeners, many researchers do this work outside the typical science publishing ecosystem. They don’t work for academic journals; their evaluations are not a part of the peer review process. But their data sleuth work is essential in revealing flaws in the system. Their work prompts the question: How do seriously flawed papers get published in the first place, and what should we do about it?
On the latest episode of Unexplainable — Vox’s podcast about exploring unknowns — we look at that Data Colada story and at why our scientific system is so reliant on independent sleuths like these to identify potentially manipulated data in scientific papers. We also look at how journals and other institutions might do a better job of preventing and disincentivizing flawed science.
Elisabeth Bik’s story illustrates both the personal risks that data sleuths face and the scope of the problem they’re trying to shine a light on. In 2016, Bik published a systematic review, analyzing images in more than 20,000 papers, and finding manipulations of photographic evidence in 4 percent of them. That figure is modest in some ways. It shows that data manipulation is relatively rare. But it also shows that manipulation is pervasive enough to call into question many research findings.
We talked with Bik about her work, but also discussed what academic journals should do to make sure these problems aren’t published in the first place.
This interview has been edited for length and clarity.
So, to be clear, in your free time, as a hobby, you looked at 20,000 papers.
Yes! That amazes a lot of people. But I only looked at the images. I didn’t read the papers.
How many hours would you say you’d spent doing this sort of thing in your free time?
Oh my gosh. I think at some point I estimated this and I forgot. But let’s just say, I had a full-time job, and I would come home and do maybe one or two hours in the evening? And then on the weekend, if I had the time off and no other obligations, I would spend perhaps a full day in total on it.
So, many hours. For about two or three years, I did this in every free hour that I had.
Do you have a personality that makes this fun for you? Or is there something that makes you suited to this particular task?
I think I’m suited because I’ve always seen duplicated bathroom tiles. Or floor planks. Laminate flooring, obviously, is not real wood. They are photos of wood. And so I would always spot, “Oh, this plank or this tile is the same as that one.” I always thought it was sort of fun, like a puzzle, right? Like find the differences, but then find the similarities.
And when you were reviewing the 20,000 papers, you weren’t getting paid? This wasn’t a part of your day job?
No, no, because I felt like ... I’m seeing this problem, but I need to write this as a scientific paper. Because if I just start, you know, yelling, “This is a problem,” who is going to believe me? It needs to be done in a scientific way. So I found two other scientists who were also editors-in-chief of journals, so they were very experienced in scientific publishing, and they helped me run this project.
And so we published this. Looking specifically for duplicated photos within papers, I found that 4 percent of those 20,000 [papers] contained duplications within the paper.
Four percent of the 20,000 papers ... now, I am amazing at math ...
Eight hundred. It’s 800!
So these 800 papers ... How are you feeling when you discover this? Are you sort of horrified at the results?
A mixed bag of emotions. I think the part that really upset me is when I first found that image. I sort of got mad that somebody was cheating.
After that first anger or being upset, I just wanted to know “how often does this happen?” Some of these were small problems, so a correction would have been okay. Like, “Oh, we made an error, you know, by accident. We had one image that we included twice.” Sure, it can happen, it could be an error. Not all of these are signs of misconduct, obviously.
We think about half of them could have been done intentionally. But some of these images contained what appeared to be heavily manipulated areas in their photo. So let’s say within a photo, you would see the same cell two or three times, copied, photoshopped, or what appears to be. And that should, in my opinion, usually lead to a retraction.
Professionally, for you, are there risks associated with doing work like this? With saying, you know, “Hey, I noticed that your paper had huge problems or flaws”?
Yes, obviously, that doesn’t go down very well. Nobody likes to be criticized. I’ve personally gotten angry emails or threats of lawsuits. And so that is a huge risk, because most of us who criticize other papers, either for a living or for a hobby, we do this unprotected.
There’s not ... good insurance that I could afford to protect me against lawsuits, and lawyers in the US are very expensive. I would not have the money to defend myself and neither do a lot of other people. Even if you work for a university, the university could say, “This is not the work we pay you to do, so we’re not gonna protect you.”
Given those costs, when you got those lawsuits threatening you, did you consider stopping, backing down?
I have thought about stopping, yes. I’ve never been actually involved in a real lawsuit. I’ve had some threats. But I mean, maybe I’m too naive. I just thought, “Well, it’s just a threat.” You can write me, you know, a letter saying you’re going to sue me, but I never have taken that perhaps as seriously as I should have.
In one case, a French researcher filed a police report against me. And that person has a lot of followers on Twitter. So they all started to send me images of people behind bars and even doxxed my home address. And so that was a very nasty affair.
But I actually thought, “Well, why don’t you answer all your criticism that I had on your work? Instead, you’re filing a report against me that I harassed you.” I felt this person doesn’t have any scientific answers. And so I actually felt more strengthened to keep on criticizing their work.
You’re a very brave person.
Maybe I’m too naive, I sometimes think.
It feels like all of the incentives are against doing the sleuthing that you are doing. I don’t know what it says about the state of science that it kind of is relying so heavily on freelance data sleuths. That feels flawed.
It does, but I think it’s just because a lot of scientific publishers have been perhaps a little bit naive and have not realized there’s always been science misconduct.
It has been realized by sleuths like me, and there are many people doing this work. Most of them work anonymously. So I do want to give credit to all these people who do this work.
And now publishers are realizing, “Hey, we should have seen this,” and there is a change. I feel that a lot of publishers are now setting up more safeguards to protect the papers that they will accept, and so I hope that the work that we’re doing is now paying off.
They should have people, paid staff, looking for these types of potential signs of misconduct. Because peer review itself is just based on trust.
Peer review is an unpaid job that most PhDs or professors do in their free time. They sort of quickly look through a paper, quickly see if there’s something wrong with it scientifically, but they don’t really look at it from the point of view that it could be misconduct, that it could be fraud. If you really want to do fraud detection, that’s a specialized job.
Statistical analysis is another very specialized job that not all peer reviewers will be able to review well. And you can hire people to do that, you could train people to do that. But you need specialized people working at scientific publishers. You wouldn’t have your credit card fraud detection done by volunteers.
It sounds like what you’re saying is ... across a variety of journals, there should be someone like you, analyzing images.
Some journals have hired people like me. And they have specialists. Some journals have been doing that actually for a pretty long time. So there are journals recognizing this, and I hope by doing my work and making people aware of it, by writing my paper, I just hope to make more people aware of it.
(Note: One survey of around 100 biomedical journals found inconsistent use of this type of review: Around 34 percent of journals never used this kind of specialist statistical review, and another 34 percent used it for fewer than half of their studies. Only 23 percent analyzed all papers this way. They also found that these numbers haven’t changed much in the last few decades. Separately, a small survey of psychology journals found around 40 percent of editors thought “no additional specialized statistical review was warranted.”)
Knowing some amount of misconduct gets published, can we trust science?
We need science to solve climate change, pollution, hunger, pandemics. We need science in order to solve these problems. So I would not say trust all science, but in general, the vast majority of scientists do it correctly. There’s always something, some criticism you can have on somebody’s work, but I do feel all of us are doing science out of this ideology that we want to cure cancer. We want to solve pandemics. We want to do a better job.
And so I do trust science in general. But there’s also the saying: trust, but verify.