Today, the prestigious academic journal JAMA Internal Medicine published an article on the association between eating whole grains and having a lower risk of death from cardiovascular disease. Many news sources are going to have headlines like "Whole grains lead to heart-healthy benefits" and "Whole Grain Consumption Lowers Death Risk."
But you shouldn't believe them. While this latest work represents excellent science — a prospective cohort observational study, in scientific parlance — it's just one study. And when you look at a single study, you're getting only one piece of the puzzle, one interpretation of the research question, one idea about how to run a scientific experiment.
In this case, the study population was not randomly assigned to eat more whole grains, which means we can't know whether the people who ate them are healthier because of their diet or because of other traits they share, like their age, ethnicity, smoking status, alcohol intake, physical activity levels, multivitamin use, and family medical history.
Studies can control for many possible "confounding factors" — or variables that may influence a particular outcome — but it's impossible to account for everything that may matter. For example, this study didn't account for key determinants of health like wealth and education, which may be more important for the health of whole grain eaters than what they eat.
Besides, as the study authors themselves point out, theirs is not the only word on this matter. Their results match those found in the Iowa Women’s Health Study and the Norwegian County Study, but they did not fully align with other studies involving diabetics and healthy older people.
Not all studies are equal
The grain study, like many other health studies you read about, is an excellent moment to think about one key insight that could help you live longer than whole grains (or red wine or coffee or chocolate) ever will: not all studies are created equal.
There are literally thousands of ways to design a study. When a news story suggests, "A new scientific study has found..." or a celebrity doctor begins a sentence with, "Studies show...", you need to ask, "What kind of studies?" Because "studies" are not equally reliable, they all have different limitations, and they should not be acted on in the same manner — or even acted on at all. Here's a quick guide to understanding study design that will help you navigate the often bewildering world of health research.
1) Much of health research can be broken down into two types: observational and experimental studies
Much of health research — especially the kind that makes the news headlines — can be broken down into two basic types: observational and experimental.
In observational studies, scientists observe and gather data on some phenomenon that's already happening: patterns of olive oil consumption, who tends to take vitamin D supplements, how much people exercise, and so on. But they don't intervene at all to change anything in people's lives; they merely gather descriptive information on habits, beliefs, or events.
With experimental research, on the other hand, scientists do intervene, or at least use statistical methods to mimic intervention: they give some people a drug, they perform an operation on others. In the best-designed experiments, study participants are randomly divided into at least two groups: those who get the intervention (i.e., treatment) and those who don't (i.e., placebo). Random allocation ensures that the groups are statistically comparable with potential "confounding factors" equally distributed among them. The only difference between the groups is the intervention, which allows researchers to tease out what effect that intervention causes. This is why conclusions from experiments are generally considered to be more reliable and trustworthy.
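Randomization's balancing act is easy to see in a toy simulation (a hypothetical sketch with invented numbers, not data from any study mentioned here): give 10,000 simulated people a "smoker" trait, assign treatment by coin flip, and the smoking rate comes out nearly identical in the two groups.

```python
import random

random.seed(42)

N = 10_000
# Hypothetical confounder each person carries (e.g., smoking),
# generated independently of the coin flip below.
smoker = [random.random() < 0.3 for _ in range(N)]
# Random allocation: a fair coin flip decides treatment vs. control.
treated = [random.random() < 0.5 for _ in range(N)]

def smoking_rate(in_treatment):
    group = [s for s, t in zip(smoker, treated) if t == in_treatment]
    return sum(group) / len(group)

rate_treated = smoking_rate(True)
rate_control = smoking_rate(False)
# The two rates land close together: randomization has spread the
# confounder roughly evenly, so it can't explain an outcome difference.
print(f"treated: {rate_treated:.3f}, control: {rate_control:.3f}")
```

With enough participants, the coin flip balances not just the confounders researchers thought to measure, but the ones they never knew existed.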
2) There are four basic types of observational studies
There are many different types of observational studies, but here are the four most common that you need to know about: cross-sectional surveys, cohort studies, case-control studies, and case reports.
"Cross-sectional surveys" take a random sample of people and record information about them at one point in time. For example, researchers might survey randomly selected inhabitants of Washington, DC to figure out how many have heart disease (i.e., an epidemiological survey) or what they think of the quality of green space for outdoor exercise (i.e., a public opinion poll).
"Cohort studies" are just like surveys, but they track the same groups of people over an extended period of time. That's why they are often called "longitudinal" or "prospective" studies. Instead of just gathering data on heart disease in Washington, DC at one point in time, a cohort study would follow groups (or cohorts) of study participants over a period of, say, 10 years, and see how many people in each of the groups develop heart disease. This allows researchers to record changes in the health of the participants over time and compare levels of health across different groups of people.
"Case-control studies" are often called "retrospective studies." That's because researchers start with an end point and work backward, figuring out what might have caused that outcome. For example, researchers could take two groups of people who live in Washington, DC: those who have been diagnosed with heart disease and those who haven't. They could then work backwards and survey the two groups about their earlier health behaviors, such as saturated fat consumption or exposure to disease-inducing viruses, to figure out what might have caused the disease to develop or not. From there, they would note any differences in risk factors or exposures between the two groups, which can help suggest what may have led to heart disease in some people.
"Case reports" are basically detailed stories about a particular patient's medical history. If a doctor writes up case reports about a cluster of patients with the same condition or disease, this is a "case series." Though these are considered the weakest kind of observational studies, they can still be very helpful for rare diseases and powerful for advocacy. Sometimes they can be a bellwether in medicine. Early case reports, for example, led to the tragic discovery that mothers who were taking thalidomide for morning sickness were having babies with missing limbs. These reports surfaced long before a randomized trial could ever be done — and spared thousands of babies.
3) Observational studies have limits you need to understand
From a single observational study, researchers will only be able to suggest whether there's an association between a risk like fat consumption and an outcome like heart disease — and not that one caused the other. That's because the research participants were already eating fat or already had heart disease (or not) when the study began. What if people who eat lots of fat happen to be less health conscious? What if they are poorer and therefore more stressed? What if this particular group of fat-eaters just happened to be chubbier than those who stick to a low-fat diet? These things are called "confounding factors," or the difficult-to-predict variables that are associated with both the cause (e.g., saturated fat) and potential effect (e.g., heart disease) under study.
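A small simulation makes this concrete (all numbers invented for illustration): in the toy world below, a hidden "health conscious" trait drives both fat intake and heart disease, while fat itself does nothing. An observational comparison still shows high-fat eaters getting sicker.

```python
import random

random.seed(1)

# Toy world (all rates invented): a hidden "health conscious" trait
# drives BOTH fat intake and heart disease; fat itself has no effect.
N = 50_000
rows = []
for _ in range(N):
    health_conscious = random.random() < 0.5
    high_fat = random.random() < (0.2 if health_conscious else 0.6)
    disease = random.random() < (0.05 if health_conscious else 0.15)
    rows.append((high_fat, disease))

def disease_rate(fat_flag):
    group = [d for f, d in rows if f == fat_flag]
    return sum(group) / len(group)

rate_high_fat = disease_rate(True)
rate_low_fat = disease_rate(False)
# High-fat eaters show more disease purely because of the hidden trait:
# an association appears even though fat causes nothing in this world.
print(f"high fat: {rate_high_fat:.3f}, low fat: {rate_low_fat:.3f}")
```

This is exactly the trap a single observational study can fall into: the association is real, but the causal story behind it may belong to a variable nobody measured.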
Sometimes confounding factors are knotty and wholly misleading. In 1991, the authors of a commentary published in the New England Journal of Medicine suggested that left-handed people had a higher risk of mortality. For their retrospective case-control study, researchers looked at death certificates from two counties in southern California and then asked family members of the deceased about their loved ones' handedness. They found that being left-handed is associated with dying younger. "The mean age at death in the right-handed sample was 75 years, as compared with a mean age at death of 66 years in the left-handers," they wrote.
After publication, the journal editor was inundated with angry letters. That's because the researchers had failed to account for the cultural context: there was a time in the US when left-handed children were forced to become right-handed. The reason there were few older left-handers was not that the hand you write with spells an early end, but that the would-be elderly lefties had been converted when they were young and appeared as right-handed people in the study.
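The handedness artifact can be reproduced in a sketch (invented numbers again): give everyone the same 10 percent chance of being born left-handed, but "convert" most lefties in the older cohorts into apparent right-handers, and the apparent lefties in the death records come out younger on average.

```python
import random
import statistics

random.seed(2)

deaths = []
for _ in range(20_000):
    age_at_death = random.randint(40, 95)
    born_left = random.random() < 0.10   # same rate in every cohort
    # People who died older were born earlier, when most left-handed
    # children were forced to switch, so they appear right-handed here.
    converted = born_left and age_at_death > 70 and random.random() < 0.9
    appears_left = born_left and not converted
    deaths.append((appears_left, age_at_death))

mean_left = statistics.mean(a for left, a in deaths if left)
mean_right = statistics.mean(a for left, a in deaths if not left)
# Apparent lefties die "younger" even though handedness never touched
# anyone's lifespan: the conversion, not biology, creates the gap.
print(f"left: {mean_left:.1f}, right: {mean_right:.1f}")
```

A gap of several years appears out of thin air, which is roughly the pattern the 1991 commentary mistook for a mortality risk.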
4) There are two basic types of experimental research
Now let's move on to experimental research. There are two basic types: randomized controlled trials and quasi-experimental designs.
"Randomized controlled trials" are considered the gold standard of medical evidence, though as you will probably surmise by now, they aren't necessarily the best study design for every research question. The reason they're so powerful, when they're well done, is because they are designed to tease out cause-and-effect relationships; randomization means treatment groups are comparable, and the only difference between them is the intervention (i.e., whether they received the drug or not), so any difference in outcome between the two groups can be attributed to the intervention.
When these experiments are blinded, they're even more powerful: blinding means either the study participants, the doctors, or both ("double-blinded") do not know whether they are receiving/giving the real treatment or a placebo. So blinded studies account for any placebo effects that may arise.
Lastly, there's a type of study design that lies somewhere between experimental and observational research: that's the "quasi-experiment." These are essentially a type of unplanned or uncontrolled experiment that uses statistics and human ingenuity to mimic the conditions of an experiment. Scientists have found many ways of undertaking these. One example would be comparing tobacco consumption before and after a border town is subjected to new state smoking regulations with its neighboring town in a different state that keeps the old regulations. Another example would be to evaluate the effects of GPA-based university scholarships by comparing those students who were just above and just below the grade point cut-off for receiving them.
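The border-town comparison is a classic "difference-in-differences" calculation. With made-up smoking rates, the arithmetic looks like this:

```python
# All rates invented for illustration.
# Town A gets the new smoking regulation; neighboring Town B keeps the old rules.
town_a_before, town_a_after = 0.30, 0.22
town_b_before, town_b_after = 0.31, 0.29

# Shared background trend, estimated from the untreated neighbor.
background_trend = town_b_after - town_b_before

# Raw change in the regulated town mixes the policy with the trend.
raw_change = town_a_after - town_a_before

# Subtracting the trend isolates the policy's estimated effect.
policy_effect = raw_change - background_trend

print(f"estimated policy effect: {policy_effect:+.2f}")
```

The neighboring town stands in for the randomized control group: whatever would have happened anyway (seasonal dips, national anti-smoking campaigns) is assumed to show up in both towns and gets subtracted out.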
5) The king of all evidence: systematic reviews
Researchers often rank study designs in hierarchies (see above) to describe the relative weight of their conclusions. At the top of the hierarchy are syntheses of evidence that identify and integrate all sources of high-quality information relevant for a particular question coming from different contexts, settings, and methods.
These reviews address that problem of the single study puzzle piece. Rather than relying on just one person's experience or even just one randomized controlled trial, synthesized evidence draws on multiple sources and weighs their contributions to arrive at a more fully supported conclusion according to each study's rigor and relevance. This kind of research is regarded as the highest form of evidence — the king of all evidence if you will — and the best science to inform decision-making.
The idea is that many studies, done on thousands of people and taken together as a whole, can get us closer to the truth than any single study or anecdote ever could. (That is, unless a single study or anecdote is the only evidence available.) Reviews are also less biased than a selective sampling of the smaller studies they summarize.
Within synthesized evidence, the most reliable type for evaluating health claims are called "systematic reviews." These studies represent the best available syntheses of global evidence about the likely effects of different decisions, therapies and policies.
As their name suggests, systematic reviews use particular methods for finding helpful information, assembling it, and assessing its quality and applicability to the question you're interested in answering. Following this approach to the evidence — which is usually independently repeated at least twice by separate reviewers — reduces the bias that can creep into single studies. This process also helps to make sure results are not skewed or distorted by an individual author's preconceptions or cognitive biases. Finally, such transparency means that readers can know what the authors did to arrive at their conclusions and can easily evaluate the quality of the review itself.
You can log into a place like the Cochrane Library, Health Systems Evidence, or PubMed Health and read systematic reviews about everything from the effects of acupuncture for migraines and premenstrual syndrome, to the efficacy of cranberry juice for bladder infections. The hard-working people behind these studies are even starting to translate their conclusions into "plain language summaries," written in the way most people actually speak. This means these reviews and databases are more accessible than ever before. But then again, not all systematic reviews are created equal, either. And systematic reviews are only a starting point.
Even with the best available evidence from around the world at our disposal, we have to analyze it and apply it to our particular circumstances. A personal experience with the success or failure of a drug, like an allergic reaction, is more informative for you than the most rigorous study on the drug ever could be.
Just remember that one person's experiences are merely anecdotes — the least helpful type of evidence — for others. And one study, like the latest on whole grains, is only one piece of the puzzle.
With Burden of Proof Julia Belluz (a journalist) and Steven Hoffman (an academic) join forces to tackle the most pressing health issues of our time — especially bugs, drugs, and pseudoscience thugs — and uncover the best science behind them. Have suggestions or comments? Email Belluz and Hoffman or Tweet us @juliaoftoronto and @shoffmania. You can see previous columns here.