When today's students do research, data is everywhere, from academic journal articles to the newspaper. USA Today runs a daily infographic illustrating statistical trivia. Business journal graphs show stock market fluctuations or unemployment rates over time. Bar graphs or pie charts on tech blogs show how many consumers purchase laptops, tablets, or smartphones. The Common Core State Standards (CCSS) in English Language Arts mention “data skills” fifteen times, and the Next Generation Science Standards use the word “data” nearly 300 times. Students of all ages sense the importance of quantitative data: beginning with their first research projects, most write down any numbers they find.
Middle and high school students, whose choices about everything from whom to vote for to what to eat can be influenced by data, need strategies to be data-savvy. One of the most critical strategies for navigating all this data is understanding the difference between “correlation” and “causation.”
CORRELATION IS NOT CAUSATION
Media headlines like “New study proves it: ionized water prevents cancer!” or “Eating cantaloupe raises student test scores” (both fictional) are claiming causation. The headlines are saying that drinking ionized water absolutely, positively holds cancer at bay in every case and that, without exception, cantaloupe consumption yields higher scores. When you see headlines like these, you are more likely a victim of clickbait (racy headlines that draw traffic to an online source, “baiting” you to click through) than a witness to evidence of causation (one variable always makes the other happen).
When a study shows that two variables tend to move up or down in relationship to one another, experts say their data shows a correlation—a connection between the two. In other words, they are saying, “We noticed similar or inverse (opposite) behaviors in these two variables.” Researchers are leery of saying that the state of one item absolutely, positively influences—or causes—the other, because it’s so easy for their colleagues to design an alternate experiment that shoots down their argument. They take care not to say that one caused the other unless there is incontrovertible evidence that no other variable could have been involved. If anything else could be contributing to the change in B, then A cannot be isolated as the cause.
In the cantaloupe case, is there anything else that could be going on that could also be contributing to higher test scores? Maybe cantaloupe consumption goes up in a group of kids whose test scores go up, but is it because of cantaloupe or, perhaps, because higher-income households can afford not just cantaloupe but also things like tutoring, calculators, after-school courses, and books? If there is any doubt, a scientist has to claim correlation, not causation.
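For readers who want to show students what a correlation actually measures, here is a minimal Python sketch. The numbers are entirely invented for illustration (they come from no real study); the `pearson_r` function is the standard Pearson correlation coefficient, which runs from -1 (perfect inverse relationship) through 0 (no relationship) to 1 (perfect positive relationship).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented data: weekly cantaloupe servings and test scores for six students.
servings = [0, 1, 2, 3, 4, 5]
scores = [70, 74, 77, 81, 85, 88]

r = pearson_r(servings, scores)
print(round(r, 3))  # a value near 1.0: a strong positive correlation
```

Even a coefficient this close to 1.0 says only that the two variables moved together in this sample; it says nothing about why.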
You may hear, in school library circles, that librarians raise test scores. Correlation or causation? It’s correlation. We need significantly more study before we can be certain that the presence of a librarian invariably affects test scores. There can be so many other factors in a school that might also be contributing: Did the district recently add staff? Change textbook series? Move from an hourly to a block schedule? Extend the length of the school day or year? Switch to an International Baccalaureate approach? Change teacher evaluations? Hire a new administrator? Perhaps only the librarians who volunteered were studied, and there are different behaviors among unstudied librarians.
Until we can “control” for other factors (in other words, make accommodations in experimental design or execution that remove all other possible contributing factors), we can’t know for sure, so we should say, “There is a relationship between higher test scores and the presence of a librarian in a school” instead of claiming that one causes the other. (Don’t feel bad: It’s almost impossible to prove causation in K-12 education. There are so many socioeconomic, political, environmental, and other factors at play. Schools are complex ecosystems!)
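One simple way to show students what “controlling” for a factor does is to stratify data by that factor. In the hypothetical Python sketch below (all numbers invented, not from any study), cantaloupe servings and test scores look strongly correlated when the six students are pooled together, but the correlation vanishes within each income group, which suggests income, not cantaloupe, is driving the pattern.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mx) ** 2 for x in xs))
    sd_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented data: three lower-income and three higher-income students.
low_servings, low_scores = [0, 1, 2], [70, 72, 70]
high_servings, high_scores = [3, 4, 5], [85, 87, 85]

pooled = pearson_r(low_servings + high_servings, low_scores + high_scores)
within_low = pearson_r(low_servings, low_scores)
within_high = pearson_r(high_servings, high_scores)

print(f"pooled r = {pooled:.2f}")            # strong positive correlation
print(f"within low-income r = {within_low:.2f}")    # near 0 inside the group
print(f"within high-income r = {within_high:.2f}")  # near 0 inside the group
```

The pooled correlation is an artifact of higher-income students both eating more cantaloupe and scoring higher; once income is held fixed, cantaloupe tells us nothing about scores.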
WHY CORRELATIONS ARE STILL USEFUL
What is the point of reporting correlations if they don’t “prove” anything? Well, some correlations point to areas for future study. For example, a study by Ankur Vyas et al. found an association (a synonym for “correlation”) between higher diet soda consumption and higher risk for cardiovascular concerns (2014). The press releases from the American College of Cardiology (2014) and the University of Iowa (2014) both made the correlation argument. A scientist might find the correlation intriguing and either try to replicate the original study or design a new study that isolates those two variables to see whether causation is possible. Additionally, correlations, especially in education and health fields, can spur consumers to make proactive “just in case” behavior changes. In the diet soda instance, a consumer might say, “I only live once, so maybe I’ll reduce my soda consumption just in case.”
WHAT CAN YOU SHARE WITH STUDENTS?
As you become more mindful of the ways in which your students work with data during research, you will start to see entry points where you can develop mini-lessons around the kinds of data your students are facing. You’ll begin to see how your secondary students respond to data and statistics—do they cast a skeptical eye, or do they take the numbers at face value? (After all, data and statistics can seem so compelling to students—cantaloupe and test scores? Sounds like an easy way to get Mom off our backs without having to study more.)
These mini-lessons may help in establishing foundational understanding of correlation versus causation in secondary students.
- Discuss clickbait with students and ask them to find examples of hyperbolic headlines that imply that a study has causation. Keep in mind that while we often blame digital media outlets for enticing headlines with big claims, there is a long history of hyperbole in public media in print and television, too!
- Define correlation and causation using these resources:
  - Ionica Smeets’s TEDxDelft talk (http://youtu.be/8B271L3NtAw).
  - The Freakonomics DVD definition (http://youtu.be/t8ADnyw5ou8).
  - Tyler Vigen’s mapping of disparate variables that correlate but could never have causality (http://www.tylervigen.com/).
- Identify vocabulary that indicates causation, including “cause,” “effect,” “proves,” “generates,” “makes happen,” “affects,” and “impacts” so students can more readily identify claims of causation.
- Ask students to compare the argument made in a popular article’s headline with that made in the scholarly article being discussed. Remind students that headlines are often written by editors, while articles are written by reporters. Does the article make the same case as the headline? Why or why not?
- Challenge students to search databases or Google Scholar (http://scholar.google.com) to find the original study referenced in the popular article. Discuss how to read and interpret an academic article, with a special focus on the Methods, Findings, and Limitations sections. Does the author imply correlation? Causation? If the researchers were to try to move from correlation to causation, how might they design a study that isolates the variables?
Additional Resources