
Big Data 101

Colleges are hoping predictive analytics can fix their dismal graduation rates

Libby Nelson is Vox's policy editor, leading coverage of how government action and inaction shape American life. Libby has more than a decade of policy journalism experience, including at Inside Higher Ed and Politico. She joined Vox in 2014.

Welcome to college, freshmen. Now look at the neighbor on your left, and the neighbor on your right. One of the three of you won't be here at graduation.

Decades ago, colleges would start off freshman orientation by pointing out how many students wouldn't succeed. The practice has gone out of style. But the graduation rate has barely budged: fewer than two-thirds of students who start college ever finish. So the central mystery of higher education remains the same: who will graduate? Who won't? What separates the successes from the dropouts? And how can colleges turn the latter into the former before it's too late?

Ellen Wagner's job is to answer those questions. The longtime education technology expert directs the Predictive Analytics Reporting Framework, one of the biggest data sets of higher education's nascent era of Big Data.

Using data on 1.8 million students from the past, Wagner can see the future. Give her the bare bones of a college freshman's biography — age, major, whether he is the first in his family to go to college, whether she has served in the military — and she can predict whether that student is likely to graduate.

"It sounds almost like science fiction," Wagner says. "But the reality is there's a lot that every one of us can be doing right now by simply looking at patterns of information."

Looking at these patterns is known as predictive analytics. This is how Amazon knows that people who buy Harry Potter and the Sorcerer's Stone are likely to buy The Lion, the Witch, and the Wardrobe and The Hunger Games. It's how Netflix recommends House of Cards to people who watch The West Wing. But it's also used to predict more serious matters: whether Target shoppers are pregnant, for example, or whether health insurance customers are more likely to end up in the emergency room.

Whether someone will graduate from college is not a question of life or death. But it's not far off, either, for both the students themselves and for the country. Graduates with bachelor's degrees make twice the hourly wage of people without a degree; over their lifetime, they will earn up to 1.8 times what a high school graduate does. They are more likely to marry. They are healthier on a wide range of measures. And the benefits don't just accrue to individual graduates; the Organisation for Economic Co-operation and Development estimates that the US spends less than $40,000 on each college graduate, and the American economy will reap nearly $200,000 in return.


A generation ago, American young adults were the best-educated in the world. Now 25- to 34-year-olds are 12th, behind Korea and Russia, among others. If predictive analytics can fulfill its promise of increasing the odds of getting a college degree, Wagner says, "this is going to be as big as computers in education."

A trend that sweeps throughout higher education — from for-profit colleges and community colleges to elite liberal arts colleges and public flagship universities — is rare. But an unusually diverse range of more than 150 colleges are now using some form of predictive analytics. Several organizations and companies, including Wagner's PAR Framework and the Education Advisory Board, a consulting firm, have offered their own analytics; some research universities are developing custom systems of their own.

Predicting the future, though, isn't enough. College faculty by and large don't believe in academic predestination, where some students are fated to succeed and others to fail. So once colleges know the students who are most likely to drop out, the hope is that they can help them avoid that fate.

The path is strewn with potential unintended consequences. Studies show teachers expend more time and attention on students they know will succeed; will professors neglect students data shows are likely to fail? States are under pressure to improve their graduation rates; if they can identify the students least likely to graduate, will it be too tempting to shut them out rather than admit them and help them through?

Along with the privacy concerns that other industries have confronted, the rise of Big Data presents an uncomfortable question specific to higher education. The American ethos of college-going rests on "if you can dream it, you can become it." But when we can pinpoint the students least likely to succeed, what will happen to them?

Predicting who will graduate might not seem difficult enough to need a futuristic framework. Every college campus hosts some students who are visibly flailing and others who are evidently thriving. One of the biggest factors in whether a student will graduate college is simply how much money his or her family makes.

The majority of students, though, still fall into the muddled middle, neither obviously succeeding nor failing, and their futures are more mysterious. Is flunking a course the sign of a bad semester, or the harbinger of much worse to come? Is a student with a 2.3 GPA going to be fine — "C's get degrees," after all — or a future dropout in the making?

Many students "are getting B's and C's, chugging along, not raising any flags," says Ed Venit, a senior research consultant with the Education Advisory Board. The consulting firm has its own predictive analytics model, with more than 100 colleges as members. "But a big chunk of those students are going to not finish."

Some colleges are trying to identify those potential dropouts as early as possible. Every incoming student at the University of Texas at Austin now is entered into what the university calls its Dashboard — a giant data set based on students from the past 10 years. The Dashboard includes close to everything the college can know about an incoming freshman: family income, financial need, test scores, what classes he took in high school, whether she is the first in her family to go to college.

The system is a digital Sorting Hat. It runs 16 different analyses on incoming freshmen alone, says Rita Thornton, a research associate at UT-Austin who worked on the analytics system. Some students who fall into the bottom quartile end up in a special program, the University Leadership Network, meant to help them beat the odds. That program lavishes extra help and support on the students, but it also sends a strong message: they belong on campus, and they are going to graduate.

The psychological interventions in particular seem to be helping students overcome adversity. "One way or another," education writer Paul Tough wrote in the New York Times Magazine about the UT-Austin program, "almost all of the students I spoke to were able to turn things around, often pulling themselves back from some very low places."


But the Dashboard presents another, more troubling possibility. Public universities like UT-Austin are under pressure from state legislators to improve their graduation rates or risk losing funding. They could use the wealth of predictive data another way, to find at-risk students during the admissions process and not let them in at all.

Consultants who work on predictive analytics say this is a possibility, but they are not particularly worried. The most selective colleges already use their competitive admissions process to sort students who will succeed from those they think will not.

The rest of America's colleges and universities, they say, can't afford to be that choosy. Either these schools rely on tuition revenue to keep their doors open, or they are public colleges required by state law to accept students who hit a certain academic threshold.

As a result, most colleges "don't have a lot of leeway in who they don't take," Venit says. "They're pretty much taking every student they think could potentially succeed on campus."

Even before predictive analytics arrived on campus, Southern Illinois University analyzed applicants' grade point averages and test scores to try to determine what made a successful college student, says John Nicklow, the university's provost.

"What you can't capture is the student who is strong, but they come in and something's not right, and they fail that one course," he says. With better data, "you'll know, and you'll be able to intervene."

Doing so has produced modest, but real, results. At Southern Illinois, the percentage of freshmen who continued into the second semester inched up the first year the college used predictive analytics, from 83.2 percent to 86.9 percent, and GPAs increased as well. Of course, it is possible that the college just admitted a stronger class of students last fall, Nicklow says. But that wouldn't explain why the biggest effect was among at-risk students. Southern Illinois students working with an adviser using predictive analytics were 6 percent more likely to continue than those who were not.

College administrators see two ways they can harness the power of Big Data to eventually help students. The first plunges into the heart of the college completion crisis to identify the students who are at risk of not graduating at all, as at UT-Austin. Often these are students who face odds outside their control: they are from underrepresented minorities, or from poor families, or are starting college later in life. With those students identified, colleges want to build them a support system to see them through to graduation.

The other strategy focuses on a more insidious contributing factor to both dropout rates and soaring student debt: for too many students, college takes longer than four years. Sometimes this is because of life circumstances for students who need to work or raise a family. But other delays are more preventable: students might not be taking enough credits to graduate in four years. Or they can't get into classes they need because campus is too crowded. Or they change majors too late and have to add a semester or two for newly required classes.

The two problems aren't separate. A strategy that helps one student to graduate can help another to finish on time. The University of Hawaii system, which is using the PAR Framework, is trying to address both.

The flagship campus at Manoa has a four-year graduation rate of just 55 percent. When the university crunched its data, says Hae Okimoto, director of technology for the university system, they learned that it mattered whether students started out taking more than the bare minimum of credits they needed to be full-time.

Many colleges and universities require fewer credits for full-time status than the 15 per semester that are necessary to graduate in four years. This gives students the option of scaling back for a semester or two without risking a loss of financial aid, on-campus housing, and other benefits. But analyses have shown that students who take at least 15 credits are more likely to eventually graduate, and the University of Hawaii data showed that starting strong makes a big difference.

So the university began suggesting a customized set of courses to incoming freshmen. The courses all had open seats available and fulfilled requirements, and together they added up to 15 credits in the first semester.

Two years later, even without additional guidance, those students were still more likely to be on track. They hadn't needed constant hand-holding. They just needed a push at the beginning.

Not every college has the resources to offer continual, one-on-one advising; some, like Hawaii, hope to use the data to make changes at scale. But small colleges with more resources see even more potential in predictive analytics, because they can help advisors see red flags and stop problems before they happen. They can counsel students who seem to be struggling with their chosen major to change their minds, or urge them to pursue tutoring or join a study group that might help them succeed.

Or as Nicklow, the Southern Illinois provost, puts it: "You're not ready to be an engineer, and this data shows that — but with our help maybe you can be."

First, though, they have to find the turning points that signal likely failure or success. Weed-out courses are often the stuff of campus lore, and at Susquehanna University, one course looms large for history majors: History Methods. Even the course description for the 300-level class calls it a key to the development of future historians.

But analyzing predictive analytics data, the college discovered that the class is not as important as everyone had assumed. History Methods might be a tough and important course, but it isn't particularly predictive. Whether students struggled had little impact on whether they would eventually earn a history degree.

"By the time they get to History Methods, even if they don't do really well in History Methods — even if they fail History Methods — they are so committed to history they are likely to take it again," says McMillin, the provost. "And they are likely to eventually be successful."

The real make-or-break point, it turns out, was earlier: a survey of American history that students could take as early as freshman year. Advisors could tell students struggling with History Methods the good news that they will probably be OK as long as they persevere. But students who flail in the survey course could be told to begin considering other majors.


Administrators found other surprises as they crunched the numbers. Doing well in chemistry strongly predicts whether a student will do well in biology, for example. But so does succeeding in an unrelated language arts course.

"It's a course in how to process written information, how to write about it, and the mechanics of writing," says biology professor and associate dean David Richard. "The degree to which that was important in predicting success in biology was actually a surprise to me."

Many students will discover on their own that they aren't cut out for a particular subject: some studies find that the majority will change majors before graduating. But doing so too late, particularly if it's a dramatic switch, can require picking up new prerequisites. That could add on another semester of college, or two. And in an era where 70 percent of bachelor's degree recipients borrow money toward their education, every additional semester adds on thousands of dollars of student loan debt.

Often, students don't see failure coming until it's too late. McMillin is used to meeting with students who tell her a midterm grade "isn't really an F." Their mindset, she says, is "I know I didn't do well, but I really want to do this, so I somehow will magically be able to change." She calls it "magical thinking."

A stubborn hope that performance and results are not connected is probably as old as higher education itself. The author and essayist Joan Didion once wrote that at 19, she saw herself as "a kind of academic Raskolnikov, curiously exempt from the cause-effect relationships which hampered others."

No wonder that professors, administrators, and advisors seem happy to have hard data to back up their role as academic Cassandras warning of impending doom.

With data in hand, McMillin says, professors can confront students: "Ninety-eight percent of people who got this grade in this class were not able to change it. Tell me how you're the exception. Let's get real here, and let's think about how we move you into another major that really aligns with your strengths and with your passions and gets you through in four years."

Colleges consider a student who successfully graduates in four years, even with a different major than originally intended, as a success story. It's certainly better than the alternative of dropping out with student loan debt to repay.

But whether a student considers that success is another matter — and "let's get real" is never a comfortable conversation.

Richard mentioned a junior at Susquehanna, a former student, who is determined to go to medical school and become a cardiologist. "I know there's absolutely no way this is going to happen," he says. "I don't think she's going to be a physician at all, regardless of cardiology. I think she would be better off considering other options."

If she were his advisee, the professor says, he could use the analytical data to urge her to change her mind and persuade her to try a different major or career path, one where she is more likely to be successful.

Magical thinking in education, though, runs deep. From kindergarten through high school graduation, students are steeped in a can-do spirit. Believe in yourself. Reach for the stars. Never give up. If someone tells you you're not "college material," your job is to prove them wrong. Listening to Richard, it's easy to have a knee-jerk reaction: who are you, biology professor, to say somebody can't be a cardiologist? You say 98 percent of students who get this grade eventually fail? Well, some people have to be in the 2 percent who succeed.

For at-risk students particularly, this confidence is key. At UT-Austin, the students chosen by the Dashboard for extra help are never told that they're considered less likely to graduate. Even at colleges that focus on using data to help students graduate in four years or pick the major best suited to their academic skills — colleges that will deploy the information selectively for a "tough love" conversation here and there — there is a reluctance to open up the analytics completely, to let students see into the crystal ball themselves.

"This is not a tool to highlight to students that they're in trouble or can't make it," says Nicklow, the Southern Illinois provost. "It's an awareness tool to make them aware that now's the time to buckle down."

Beyond the concerns about data privacy and security that are common to all predictive analytics efforts, there are reasons to worry about the initiative in higher education. Expecting academic success can be a self-fulfilling prophecy. A famous 1966 experiment found that when teachers were told that a (meaningless) test determined which students would gain the most intelligence during the academic year, teachers favored those students subtly but consistently, leading to real intelligence growth over the course of the year.

In telling the future, colleges must strike a delicate balance. They need students to believe that the trends they're projecting are real — real enough to get them to seek out tutoring help, or try a major where students with similar track records have succeeded. They need the grantmakers and administrators who hold the purse strings for support programs to believe that pinpointing the students less likely to succeed really will help them graduate.

But even as they bow to the power of Big Data to predict the future, colleges need even more to believe that the future is not set in stone.

"For a lot of people, knowing there's a problem and not being able to do anything about it — it's almost as if we're failing on the job," says Wagner, who cautions against seeing her database of 1.8 million students as a "magic 8-ball" of higher education. "The idea of knowing what's going on is really important. But knowing what you can do to address that is probably even more important."

