The Ebola Crisis and Where Big Data Can Help

Harvard’s HealthMap service made headlines for flagging the Ebola outbreak before the World Health Organization formally announced the epidemic.


Like fighting a forest fire with spray bottles. That was how one health worker described the challenge facing international efforts to halt the spread of Ebola.

The West Africa outbreak was first confirmed by the World Health Organization back in March. In the months since, it has grown into the worst health crisis of the 21st century, killing more than 4,500 people worldwide.

Previous Ebola outbreaks were small and confined to rural communities, but the 2014 epidemic has bucked this trend, spreading through densely populated urban areas and from Guinea into Liberia, Sierra Leone, Nigeria and the U.S.

Cases in Liberia are doubling every 15 to 20 days; in Sierra Leone and Guinea, they are doubling every 30 to 40 days. The Centers for Disease Control and Prevention estimates that as many as 1.4 million people in Liberia and Sierra Leone could be infected by January if the response is not rapidly scaled up.
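The doubling times quoted above can be made concrete with a back-of-the-envelope calculation. It assumes simple exponential growth from N_0 cases with a constant doubling time T_d. This is an idealization, not a forecast: real outbreaks slow as interventions take hold. The 60-day horizon is chosen purely for illustration.

```latex
N(t) = N_0 \, 2^{t/T_d}
\qquad\Longrightarrow\qquad
\frac{N(60)}{N_0} =
\begin{cases}
2^{60/15} = 16, & T_d = 15 \text{ days} \\
2^{60/20} = 8,  & T_d = 20 \text{ days} \\
2^{60/30} = 4,  & T_d = 30 \text{ days}
\end{cases}
```

Halving the doubling time from 30 to 15 days turns a fourfold increase over two months into a sixteenfold one.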

In the worst-affected countries, infrastructure has already buckled under the pressure. Public-health systems are nearing collapse and economic growth is grinding to a halt.

So what can be done? Many have emphasized the role education must play in preventing the disease from spreading further, but just as important is ensuring that authorities have the tools needed to track a virus like Ebola.

A long incubation period of up to 21 days makes it extremely difficult to predict where and how the disease will spread next. And with official channels under strain (Liberia has just 200 doctors on record for a population of four million), emerging technologies are becoming increasingly important in efforts to stop the epidemic in its tracks.

Big-data analytics is often championed by the private sector, but its potential for aid workers has been largely overlooked. The technology lets vast amounts of data from diverse sources be aggregated and filtered, with irrelevant information discarded along the way. Banks use it to sharply improve the accuracy of their fraud detection, and pharmaceutical companies use it to develop new life-saving drugs.

In disaster zones, real-time analytics that process huge volumes of data can help pinpoint previously unanticipated trends, slow the spread of disease and, in doing so, limit the number of deaths.

Harvard’s HealthMap service recently made headlines for reportedly flagging the Ebola outbreak some nine days before the World Health Organization formally announced the epidemic, issuing its first alert on March 19.

HealthMap compiles and collates reports of disease outbreaks worldwide and turns them into a visual report. To do so, it sifts through millions of social media posts, including those from health care workers in Guinea blogging about their work.
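HealthMap's actual pipeline is not described here. What follows is a minimal sketch of the "filter out irrelevant information" step applied to social media posts. It assumes each post is a dict with hypothetical `text` and `location` keys and uses a short, hypothetical keyword list. It keeps posts that mention an outbreak-related term and tallies them by location.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real system would use richer classifiers
# and cover many more languages and spellings.
OUTBREAK_TERMS = ("ebola", "hemorrhagic fever", "fièvre hémorragique")

# One case-insensitive pattern that matches any of the terms.
PATTERN = re.compile("|".join(re.escape(t) for t in OUTBREAK_TERMS), re.IGNORECASE)


def filter_relevant(posts):
    """Keep only posts whose text mentions an outbreak-related term."""
    return [p for p in posts if PATTERN.search(p["text"])]


def mentions_by_location(posts):
    """Count the relevant posts reported from each location."""
    return Counter(p["location"] for p in filter_relevant(posts))
```

Keyword matching is crude and will miss paraphrases and catch false positives, so it only illustrates the filtering idea.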

HealthMap’s role in spotting Ebola is just one example of data analytics applied to public-health information. Big data has huge potential to help officials intervene more effectively on the ground.

The technology’s most valuable contribution is helping relief organizations anticipate where a disease will spread next, determine where preventive measures are working and allocate resources where they are needed most.

Cellphone data is key. Even in poorer countries cellphones are common, and carriers amass large databases that offer detailed insight into their users’ behavior.

In the current crisis, one telecom provider in Senegal handed over anonymized voice and text data from thousands of cellphones, which was used to build a detailed overview of population movements across the region. Combined with the latest reports from the World Health Organization, this information offered clues about where to focus preventive measures and distribute health care.
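The source does not say how the Senegalese data was processed. One common way to summarize population movement from call records is an origin-destination count, shown here as a minimal sketch. It assumes each anonymized record is a `(subscriber_id, timestamp, region)` tuple, where `region` is the hypothetical area served by the tower that handled the call or text. For each subscriber, it counts consecutive changes of region.

```python
from collections import Counter, defaultdict


def origin_destination_counts(records):
    """Count movements between regions from anonymized call records.

    records: iterable of (subscriber_id, timestamp, region) tuples, where
    subscriber_id is already anonymized (e.g. hashed).
    Returns a Counter mapping (origin, destination) -> number of moves.
    """
    # Group each subscriber's events together.
    by_subscriber = defaultdict(list)
    for subscriber_id, timestamp, region in records:
        by_subscriber[subscriber_id].append((timestamp, region))

    moves = Counter()
    for events in by_subscriber.values():
        events.sort()  # chronological order for this subscriber
        # Compare each event with the next one for the same subscriber.
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:  # only count actual changes of region
                moves[(origin, destination)] += 1
    return moves
```

Because records are grouped and sorted before counting, this is a batch, after-the-fact analysis. That is the retrospective limitation the next paragraph describes.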

The drawback of this strategy was that it was retrospective. The unpredictable course of the outbreak and its long incubation period meant authorities needed to map movements in real time.

That is why the CDC is now using cell-tower (mast) data from mobile operators to provide a real-time breakdown of when and where calls to emergency helplines are being made. A significant spike in calls from one town is enough to alert authorities to where they will likely need to focus resources next.
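The CDC's actual alerting rule is not described here. A simple, widely used way to flag "a significant spike" is to compare each day's count with a trailing baseline. The sketch below assumes a list of daily helpline call counts for one town. It uses hypothetical parameters: a 7-day window and a threshold of the baseline mean plus three standard deviations.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, window=7, k=3.0, min_std=1.0):
    """Return the indices of days whose call count is a spike.

    A day is flagged when its count exceeds the mean of the previous
    `window` days by more than k standard deviations. min_std is a floor
    on the standard deviation so a perfectly flat baseline does not turn
    a tiny fluctuation into an alert.
    """
    spikes = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        threshold = mean(baseline) + k * max(stdev(baseline), min_std)
        if daily_counts[day] > threshold:
            spikes.append(day)
    return spikes
```

For example, `flag_spikes([4, 5, 3, 4, 6, 5, 4, 5, 21, 6])` returns `[8]`. One trade-off is that a spike inflates the baseline for the following week, which makes the detector less sensitive to a second surge right after the first.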

Real-time access to this information provides authorities with a comprehensive picture of where people are and where they’re heading, allowing decisions to be made quickly without relying on hearsay or waiting for the latest update from local hospitals.

But using one data source alone can only ever give us a partial picture.

Big-data analytics is about combining information from many different sources and analyzing it collectively to identify patterns. For a disaster such as the current Ebola crisis, that means drawing on health clinic reports, media updates, social media posts, information from public workers on the ground, transaction data from retailers and pharmacies, and travel ticket purchases, alongside helpline data.
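As one illustration of analyzing several sources collectively, the sketch below makes the following assumptions:

- Each source has already been reduced to per-town daily counts, with at least one day of history before the latest day.
- Each source is standardized against its own history, so a source with large raw volumes does not dominate the result.
- Sources are weighted equally.

The source names and the equal weighting are hypothetical, not a description of any deployed system.

```python
from statistics import mean, pstdev


def composite_signal(sources):
    """Combine several data sources into one anomaly score per town.

    sources: dict mapping source name -> dict mapping town -> list of
    daily counts (oldest first, latest last, at least two values).
    For each source and town, the latest count is turned into a z-score
    against that town's earlier history in the same source; the z-scores
    are then averaged across sources with equal weight.
    """
    scores = {}
    for by_town in sources.values():
        for town, counts in by_town.items():
            history, latest = counts[:-1], counts[-1]
            spread = pstdev(history) or 1.0  # avoid dividing by zero
            z = (latest - mean(history)) / spread
            scores.setdefault(town, []).append(z)
    return {town: mean(zs) for town, zs in scores.items()}
```

Standardizing per source means a jump in pharmacy sales and a jump in helpline calls are judged on the same scale. Averaging means a town must look unusual across several sources, not just one, to score highly.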

Crunching data on this scale also means doing it quickly: collating and analyzing data in its native form as soon as it is produced, wherever in the world that happens. This process is known as multi-center ingest. Combined with the vast amount of public information already available on the Internet, such data can help those working in hazardous environments stay on top of rapidly changing situations.
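Production multi-center ingest involves replication and consistency guarantees well beyond a short example. A much simpler illustration of collating data from several sites into one timeline is to merge per-site event streams that are each already in timestamp order. The sketch below uses `heapq.merge` from Python's standard library. The `(timestamp, site, payload)` record layout and site names are hypothetical.

```python
import heapq
from operator import itemgetter


def collate(*site_streams):
    """Merge per-site event streams into one time-ordered stream.

    Each stream is an iterable of (timestamp, site, payload) tuples that
    is already sorted by timestamp. heapq.merge pulls one item at a time
    from each stream, so it also works on unbounded iterators without
    loading everything into memory.
    """
    return heapq.merge(*site_streams, key=itemgetter(0))
```

For example, merging `[(1, "site_a", "clinic report"), (4, "site_a", "helpline tally")]` with `[(2, "site_b", "pharmacy sales"), (3, "site_b", "news item")]` yields events with timestamps 1, 2, 3, 4. One limitation is that the merge must wait for every stream's next item before emitting, so a stalled site holds up the whole output.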

Only by analyzing data on this scale can we determine, in time to act, whether containment policies, education campaigns and other preventive measures are working.

One challenge is that underfunded public-health infrastructure in the affected regions makes it difficult to obtain clean data. Another is that there is, to date, no internationally accepted algorithm for epidemiological forecasting. The Ebola crisis has made developing one a priority in academic and government circles.

In the meantime, those working to contain the epidemic should use every tool at their disposal. Used in the right way, big data can help authorities intervene far more effectively.

David Richards is co-founder and CEO of WANdisco, a public software company specializing in distributed computing. The company is a corporate contributor to Hadoop, Subversion and other open source projects. Reach him @davidrichards.

This article originally appeared on
