Last year, when industrial giant General Electric made a $105 million investment in the EMC-owned software outfit Pivotal, it puzzled a lot of people. GE, the maker of jet engines, generators and railroad locomotives, didn’t quite fit the traditional mold of a player in tech circles.
This is the same GE that spent $1 billion to create a tech hub in San Ramon, Calif., only an hour away from the heart of Silicon Valley. And for the last several years it has been pushing persistently to become just as well-known for its expertise in handling and analyzing the constant flow of operational data that its products generate.
Today, GE hit a milestone in that campaign. Teaming with Pivotal, it announced that it had created what it calls a “data lake.” If you’re familiar with the concept of “big data,” then just think of a data lake as “bigger data.”
Here’s how it works: Let’s say you’re gathering data about the performance of engines on a fleet of jets — fuel consumption, performance, response time, operating temperatures. A common figure cited says all the sensors on GE-made jet engines and elsewhere in the plane will generate about 14 gigabytes of data on an average commercial flight.
Each bit of that data is assigned a unique ID number and then poured into a massive storage trove. It’s not put in folders or individual directories; it all resides in its raw format in one big “lake” of data. When you’re ready to conduct your analysis, each bit of raw data is in there waiting for you to scoop it up with other bits of data that fit whatever question you want to answer. GE did exactly this in a trial last year with data gathered from 15,000 flights on 25 airlines.
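That ingest-then-scoop workflow can be sketched in a few lines. This is a minimal, hypothetical illustration of the data-lake idea — tag each raw record with a unique ID, store it untransformed, and impose structure only at query time — not GE's or Pivotal's actual implementation; all names here are invented for the example.

```python
import uuid

lake = {}  # one big store: unique ID -> raw record, no folders or directories

def ingest(raw_record):
    """Tag a raw record with a unique ID and pour it into the lake as-is."""
    record_id = str(uuid.uuid4())
    lake[record_id] = raw_record  # no schema enforced, no transformation applied
    return record_id

def scoop(predicate):
    """At analysis time, pull out only the records that fit the question."""
    return [r for r in lake.values() if predicate(r)]

# Pour in heterogeneous sensor readings exactly as they arrived.
ingest({"sensor": "temp", "engine": 2, "value_c": 615})
ingest({"sensor": "fuel_flow", "engine": 2, "kg_per_h": 2900})
ingest({"sensor": "temp", "engine": 1, "value_c": 598})

# Later, scoop up just the temperature readings for one analysis.
temps = scoop(lambda r: r.get("sensor") == "temp")
```

The key design choice — often called "schema-on-read" — is that no question needs to be decided in advance: the same raw lake can answer a fuel-burn query today and a temperature query tomorrow.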
There are two tech forces colliding here. One is Hadoop, the open source data analytics platform that you hear so much about these days. Pivotal has built part of its platform-as-a-service business around it.
The second is Predix, GE’s internally built platform for connecting different kinds of industrial equipment. The platform is already in use across GE. And last month, GE CTO Mark Little talked about how he’d like to extend it to third parties across several industries.
The data lake approach streamlines some fundamental steps in the analytics process. Typically, before you can run any analysis, you have to spend a lot of time, effort and money getting the data into the right format. Here it stays in its original, raw format until it’s needed.
The point of all this is, naturally, to learn more about whatever complex system you’re operating — jet engines, factories, oil platforms — so that it can perform better, faster, more efficiently and at lower cost.
In its trial with the planes, GE says it cut the amount of time required to do its analysis from months to days. One airline using the approach shaved its fuel costs by one percent. That may not sound like much, but when you consider that the major U.S. air carriers spent more than $46 billion on jet fuel last year, a savings of one percent turns out to be real money. By next year, GE says it hopes to be collecting data on 10 million flights a year.
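The "real money" claim is simple back-of-the-envelope arithmetic using the article's own figures; the split among individual carriers isn't specified, so this is just the industry-wide total.

```python
# Figures from the article: major U.S. carriers spent more than
# $46 billion on jet fuel last year; the data-lake approach
# shaved fuel costs by about one percent.
total_fuel_spend = 46_000_000_000  # USD, industry-wide
savings_rate = 0.01                # one percent

industry_savings = total_fuel_spend * savings_rate
print(f"${industry_savings:,.0f}")  # $460,000,000
```

At industry scale, a one-percent efficiency gain works out to roughly $460 million a year.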
And it doesn’t stop with airlines. Other industries could use the data-lake approach to sniff out new efficiencies of their own. Expect to hear more about data lakes in the coming months.
This article originally appeared on Recode.net.