In the late ’90s, Tomi Poutanen, a precocious computer whiz from Finland, hoped to do his dissertation on neural networks, a scientific method aimed at teaching computers to act and think like humans. For a student at the University of Toronto, it was a logical choice: Geoffrey Hinton, the godfather of neural network research, taught and ran a research lab there.
But instead of encouraging Poutanen, who went on to work at Yahoo and recently co-founded media startup Milq, one of his professors issued a stern warning against taking the academic path known as deep learning.
“Smart scientists,” his professor cautioned, “go there to see their careers end.” Hinton’s lab was seen as a renegade project, more the stuff of science fiction than a viable vocation.
Now, a couple of decades later, scientists are racing to start their careers in the field. Once passé, deep learning, the subset of artificial intelligence focused on teaching machines to find and classify patterns in mass quantities of data, is now de rigueur across Google, Facebook, Microsoft, IBM and a host of other Silicon Valley companies. The trend has ignited an expensive race to scoop up scarce talent. And much of that expertise ties back to a cabal-like group of researchers who kept the futuristic field on life support 15 years ago.
By the time Poutanen arrived in Toronto, deep learning had fallen into what’s often called an “AI Winter,” a period, recurring in this ambitious field, when the promise of theory fails to translate into practical applications. Financial support dries up and researchers lose interest. Scientists had developed advanced theories of how neural networks operate, but lacked the computing power and data to put them to work.
Three computer scientists, Hinton, Yann LeCun and Yoshua Bengio, apparently missed the memo.
They toiled away at their own labs and at a research institute in Toronto called CIFAR, chiseling away at the abstract computational methods. The trio jokingly referred to themselves as the “deep learning conspiracy.” Others called them the “Canadian Mafia.”
Their bet on the tech has paid off handsomely. In 2013, Hinton was hired as a distinguished researcher at Google, where he works in its expanding deep learning division; LeCun was tapped to lead Facebook’s AI efforts later that year; and last week, IBM announced it was working with Bengio, a professor at the University of Montreal, to infuse Watson, its supercomputer, with deep learning. Re/code spoke with LeCun, Bengio and a bevy of experts in the field, many of whom pointed to the dogged work of the trio as the foundation for the next frontier in AI technology.
“In the lean times when no one believed in neural nets, these are the people who really kept the torch burning and really inspired a lot of people,” explained Rob Fergus, a former LeCun colleague who followed him to Facebook.
The fruits of their efforts are already starting to reach consumers, with deep learning woven into products like the new Google Photos app and into the facial recognition technology built into Facebook’s new app, Moments. And Facebook is considering a personal assistant product within Messenger, technology that could lean on deep learning’s computing prowess, according to a report yesterday in The Information. (Facebook declined to comment.)
Disciples of the three researchers are benefiting, too, and many of those who aren’t hired by bigger players out of grad school are gobbled up as part of acquisitions. Twitter absorbed numerous former students of the three researchers with its two recent AI acquisitions. Among the ranks of DeepMind, the secretive AI company that Google bought last year, are several of the Canadian Mafia’s adherents.
The interest in deep learning is similar to when “big data” was fashionable inside tech circles not so long ago. In many ways, this is its next iteration. Conversations with more than a dozen AI experts suggest that deep learning could soon be the backbone of many tech products that we use every single day.
Fifteen years ago, Yann LeCun was an outcast.
That’s how Fergus, who’s on leave from NYU to work at Facebook, remembers his friend and mentor from the early 2000s. Fergus recalls an image recognition workshop from that time. He had just started his PhD at Oxford, and LeCun was there. Thick-set, with black-rimmed glasses, LeCun now speaks quickly and confidently in the heavy accent of his native France.
But in this instance, LeCun didn’t wow the audience with a new theory. In fact, what Fergus remembers is how LeCun was relegated to the sidelines.
“It was clear that he was an outsider,” said Fergus. “He was talking about these methods. Everyone was all, ‘Yann, yeah, we felt we had to invite him. These models he’s talking about he’s been working on for years and they’ve never really showed anything.’”
LeCun specialized in a type of deep learning called convolutional networks, which focused on recognizing visual patterns in pixels. One of its earliest applications allowed banks to scan and then read handwritten checks.
In 1987, he joined Hinton’s lab at the University of Toronto. Hinton was seen as the pioneer in training neural networks with multiple layers, a computing technique that gives AI greater recognition capabilities — and moves its intelligence closer to something like the human brain.
Hinton, 67, is revered as the elder statesman of the field. He’s a math professor from central casting: lean, foppish hair, sweaters and Oxfords, articulate, British. Former students describe him as an affable instructor, a feverish thinker and a bit of an eccentric. (A favorite lab trick: juggling grapes in his mouth.)
Marc’Aurelio Ranzato, a researcher who studied under Hinton and now works for LeCun at Facebook, said his current boss can be introverted, while Hinton is more the witty extrovert.
“He has ideas at a rate that it’s a little hard to fathom,” said Michael Mozer, an alum of his lab who now teaches at the University of Colorado. Those ideas mostly center on his conviction that machine learning mimics human development. Mozer repeated a typical Hinton refrain: “‘I know this is crazy, but just suspend your disbelief and suppose this is what’s going on in the brain.’”
After Hinton’s lab, LeCun went to work for AT&T Bell Labs, the storied research division. There he met Bengio, a fellow Frenchman, and the pair pushed the boundaries of the theories advocated by Hinton. “We innovated in many exciting ways,” Bengio recalled.
But the theory outpaced computational power. And, more critically, it outstripped the amount of data available to train the networks. Corporate backing shriveled, as did academic interest. Winter came. “By the mid-90s, the waves turned,” Bengio said. “By 2000, I had a hard time convincing my grad students to work on this.”
As Fergus explained, neural networks had become “boring.”
Return of the machine dream
The deep learning renaissance began, like so much on the Internet, with cats.
In 2012, the Google “Brain” team (a unit born in Google X with the audacious aim of building the largest artificial neural network, an AI brain) published a seminal finding: They sat the network in front of millions of YouTube videos and, without being told anything about feline features, it learned to spot cats.
It was great publicity fodder. But it also stood out in the research world as one of the most prominent applications of “unsupervised learning,” in which machines learn from unlabeled data. YouTube was key, too. According to several deep learning experts, the biggest difference between two decades ago and today is the availability of large-scale datasets, like the world’s largest video library.
“Deep learning has been around for decades. The main reason it has been taking off in the last few years is scale,” said Andrew Ng, the scientist who launched the Google Brain team (called the deep learning team) and currently runs a similar team at Chinese search giant Baidu.
Before he left Google, Ng helped recruit Hinton, who brought his ivory tower quirks with him. At Google, he primarily works to get the AI experts “rallied around more speculative ideas,” explained Jeff Dean, the Google senior fellow who leads its deep learning efforts. “The main thing he brings is lots of interesting ideas and how to take what we’re doing and look out five years.”
Hinton remains a part-time professor in Toronto, although Dean said he is “winding down” his time there.
With the deep learning team, Google also made a critical internal shift in how it structures its research operation, one that several in and outside the company said enabled Google to better apply cutting-edge AI to products.
Around the time Ng began the operation, Google unhooked its AI researchers from its product teams. Instead, they act as a sort of AI mercenaries: The researchers develop machine learning advances, including modular software, and share them company-wide; individual product teams then request their expertise.
“We can turn experiments around quickly,” Dean told Re/code. “We can start with a model, frame it and have an answer within, ideally, hours or a day rather than weeks or months.” Currently, 100 teams inside Google use neural network tools, he said.
Google’s restructuring informed how its rival set up its AI operations as well.
When LeCun took the helm at Facebook’s AI team, he intentionally kept his team separate from product teams, a tactic to avoid the pitfalls of Google’s earlier structure. Facebook’s unit is now 50 strong, with a new office in Paris that opened last month. LeCun plans to keep expanding, and has already built a respected group, including Fergus and fellow NYU professor Chris Bregler, thanks to his reputation and connections in an industry where two degrees of separation is the norm.
Sure, their work on techniques like image recognition and speech recognition will align with Facebook’s product vision. But LeCun stressed it was still a research unit, with the freedom and abundant resources to pursue long-term projects.
And typical of research units, deadlines are a rarity. “There may or may not be products that come out of this for the next two or three or four or five years,” LeCun told Re/code. “It’s not clear. They may come faster, they may not.”
Drain for the brain
The pace of the AI field, however, is clear. As deep learning has grown in popularity, scores of companies are racing to scoop up scarce talent while computer scientists are finally starting to flood into the field.
Bengio, who struggled to find students for his lab a decade ago, has seen it grow from 15 to 60 students in the past few years. Still, many of those students are getting snatched up by tech companies, some before they even finish their degrees. “I am a bit disappointed to see many of my former students who would have been in academia go in that direction,” Bengio said. “It’s hard to find the more senior experts in deep learning.”
Money is at play. An engineer proficient in deep learning can earn upward of $250,000 a year at places like Google and Facebook, according to several sources; exceptional or more experienced ones can net seven-figure salaries.
“There’s been a huge brain drain from academia,” said Naveen Rao, the CEO of Nervana Systems, a heavily funded deep learning startup. (Bengio is an adviser.) Valley firms are taking up the mantle. That tends to push research in their preferred direction, advancing models that, for instance, work best for smartphones or search, Rao argued. DeepMind is working directly with Google’s search or Knowledge unit. “It’s always a little bit biased,” Rao said. “It always has a slant.”
(Google’s Dean responded that his division conducts and publishes research on a wealth of topics, but those immediately applicable to products net more attention.)
Another concern is the insularity of the deep learning triad. Hinton, LeCun and Bengio remain close. They co-authored a paper in “Nature” just two months ago. (LeCun’s website hosts a collection of geeky jokes about Hinton’s genius.) Within the tight-knit circles of AI experts, some see the trio hogging the limelight. Even Bengio conceded that they may be getting too much credit.
“There’s too much emphasis on [us],” Bengio said. “We’ve been lucky in some sense. We invested in the right things.”
Ng, though younger, is credited with helping to bring the discipline from academia onto Silicon Valley campuses. Juergen Schmidhuber, a Switzerland-based AI researcher, is also credited with championing deep learning methods through its dark period, and several DeepMind researchers came from his lab. He’s also a vocal critic of the Canadian Mafia. He penned a scathing critique of their “Nature” article, arguing, as he reiterated in an email, that they “cite each other but fail to cite the people who actually invented the central methods of deep learning.”
Scientific advancements tend to come inch by inch, with each new method piggybacking on the hard work of the researchers who came before. But still, ask almost anyone in the field and you’ll hear the same three names: Hinton, LeCun and Bengio.
“I think there’s lots of people who have been instrumental in making this field successful,” explained Jerome Pesenti, VP of core technology for IBM’s Watson. “These people deserve credit as well, but they were not paying attention to it 15 years ago. When you’ve been at it before everybody else, there is credit to be gained there.”
This article originally appeared on Recode.net.