The Netflix Prize: How a $1 Million Contest Changed Binge-Watching Forever
"We need to go win a million dollars." Lester Mackey was just a senior computer science major at Princeton when a friend burst into his dorm room in a hysterical fit of excitement. "We need to do this."
In October 2006, Netflix, then a service peddling discs of every movie and TV show under the sun, announced "The Netflix Prize," a competition that lured Mackey and his contemporaries for the computer programmer equivalent of the Cannonball Run. The mission: Make the company's recommendation engine 10% more accurate -- or die coding. Word of the competition immediately spread like a virus through comp-sci circles, tech blogs, research communities, and even the mainstream media. ("And if You Liked the Movie, a Netflix Contest May Reward You Handsomely" read the New York Times headline.) And while a million dollars created attention, it was the data set -- over 100 million ratings of 17,770 movies from 480,189 customers -- that had number-crunching nuts salivating. There was nothing like it at the time. There hasn't been anything quite like it since.
Why the hell would a tech giant even do that? While it's common for successful corporations to protect their data like pirates guarding treasure, at the time CEO Reed Hastings was looking for a way to increase the efficiency of Cinematch, the software the company rolled out in 2000 to recommend movies you might enjoy. (If you liked The 40-Year-Old Virgin, check out Superbad.) Over the years he'd recruited brilliant minds to tinker with the magic formula, but they'd hit a wall. He needed results. Fresh ideas. Innovation.
It's the same impulse that led the company to make another drastic change to its user interface earlier this year: At a press conference in March, VP of Product Todd Yellin announced the five-star ratings would be replaced with a new thumbs-up-or-down system. The star ratings, which drove much of the data and excitement around the Netflix Prize, are dead. But the story of the Netflix Prize lives on. This is how a super-squad of nerds from across the globe changed Netflix, and the field of artificial intelligence, forever.
Following the announcement (and his college buddy's Braveheart-like rally cry), Lester Mackey became one of the 30,000 Netflix enthusiasts who downloaded the trove of information and set out to unlock the secrets of the recommendation algorithm. His team "Dinosaur Planet," which included his Princeton classmates David Lin and David Weiss, initially tinkered with the project out of curiosity and excitement. But quickly they were hooked.
"We didn't really think we could win it," Mackey explained to me over the phone recently. "But we thought it would be a fun activity, something that would allow us to use what we were learning in our computer science classes and our math classes. And a good opportunity to learn about the field of machine learning, which we hadn't really encountered before."
The contest didn't just catch the attention of college students with time to kill: An hour east of Princeton, in Middletown, New Jersey, the Netflix Prize announcement caught the eye of Chris Volinsky, head of a statistics research group at AT&T, and his team, who regularly read blogs to see what was going on in the emerging data science world. "This was before 'Big Data,'" he tells me, and therefore a Big Deal. He pulled his group together and asked who wanted to poke around at the data set. He didn't know the contest would stretch on for years.
Hobbyists, academics, and professionals weren't just drawn to the contest by the potential payday. The revelations were just as enticing; because the winners would retain ownership of their work, a contestant like Volinsky could also pitch management at AT&T on devoting time and resources to the project. (The rules of the contest only stipulated that the winning team would have to license its work non-exclusively to Netflix.) Most importantly, the data was just plain interesting: an unruly mess of insights into taste, behavior, and pre-streaming viewer psychology. As Chris Volinsky put it, "Everyone likes movies."
Everyone does like movies, and that was undoubtedly on Hastings's mind when he first came up with the Netflix Prize. As Gina Keating described in her book Netflixed: The Epic Battle for America's Eyeballs, Hastings had lofty ambitions for the competition -- along with a family connection to the grand tradition of intellectual incubation. Prior to World War II, his great-grandfather Alfred Lee Loomis, a philanthropist and scientist, established Loomis Laboratory in Tuxedo Park, New York. It was there, near his mansion, that Loomis, Keating writes, "enticed world-famous scientists to his physics lab by dangling cutting-edge equipment, luxurious accommodations, and generous stipends."
Hastings didn't have to dangle as much. While previous crowd-sourced competitions like the Ansari X Prize, which awarded $10 million to the inventors of a reusable spacecraft in 2004, required tremendous amounts of money and physical resources, the Netflix Prize was comparatively low-tech. You just needed a computer and an internet connection to compete. One of the competition's micro-celebrities, an English psychologist named Gavin Potter, went by the screen name "just a guy in a garage."
In a way, Hastings was a tech-age Willy Wonka letting any curious hacker into his digital Chocolate Factory. Instead of a chocolate river, he offered a gushing stream of data. But the winning "Golden Ticket" wasn't hiding in an ordinary candy bar. It was locked away in your brain.
The actual work of winning the Netflix Prize wasn't sexy. If you knew nothing about computers, you weren't going to download the data set and discover you're actually Matt Damon in Good Will Hunting but for recommender algorithms.
Even contestants like Mackey, who would go on to earn a PhD in Computer Science from UC Berkeley, or Volinsky, who had the support of his AT&T colleagues, had to spend hours learning about topics like "machine learning," a highly specialized subfield of computer science. They'd then adapt their insights to the figures provided by Netflix. "In the first few months we managed to do better than some of the baselines," explains Mackey. "That was when we really got excited. We were like, 'Wow, we can beat these very simple baselines after all. Maybe we have a chance of doing something in the challenge.'"
What made the contest so hard? For one thing, the sheer range of the data: Some users in the set had only reviewed 9 movies, which was the minimum to be included, and gave the contestants very little information to get a good read on their tastes. On the other hand, some Netflix obsessives -- you know the type -- had rated over 900 films. The range problem also applied to the individual titles: A blockbuster like Batman Begins would have hundreds of thousands of star-ratings, but some obscure movies only had 20 ratings in the whole set.
Later in the competition, factors like what day users rated a movie on -- maybe you're a crankier critic on Mondays -- were incorporated into mathematical models built during late-night coding sessions. Dry-erase markers scribbled across whiteboards. Notebooks piled up. Brains fried. All the math was in pursuit of an algorithm that would precisely identify the star rating any user would give to any movie, but, as Volinsky explains it to me, the incongruity in the amount of information boggled the mind.
By most accounts, the earliest days of the Netflix Prize saw some of the biggest technological leaps. A user who went by the pseudonym "Simon Funk" adapted an approach he'd previously worked on, an incremental form of "singular value decomposition" (SVD), which, when applied to the Netflix Prize data, provided an automated method for finding similarities between the movies users loved or hated. Unlike Volinsky or Mackey, he wasn't affiliated with a university or a research company.
"Funk" was about to hop on a plane to New Zealand when he received an email from his friend Vincent DiCarlo, a lawyer with an interest in emerging technologies with whom he had worked on a previous project. DiCarlo could sense that the challenge matched up well with his friend's previous research around collaborative filtering. Using these methods, the "Simon Funk" team shot up the leaderboard to the number 4 position, but the approach couldn't get them to the 10% accuracy improvement necessary to win the prize. Uninterested in working on the project anymore, "Funk" posted his results online for other teams to use.
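For readers curious what that kind of approach looks like in practice, here is a minimal Python sketch of a matrix factorization fit by stochastic gradient descent, in the spirit of the incremental SVD described above. It is an illustration under assumptions, not "Funk's" actual code: the toy ratings, the function names, and the settings (two factors, the learning rate, the regularization strength) are all invented for the example, and it updates every factor together for simplicity, which may differ from the original.

```python
import math
import random


def train_factors(ratings, n_users, n_movies, n_factors=2,
                  lr=0.02, reg=0.02, n_epochs=300, seed=0):
    """Learn a short vector of hidden 'taste' factors per user and per movie.

    ratings is a list of (user_index, movie_index, stars) tuples.
    """
    rng = random.Random(seed)
    mean = sum(stars for _, _, stars in ratings) / len(ratings)
    user_f = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
              for _ in range(n_users)]
    movie_f = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
               for _ in range(n_movies)]
    for _ in range(n_epochs):
        for u, m, stars in ratings:
            guess = mean + sum(a * b for a, b in zip(user_f[u], movie_f[m]))
            err = stars - guess
            for k in range(n_factors):
                uf, mf = user_f[u][k], movie_f[m][k]
                # Nudge both vectors to shrink the error on this one rating,
                # with a small penalty (reg) that keeps the factors modest.
                user_f[u][k] += lr * (err * mf - reg * uf)
                movie_f[m][k] += lr * (err * uf - reg * mf)
    return mean, user_f, movie_f


def predict(model, u, m):
    """Predicted stars for user u and movie m, clipped to the 1-5 scale."""
    mean, user_f, movie_f = model
    raw = mean + sum(a * b for a, b in zip(user_f[u], movie_f[m]))
    return min(5.0, max(1.0, raw))


def rmse(model, ratings):
    """Root-mean-square error of the model on a list of ratings."""
    total = sum((stars - predict(model, u, m)) ** 2 for u, m, stars in ratings)
    return math.sqrt(total / len(ratings))


# Made-up data: users 0-1 favor movies 0-1, users 2-3 favor movies 2-3.
ratings = [
    (0, 0, 5), (0, 1, 5), (0, 2, 1),
    (1, 0, 5), (1, 1, 4), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]
untrained = train_factors(ratings, 4, 4, n_epochs=0)
model = train_factors(ratings, 4, 4)
print("error before training:", round(rmse(untrained, ratings), 3))
print("error after training: ", round(rmse(model, ratings), 3))
print("unseen pair (user 0, movie 3):", round(predict(model, 0, 3), 2))
```

The appeal of this style of model is that it only ever touches the ratings that exist, so it copes with a data set where most user-movie pairs are blank.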
"We were all exchanging ideas," says DiCarlo, who continued to work on his own approach and follow the contest after "Funk" dropped out. (To this day, he serves as a spokesman of sorts for the "Simon Funk" team.) "Everybody adopted Simon's ideas and then used them to come up with new ideas. One of the big things that came out of it was the discovery early on that blending results that you got using different methods produced a kind of mysterious improvement in the predictive value of the results."
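The blending DiCarlo describes can be sketched in a few lines of Python. This is a deliberately tiny, hypothetical example: it mixes just two sets of predictions with a single weight chosen by least squares, and the ratings and "model" outputs below are made up. The real teams combined far more models with more elaborate schemes.

```python
import math


def rmse(preds, actual):
    """Root-mean-square error between predictions and true ratings."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, actual)) / len(actual))


def fit_blend_weight(preds_a, preds_b, actual):
    """Least-squares weight w for the blend w * a + (1 - w) * b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(preds_a, preds_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return num / den if den else 0.5  # identical models: any weight works


def blend(preds_a, preds_b, w):
    """Weighted mix of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


# Invented held-out ratings and two hypothetical models' guesses for them.
actual = [5, 3, 4, 1, 2, 4]
model_a = [4.2, 3.5, 3.9, 2.4, 2.8, 3.1]  # cautious, hugs the middle
model_b = [4.9, 2.1, 4.8, 1.9, 1.2, 4.6]  # bolder, errs in other places

w = fit_blend_weight(model_a, model_b, actual)
blended = blend(model_a, model_b, w)
print("model A error:", round(rmse(model_a, actual), 3))
print("model B error:", round(rmse(model_b, actual), 3))
print("blend error:  ", round(rmse(blended, actual), 3), "with weight", round(w, 2))
```

Because the weight here is fit and scored on the same sample, the blend cannot do worse than either model on that sample; whether the gain holds up has to be checked on ratings the blend has never seen. The improvement tends to be largest when the two models make different kinds of mistakes.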
The forums, which are still viewable using internet archives, were a hotbed of problem-solving, discussion, and joyful discovery. Imagine a digital summer camp for researchers. "It was so much fun," says Mackey. "The contest was structured so well. We had to learn so much to be competitive and I met so many people along the way." As stressful as it could get, especially as the improvements to the algorithm slowed down and the years rolled along, the teams were buoyed by a sense of community.
They needed that sense of camaraderie to jump the competition's major hurdles. Like Napoleon Dynamite. As deeply chronicled in Clive Thompson's New York Times Magazine article about the competition, the cult teen comedy and other "culturally or politically polarizing" films like Fahrenheit 9/11, I Heart Huckabees, The Life Aquatic With Steve Zissou, Lost in Translation, Kill Bill: Volume 1, and Sideways all caused headaches for the Prize-seekers. Basically, mid-00s art-house movies don't respond well to algorithms.
Though progress prizes were given to the leaders in 2007 and 2008, no team had cracked that mythical 10% improvement figure. The solution to the problem? Combine forces like Voltron.
Lester Mackey, who speaks in a soft monotone, becomes audibly excited when discussing the end of the competition. It's been almost a decade, but he can take you right back to the final days. By that point, Dinosaur Planet had been folded into a 30-person mega-team called The Ensemble, itself formed to compete with the super-group BellKor's Pragmatic Chaos. (Yes, most of these team names sound like titles for sci-fi novels.)
The BellKor squad included Chris Volinsky and his AT&T colleagues Robert Bell and Yehuda Koren, along with four other engineers from the United States, Austria, Canada and Israel. The global effort paid off: On June 26, 2009, BellKor's Pragmatic Chaos finally crossed the 10% finish line, triggering a 30-day window in which other teams could submit a better algorithm. Mackey and his Ensemble teammates were feeling the heat.
"We had, of course, not slept for the past two days because the contest was about to end," says Mackey. "We knew we were close to BellKor's Pragmatic Chaos but we didn't know if we had passed them for a day. Days prior we had actually managed to inch ahead of them. But knowing that team, we knew they could easily strike back with something better."
Towards the end, the solutions had become almost absurdly complex. Mackey was mostly working as "The Blender," taking in all the algorithms and code produced by the various team members and trying to turn it into a final prediction. Minutes before he was supposed to submit his team's answer, Mackey received an email from his team member Peng Zhou, who alerted him to a better combination on their server. But when he wrote back to ask for the file name, he got radio silence. Unbeknownst to him at the time, Zhou had lost his internet connection.
Desperate, Mackey frantically searched through around a thousand files and found the right one, submitting it right as the last few seconds on the clock ticked away. But would it be enough to win? Or did BellKor have something else up their sleeve? There was no way of knowing.
"There was a fair amount of intrigue and backroom dealing going on," says Volinsky with a chuckle. "There turned out to be a lot of interesting espionage and trying to figure out who was working with who, who was in bed with who, who is giving so and so information."
On September 18, 2009, BellKor's Pragmatic Chaos officially won the competition by a tiebreaker. By then, The Ensemble had actually matched BellKor's winning score... 20 minutes too late. As Reed Hastings told the New York Times soon after the contest came to a close, "That 20 minutes was worth a million dollars."
But the Netflix Prize was never really about the money: Volinsky tells me the cash that was awarded to AT&T was given away to a charity that provides STEM education for underprivileged communities. In the year following the competition, some argued that the prize wasn't even about the algorithm, noting that Netflix didn't end up using the final winning version. One particularly obnoxious column on Forbes even went so far as to dub the contest a "failure." The truth is more complicated.
"The first year of the competition, in 2006 and 2007, the technical advancements that were made by us and the other big teams I think were really significant in the field of recommender systems," says Volinsky. He thinks the idea that Netflix didn't use the results is a misconception. "We gave them our code. They definitely did implement and use those breakthroughs that we made in the first year."
The effect on the business world is harder to quantify but maybe equally significant. When asked about the legacy of the Netflix Prize, almost everyone I spoke to for this story mentioned Kaggle, a platform for analytics competitions that was founded in 2010 and acquired by Google earlier this year. Similarly, the organization DrivenData has crowd-sourced data competitions for socially-conscious causes. Following in Netflix's footsteps, companies like Yahoo and Zillow have released massive data sets as well, while the Heritage Health Prize offered a $3 million reward for the engineer who could predict how long people will be hospitalized.
One company that's not in the data competition game anymore: Netflix. After wrapping up the first Netflix Prize, the company immediately announced a sequel that promised trickier problems and even more information about consumer demographics. "The new data set, providing more than 100 million data points, will include, among other things, information about renters' ages, genders, ZIP codes, genre ratings and previously chosen movies," read a press release from the company. "As with the first Netflix Prize, all data provided is anonymous and cannot be associated with a specific Netflix member."
Unfortunately, like many sequels that promise to be bigger and better, Netflix Prize 2 never made it off the ground. Following the publication of a paper by two University of Texas researchers -- Arvind Narayanan and Vitaly Shmatikov -- that argued the Netflix Prize data could be de-anonymized using background knowledge obtained from the Internet Movie Database, a group of Netflix users filed a multimillion-dollar lawsuit against the company in federal court in California, alleging invasion of privacy. The FTC would soon get involved. In March of 2010, the company announced it had settled the lawsuit and made changes to how it would use data in future research programs. The second Netflix Prize was dead.
While the case was perceived as a victory for web-privacy advocates, there was a sense of disappointment in data science circles. "I think it was really unfair because Netflix behaved really well and were good stewards of their customers' data," says Volinsky when asked about the cancellation. "They did everything right technically, and unfortunately it came to be perceived in the press that they had released their customers' data illegally, which was really not the case."
Netflix was already moving on to a different data set at that point: The introduction of streaming video in February of 2007 gave the company more precise information about actual viewer behavior. The recommender system became less about how you rate media and more about what you actually consume. It doesn't matter if you give critically acclaimed foreign dramas five stars if you spend all your waking hours bingeing The Ranch. "I imagine that the nature of the problem has changed a lot since they've changed into a streaming company," says Mackey. "They collect a lot of other feedback that they didn't really have back then. Now you can observe when someone stops a movie or replays a section -- or only watches two minutes of it and never comes back."
A Netflix representative reached for comment says the company isn't particularly interested in reliving the past glories of the Netflix Prize. Like any Silicon Valley behemoth with an eye on the future, they would prefer to talk about what they're developing now. She directed me to the Netflix tech blog, which is filled with detailed, complex breakdowns that aren't too different from what you might have found on the old Netflix Prize forum back in the day. Only now the insights about Big Data and algorithmic strategy come from in-house, be it from engineers, developers, or Bill Nye the Science Guy. Random hackers in their garages? Not so much.
That shift -- from the crowdsourced wild west of a web forum to the benignly smooth aesthetics of a Medium post -- perhaps signals a larger change in online culture. Netflix is a completely different company now, a major player in Hollywood and a cable TV disruptor. With over 100 million global subscribers, the former DVD-mailing enterprise is now, by most recent estimates, worth more than $70 billion.
Even if the contest didn't play out like many might have first imagined it -- one brilliant genius scoring a million dollar jackpot -- it helped make large strides in the fields of artificial intelligence, machine learning, and recommender systems. Beyond the innovation, this may be one of those cases where the old cliché about how "the real treasure was the friends made along the way" is actually true. It was the code-swapping world maintained online -- many of the contestants didn't meet until a final award ceremony in New York -- that made the prize so special.
"There's definitely a sense of community and a sense of camaraderie," says Volinsky. "It was a great experience for everyone who participated in it and it was such a unique project and such a unique problem to solve that I think we'll all remember it as a high point of our careers."
At one point in our conversation, Volinsky tells me that early on in the contest he was so impressed with the company's visionary approach that he bought stock in them. Even if he didn't get to personally cash one of those over-sized checks, his interest in the company did pay off from a financial standpoint. "I thought this is a forward-looking company that has a chance to grow," he says. Then he laughs to himself. "All told, that's been one of my best investments."