What’s your “Next-Flick”? An Introduction To Recommendation Systems

How do you decide what movie or TV show to watch next? Most people find out about new movies from friends, people who are likely to be similar to themselves. While data science can’t make people friends (yet), it can attempt to mimic certain aspects of friendship when it comes to discovering new movies. That’s a scenario where recommendation systems come into play.
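To make the "friends with similar taste" idea concrete, here is a minimal sketch of one common approach, often called user-based collaborative filtering. Everything in it is an assumption for illustration: the user names, the ratings, and the helper functions `cosine_similarity` and `recommend` are made up, and cosine similarity over shared titles is just one simple way to measure how alike two viewers are. It is not how any particular company does it.

```python
from math import sqrt

# Hypothetical 1-5 star ratings, invented purely for illustration.
ratings = {
    "ana":  {"House of Cards": 5, "Se7en": 4, "The Office": 1},
    "ben":  {"House of Cards": 4, "Se7en": 5, "Fight Club": 5},
    "cara": {"The Office": 5, "Parks and Recreation": 4, "Se7en": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, using only titles both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank titles the user has not rated, weighting each rating by
    how similar the person who gave it is to the user."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for title, rating in other_ratings.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

In this toy data, ana's ratings line up far more closely with ben's than with cara's, so ben's favorite that ana hasn't seen lands at the top of her list. That is the "friend with similar taste" effect in miniature.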

Remember the TV show House of Cards? Yeah, before Kevin Spacey... Yeah, that show! The political drama premiered on Netflix in February 2013 and was Netflix’s first foray into original programming. It was a huge success and became the most watched show in its service library. I’m not sure if it still holds that title, since Netflix isn’t that forthcoming with its data. Nevertheless, the question is: how did they manage to pull that off? Big data! The company used data it had amassed from subscriber behavior to determine what would be a hit. This was one of the most sophisticated attempts at data-driven programming seen at the time. Netflix saw that subscribers who watched the original British series were also likely to watch movies directed by David Fincher and to enjoy ones that starred Kevin Spacey. Considering the material and players involved, Netflix was sure that an audience was out there.
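The "people who watched this also watched that" observation can be sketched as a simple overlap count. This is only a toy version under my own assumptions: the viewing histories are invented, the titles are generic placeholders, and `audience_overlap` is a hypothetical helper, not anything Netflix has described.

```python
# Hypothetical viewing histories, invented purely for illustration.
histories = {
    "u1": {"original series", "Fincher film", "Spacey film"},
    "u2": {"original series", "Fincher film"},
    "u3": {"original series", "Spacey film"},
    "u4": {"Fincher film"},
    "u5": {"some sitcom"},
}


def audience_overlap(histories, seed, candidate):
    """Share of the seed title's viewers who also watched the candidate."""
    seed_viewers = [h for h in histories.values() if seed in h]
    if not seed_viewers:
        return 0.0
    both = sum(1 for h in seed_viewers if candidate in h)
    return both / len(seed_viewers)


for title in ("Fincher film", "Spacey film", "some sitcom"):
    share = audience_overlap(histories, "original series", title)
    print(f"{title}: {share:.0%} of 'original series' viewers")
```

A high share for two different titles hints that the same audience would show up for a project that combines them, which is the kind of signal the House of Cards story describes.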

Taking a step back to see why this achievement was possible: in 2012, for the first time ever, Americans watched more movies legally delivered via the internet than on physical formats like Blu-ray or DVD. This change also marked a major shift in how much information the providers of online programming could gather about our viewing habits. Netflix was a pioneer at the intersection of Big Data and entertainment media. They put their money where their data was, and at $100 million for a 13-episode season, “House of Cards” was a first real glimpse of this new world.

“We know what people watch on Netflix and we’re able with a high degree of confidence to understand how big a likely audience is for a given show based on people’s viewing habits,” Netflix communications director Jonathan Friedland told Wired in November. “We want to continue to have something for everybody. But as time goes on, we get better at selecting what that something for everybody is that gets high engagement.”

The scope of the data collected by Netflix from its 29 million streaming video subscribers is staggering. Every search you make and every positive or negative rating you give to what you just watched is piped in, along with ratings data from third-party providers, location data, device data, social media references and bookmarks. Every time a viewer logs on, he or she needs to be authenticated. Every movie or TV show also has its own associated licensing data. A lot of work went into handling this viewer-generated data and making sense of it: the wizardry of “Data Science!”

The types of data Netflix uses to work out viewers’ characteristics seem endless. They also consider things like volume, colors and scenery that might give valuable signals about what viewers prefer. All this data means Netflix has an “addressable audience”. Unlike traditional broadcasting networks or cable companies, Netflix doesn’t have to rely on hindsight to determine whether the content they are throwing at audiences is a hit or a miss.

That being said, data-centric decisions don’t guarantee a hit. But in my opinion, data that is properly considered will surely increase your chances. In data science we talk in probabilities, and we calculate confidence intervals based on the facts we have gathered, a.k.a. the data, hence... I digress.

There is always the question of where this Big Data approach leads. Is it for better or worse? What does it mean for the creative process? How does it affect a director’s approach in the editing room, armed with the knowledge that a certain subset of subscribers is opposed to gruesome torture scenes? If Netflix is always giving us what we want, how will we be exposed to new and different movies and TV shows that we would never have imagined liking unless given a chance? We’ve seen what happens when news publications specialize in delivering only the online content that maximizes page views. It isn’t always the most informative spectacle.

For years Netflix has been analyzing what we watched last night to suggest movies or TV shows that we might like to watch tomorrow. Now it is using the same formula to prefabricate its own programming to fit what it thinks we will like. It’s certainly possible to overstate the case here. One could argue that Netflix’s strategy is just a more sophisticated version of what has been in place forever. We wouldn’t be seeing superhero or zombie movies all the time if the money that bankrolls the content creation business hadn’t already decided that’s what we want to see. Popular actors get more roles. So what else is new?

There is a level of specificity made possible by big data that suggests we are heading into, or are already in, new territory. “House of Cards” was just a symptom of a society-wide shift. We see that daily with online advertisers, who gather vast amounts of information about us from our smartphones, Facebook, and our Google searches.

The sheer amount of data available is already phenomenal and growing at an exponential rate. The companies that figure out how to generate intelligence from this data will know more about us than we know about ourselves. There is always the fear that they will craft techniques that push us towards where they want us to go rather than where we would go if left to our own devices. However, I’ll argue that they only suggest to us what we are most likely to prefer. Yes, it’s true that this is good for the bottom line of companies like Netflix, but the choice is always ours: happy subscriber or mindless puppet?

That’s my anecdote on recommendation systems/engines. Hopefully, it shows a very practical application of recommendation systems in data science. In upcoming posts, I’ll be walking you through how these recommendation systems work at a somewhat high level. This way you already know the outcome, so it is easier to follow the how. My plan is also to share a project I worked on, building a recommendation system, along with the code (Python) and the mathematics (linear algebra) behind it. Thanks for hanging with me through my rant, see you on the next one!

Data Scientist