What’s your “Next-Flick”? (Part 2) Recommendation Systems demystified

Image Credits: hobbit / Shutterstock

In my prior write-up, What’s your “Next-Flick”? An Introduction To Recommendation Systems, I told the story of Netflix’s data-centric approach to TV programming, which changed the game and still shapes how big data drives decisions in the movie/TV production and distribution industry today. For a company like Netflix, deciding which movie or TV show to invest in is only one part of the problem that big data helps solve; another, even more important part is distribution. With about 200 million subscribers and a constantly growing library, currently about 14,000 titles of which about 10% are Netflix originals, there is the business question of how to help subscribers with widely varying preferences easily find the content they are interested in. This problem is solved by recommendation (recommender) systems, also called recommendation engines.

The automated process of filtering through entities (in this example, movies) and suggesting to users what is relevant to them or what they might be interested in, based on gathered data, is what a recommendation system does. These are machine learning algorithms designed to sort through content and suggest relevant items to users. In the digital world it is impossible to escape recommender systems, not that you would want to anyway. With the vast amount of information out there, be it movies, news, articles or products to buy, navigating through it all to find what you prefer is time consuming. Hence a recommender system is critical to businesses in many industries and can be what makes a business stand out from its competition, as is the case with Netflix.

How do recommender engines work?

There are two major paradigms of recommender systems: collaborative and content based methods.

Collaborative Method/Filtering

Collaborative methods for recommender systems are based solely on the past interactions recorded between users and items in order to produce new recommendations. The main idea is that past user-item interactions are sufficient to detect similar users and/or similar items and to make predictions based on these estimated proximities.

The class of collaborative filtering algorithms is divided into two subcategories:

Memory based:

This approach works directly with the values of recorded interactions, assuming no model, and is essentially based on nearest neighbors search. The neighborhood is defined by similarities either between users (user-user) or between items (item-item).

User-User: Assume that we want to make a recommendation for a given user named James. First, every user can be represented mathematically by a vector of their interactions with the different items (let’s focus on the idea for now). Then we can compute some kind of “similarity” between our user, James, and every other user. The similarity measure is such that two users with similar interactions on the same items are considered close. Once the similarities to every user are calculated, we can use k-nearest neighbors (an algorithm) to find the users closest to James. These users can be said to be like-minded to James (you know, birds of a feather…). We can then suggest the most popular item among them, considering only the items James has yet to interact with. This method also takes into account the total number of interactions made by each of the users being compared, since the strength of the similarity varies with it.
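To make the user-user idea concrete, here is a minimal sketch in plain Python. The ratings, the names other than James, and the choice of cosine similarity are my own assumptions for illustration; a real system could use a different similarity measure and far more data.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "James": {"Inception": 5, "Interstellar": 4, "Titanic": 1},
    "Ada":   {"Inception": 5, "Interstellar": 5, "Tenet": 4, "Dunkirk": 4},
    "Bola":  {"Inception": 4, "Interstellar": 4, "Tenet": 5},
    "Chidi": {"Titanic": 5, "The Notebook": 5, "Inception": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    # Norms use every rated item, so the number of interactions matters too.
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend_user_user(target, ratings, k=2):
    """Find the k users closest to `target` and count their unseen items."""
    others = [u for u in ratings if u != target]
    neighbors = sorted(
        others,
        key=lambda u: cosine_similarity(ratings[target], ratings[u]),
        reverse=True,
    )[:k]
    seen = set(ratings[target])
    counts = Counter(m for u in neighbors for m in ratings[u] if m not in seen)
    return neighbors, counts.most_common()

neighbors, suggestions = recommend_user_user("James", ratings)
print(neighbors)    # ['Ada', 'Bola']
print(suggestions)  # [('Tenet', 2), ('Dunkirk', 1)]
```

On this toy data Ada and Bola come out as James’s nearest neighbors, and Tenet is the item most common among them that James has not seen.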

Item-Item: To make a new recommendation to a user, the idea of the item-item method is to find items similar to the ones the user has already “positively” interacted with. Two items are considered similar if most of the users who have interacted with both of them did so in a similar way. This method is said to be “item-centred” because it represents items based on the interactions users had with them and evaluates distances between those items. The mathematics is similar to user-user.
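The item-item variant can be sketched the same way, only with the data turned around so that each item is described by the users who rated it. Again, the toy ratings, the cosine similarity and the "rating of 4 or more counts as positive" threshold are assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "James": {"Inception": 5, "Interstellar": 4, "Titanic": 1},
    "Ada":   {"Inception": 5, "Interstellar": 5, "Tenet": 4, "Dunkirk": 4},
    "Bola":  {"Inception": 4, "Interstellar": 4, "Tenet": 5},
    "Chidi": {"Titanic": 5, "The Notebook": 5, "Inception": 1},
}

def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items[item][user] = r
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    return dot / (sqrt(sum(v * v for v in a.values())) *
                  sqrt(sum(v * v for v in b.values())))

def recommend_item_item(target, ratings, like_threshold=4):
    """Rank unseen items by their total similarity to items the user liked."""
    items = invert(ratings)
    liked = [m for m, r in ratings[target].items() if r >= like_threshold]
    scores = {
        candidate: sum(cosine(items[candidate], items[m]) for m in liked)
        for candidate in items
        if candidate not in ratings[target]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for title, score in recommend_item_item("James", ratings):
    print(f"{title}: {score:.2f}")
```

Here Tenet ranks first for James because the users who rated it rated Inception and Interstellar, which James liked, in a similar way.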

Memory based collaborative filtering as a whole has its advantages and disadvantages. One big flaw is scalability: the nearest neighbors search can be time consuming for big systems with millions of users.

Also, in most recommender engine algorithms it is important to take note of the “rich get richer” effect for popular items, where the system tends to recommend items that are already popular, which makes them even more popular. This problem is especially prevalent in memory based collaborative filtering.

Model based:

Model based approaches assume an underlying “generative” model that explains the user-item interactions and try to discover it in order to make new predictions. Like memory based approaches, they rely only on user-item interaction information, but they assume a latent model that is supposed to explain these interactions.
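One common example of such a latent model is matrix factorization, where every user and every item gets a small vector of hidden “factors” and a rating is predicted as their dot product. The sketch below is my own illustration, not something specific to Netflix: the toy ratings, the number of factors, the learning rate and the other hyperparameters are all assumed values, and the factors are learned with plain stochastic gradient descent.

```python
import random
from math import sqrt

# Hypothetical (user, item, rating) triples
observed = [
    ("James", "Inception", 5), ("James", "Interstellar", 4), ("James", "Titanic", 1),
    ("Ada", "Inception", 5), ("Ada", "Interstellar", 5), ("Ada", "Tenet", 4),
    ("Bola", "Inception", 4), ("Bola", "Tenet", 5),
    ("Chidi", "Titanic", 5), ("Chidi", "The Notebook", 5), ("Chidi", "Inception", 1),
]

def predict(P, Q, user, item):
    """Predicted rating = dot product of user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

def train_mf(observed, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn latent factors with stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in observed})
    items = sorted({i for _, i, _ in observed})
    # Small random starting values for every latent factor.
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Move both vectors to shrink the error, with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, observed):
    """Root mean squared error on the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed)
    return sqrt(total / len(observed))

P, Q = train_mf(observed)
print("training RMSE:", round(rmse(P, Q, observed), 3))
print("James x Tenet:", round(predict(P, Q, "James", "Tenet"), 2))
```

Once the factors are learned, the model can score a pair it has never seen, such as James and Tenet, without searching through neighbors at prediction time.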

Content Based Method/Filtering

Unlike collaborative methods, which rely only on user-item interactions, content based approaches use additional information about users and/or items. If we consider the example of a movie recommender system, this additional information can be, for example, the age, the sex, the job or any other personal information for users, as well as the category, the main actors, the duration or other characteristics for the movies (items).

The idea of content based methods is then to try to build a model, based on the available “features”, that explains the observed user-item interactions. Still considering users and movies, we will try, for example, to model the fact that women tend to rate some movies higher, that men tend to rate some other movies higher, and so on. If we manage to get such a model (regression or classification), then making new predictions for a user is pretty easy: we just need to look at the profile (age, sex, …) of this user and, based on this information, determine relevant movies to suggest.
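As a rough illustration of working from features, the sketch below uses item features (genre flags) instead of user features, and a simple taste profile instead of a trained regression or classification model. That is a deliberate simplification on my part, and the movies, genres and ratings are made up for the example.

```python
# Hypothetical item features: one flag per genre, in this order.
FEATURES = ["sci-fi", "romance", "war"]
movies = {
    "Inception":    [1, 0, 0],
    "Interstellar": [1, 0, 0],
    "Titanic":      [0, 1, 0],
    "Tenet":        [1, 0, 0],
    "Dunkirk":      [0, 0, 1],
    "The Notebook": [0, 1, 0],
}

# Hypothetical ratings from one user (1-5 scale)
james = {"Inception": 5, "Interstellar": 4, "Titanic": 1}

def build_profile(user_ratings, movies):
    """Sum item feature vectors, weighted by rating minus the user's mean.

    Features of movies rated above the user's average get positive weight,
    features of movies rated below it get negative weight.
    """
    mean = sum(user_ratings.values()) / len(user_ratings)
    profile = [0.0] * len(FEATURES)
    for title, r in user_ratings.items():
        for j, x in enumerate(movies[title]):
            profile[j] += (r - mean) * x
    return profile

def recommend_content_based(user_ratings, movies):
    """Rank unseen movies by the dot product of their features and the profile."""
    profile = build_profile(user_ratings, movies)
    scores = {
        title: sum(p * x for p, x in zip(profile, feats))
        for title, feats in movies.items()
        if title not in user_ratings
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for title, score in recommend_content_based(james, movies):
    print(f"{title}: {score:+.2f}")
```

Because the profile is built from features rather than from other users’ behavior, this approach can score a brand-new movie as soon as its genres are known.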

Hybrid Method

These methods, which combine collaborative filtering and content based approaches, achieve state-of-the-art results in many cases and are therefore used in many large scale recommender systems nowadays. The combination made in hybrid approaches can mainly take two forms: we can either train two models independently (one collaborative filtering model and one content based model) and combine their suggestions, or directly build a single model (often a neural network) that unifies both approaches by using as inputs prior information (about the user and/or item) as well as “collaborative” interaction information.
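The first form, two independent models whose suggestions are combined, can be as simple as a weighted average of their scores. The sketch below assumes each model has already produced a score per candidate movie; the scores and the weight `alpha` are hypothetical values chosen for illustration.

```python
def min_max(scores):
    """Rescale scores to the 0-1 range so the two models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def blend(cf_scores, cb_scores, alpha=0.5):
    """Weighted average: alpha for collaborative, 1 - alpha for content based."""
    cf, cb = min_max(cf_scores), min_max(cb_scores)
    items = set(cf) | set(cb)
    blended = {
        # An item missing from one model simply contributes 0 from that model.
        i: alpha * cf.get(i, 0.0) + (1 - alpha) * cb.get(i, 0.0)
        for i in items
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical outputs of a collaborative model and a content based model
cf_scores = {"Tenet": 1.59, "Dunkirk": 1.27, "The Notebook": 0.12}
cb_scores = {"Tenet": 2.33, "Dunkirk": 0.0, "The Notebook": -2.33}

for title, score in blend(cf_scores, cb_scores, alpha=0.5):
    print(f"{title}: {score:.2f}")
```

Raising `alpha` trusts the collaborative scores more, while lowering it leans on the content based scores, which can help for items that have few interactions so far.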

In conclusion, a recommendation system is a very helpful tool in the digital world: it provides solutions to several business problems while at the same time optimizing the user experience. Not all recommendation systems are created equal. Depending on the type of content and user, different methodologies can be applied. Ideally, a great recommender system should know users better than they know themselves. Just like that best friend we all have, it should connect them with the right content they didn’t know they loved.

Data Scientist