How YouTube determines what videos you watch

Do you ever watch YouTube for a long time and then suddenly find yourself watching “Top 10 frog sounds”, or some other random video that seems to have nothing to do with you? To a large extent, you are not the one choosing what to watch on YouTube. Neal Mohan, the Chief Product Officer at YouTube, said in a panel interview at CES that 70% of the content you watch on YouTube is determined by the YouTube algorithm. How does this algorithm work, and what does it make you watch?

The evolution of the YouTube algorithm

YouTube was founded in 2005 by Chad Hurley, Steve Chen and Jawed Karim as a simple video sharing website. In those early days the recommendation system was quite simple as well: the algorithm just recommended the videos with the most views. For quite some time this seemed to work, until video creators figured out that the only thing that mattered was whether someone clicked on a video, not whether they liked it. This led to a phenomenon called “clickbaiting”: putting a misleading title or thumbnail on a video to get people to click on it. For YouTube users this was a problem, because they were recommended very boring videos that had very exciting titles.

In 2012, the recommendation system changed fundamentally. Instead of the number of views, the algorithm now looked at the average time viewers spent on a video. This gave a better indication of how much a person actually enjoyed watching it. The system still had one major flaw: it assumed that everyone likes the same videos.
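The difference between the two ranking rules can be illustrated with a toy example. The video titles and numbers below are invented, and the functions are only a simplified sketch of the idea, not YouTube's actual code.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    views: int
    total_watch_seconds: float  # summed over all views


def rank_by_views(videos):
    """Early ranking rule: the most-clicked videos come first."""
    return sorted(videos, key=lambda v: v.views, reverse=True)


def average_watch_seconds(video):
    """Average time a viewer spent on the video (0 if it was never viewed)."""
    return video.total_watch_seconds / video.views if video.views else 0.0


def rank_by_average_watch_time(videos):
    """2012 ranking rule: videos that hold attention longest come first."""
    return sorted(videos, key=average_watch_seconds, reverse=True)


# Invented data: a clickbait video with many short views,
# and a less-clicked video that people actually keep watching.
videos = [
    Video("You WON'T BELIEVE this!!!", views=1_000_000, total_watch_seconds=15_000_000),
    Video("How a bicycle stays upright", views=200_000, total_watch_seconds=60_000_000),
]

print([v.title for v in rank_by_views(videos)])
print([v.title for v in rank_by_average_watch_time(videos)])
```

Ranked by views, the clickbait video wins. Ranked by average watch time, it drops to the bottom, because viewers leave after a few seconds.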

Since 2015, the algorithm has become more personal. Instead of finding the videos the average person likes the most, it now tries to find the best videos for each individual user. As you can imagine, this is a far more complex challenge.

So how does it work?

What we have so far called the YouTube “recommendation system” essentially consists of three parts: the videos that appear on your YouTube homepage, the search results when you search for a video, and the suggested videos shown after you watch a video. All three parts work in essentially the same way, except that the search terms have a big influence on the search results and the current video has a big influence on the suggested videos.

The algorithm consists of two neural networks (see figure 1). The first handles candidate generation: it selects a large number of potentially suitable videos out of the entire YouTube database. The second ranks these candidates by a number of factors to determine which videos to recommend.

Figure 1: The YouTube recommendation system architecture
(Deep Neural Networks for YouTube Recommendations, Covington, Adams, Sargin)
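The two-stage structure can be sketched in a few lines of code. In this sketch both neural networks are replaced by simple stand-ins: a dot product between hypothetical "topic vectors" for the candidate stage, and that same similarity multiplied by average watch time for the ranking stage. The catalog, the topic axes and the scoring rules are all assumptions made for illustration.

```python
def dot(a, b):
    """Similarity between two topic vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vector, catalog, k):
    """Stage 1: cheaply narrow the whole catalog down to k rough matches."""
    by_similarity = sorted(
        catalog, key=lambda video: dot(user_vector, video["topics"]), reverse=True
    )
    return by_similarity[:k]


def rank(user_vector, candidates):
    """Stage 2: score the few remaining candidates with a richer function."""
    def score(video):
        return dot(user_vector, video["topics"]) * video["avg_watch_minutes"]
    return sorted(candidates, key=score, reverse=True)


def recommend(user_vector, catalog, k=3, n=2):
    """Run both stages and return the top n videos."""
    return rank(user_vector, generate_candidates(user_vector, catalog, k))[:n]


# Hypothetical topic axes: (music, cooking, animals)
catalog = [
    {"title": "Top 10 frog sounds", "topics": (0.1, 0.0, 0.9), "avg_watch_minutes": 4.0},
    {"title": "Cat compilation", "topics": (0.0, 0.0, 1.0), "avg_watch_minutes": 2.0},
    {"title": "Pasta in 10 minutes", "topics": (0.0, 1.0, 0.0), "avg_watch_minutes": 6.0},
    {"title": "Piano for beginners", "topics": (1.0, 0.0, 0.0), "avg_watch_minutes": 8.0},
    {"title": "Birdsong at dawn", "topics": (0.3, 0.0, 0.7), "avg_watch_minutes": 5.0},
]
user = (0.2, 0.0, 0.8)  # a user who mostly watches animal videos

for video in recommend(user, catalog):
    print(video["title"])
```

The point of splitting the work is cost: the first stage has to look at every video, so it must be cheap, while the second stage only sees a handful of candidates and can afford a more detailed score.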

The precise way in which these factors influence the ranking is extremely complex, and the number of factors used is enormous. The factors can be grouped into three categories: personal, video performance, and external factors. Personal factors measure how well the video suits the user's personal interests. This is mostly based on the user's preferences and search history.

Video performance factors measure how well the video is doing on YouTube. YouTube wants to maximize watch time, so a video that generally does well in that respect is recommended sooner than one that does not.

External factors are things like current hot topics, the time of year, and even the time of day. Some videos may be watched more in winter, so YouTube will recommend them more often during the winter months.
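To make the three categories concrete, here is a minimal sketch that combines them into one score. The real combination is far more complex and is not public, so the weighted sum, the weights and the seasonal rule below are all hypothetical choices made only for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    personal: float     # fit with the user's interests, from 0 to 1
    performance: float  # how well the video holds viewers in general, from 0 to 1
    external: float     # fit with season, time of day, hot topics, from 0 to 1


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = Factors(personal=0.5, performance=0.3, external=0.2)


def combined_score(f: Factors, w: Factors = WEIGHTS) -> float:
    """Weighted sum of the three factor groups (an assumed, simplified rule)."""
    return (
        w.personal * f.personal
        + w.performance * f.performance
        + w.external * f.external
    )


def seasonal_fit(video_season: str, current_season: str) -> float:
    """External factor: full score in the matching season, reduced otherwise."""
    return 1.0 if video_season == current_season else 0.2


# The same skiing video, scored for the same user in two different seasons.
skiing_in_winter = Factors(0.6, 0.7, seasonal_fit("winter", "winter"))
skiing_in_summer = Factors(0.6, 0.7, seasonal_fit("winter", "summer"))

print(combined_score(skiing_in_winter))
print(combined_score(skiing_in_summer))
```

Nothing about the video or the user changes between the two calls. Only the external factor differs, and that alone is enough to move the video up or down in the ranking.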

In the coming decades, the YouTube algorithm will only become more important as the platform grows in popularity and social influence. The algorithm will also keep getting more complex, so let us hope that it changes for the better!