YouTube Algorithm Explained

How To Quickly Grow Your YouTube Channel by Working With Instead of Against the YouTube Algorithm

Many small YouTubers struggle to grow their channel.

Although there are many reasons why your YouTube channel isn't growing, one of the most common ones is not understanding how the YouTube algorithm works.

In this article, I will show you how the YouTube algorithm works under the hood.

You will learn how the machine learning algorithm determines the topic of each video, how it groups similar YouTube channels together, how it understands what each viewer is interested in, and how it recommends new videos and channels that viewers aren't subscribed to.

If you are less interested in the nitty-gritty details of the YouTube algorithm, you can also read my more tactical guide on how to crack the YouTube algorithm.

How Does the YouTube Algorithm Understand the Topic of YouTube Videos

One of YouTube's top priorities is to understand the content of each video, so it can recommend the right video to the right person.

YouTube uses a wide variety of techniques to analyze each video.

It extracts keyword data from both direct and indirect metadata, and it analyzes the video and audio itself via complex machine learning algorithms.

Text-based analysis

The most straightforward analysis is based on text-based keyword extraction.

For this purpose, YouTube looks at a range of direct and indirect text-based attributes to extrapolate descriptive keywords that accurately reflect the content of each video.

Keyword analysis of title, description, and video tags

Users can add a video title, video description, video tags, and video category to their videos.

As with all user-generated metadata, the problem is that it is often prone to errors, ambiguities, and incompleteness.

Because of that, YouTube must first sanitize and enhance all written metadata to make it usable within its recommendation algorithm.

Example:

Someone could write Instagram, Insta, gram, or IG, all referring to the same social media app. YouTube solves this ambiguity by internally mapping all synonyms to the same canonical name.

It then uses contextual data to understand the context of the "Instagram" keyword. Is someone talking about the company Instagram, the cultural impact of Instagram, how to use the Instagram app as a user, or about growing an audience on Instagram as a content creator?

The unified and enhanced metadata is then saved for each video for further analysis.
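The synonym grouping described above can be pictured as a lookup from the words people write to one canonical keyword. Here is a minimal sketch of that idea, not YouTube's actual implementation. The `CANONICAL` table and the `normalize_keywords` helper are hypothetical names I made up for illustration.

```python
# Hypothetical synonym table: every surface form maps to one canonical keyword.
CANONICAL = {
    "instagram": "instagram",
    "insta": "instagram",
    "gram": "instagram",
    "ig": "instagram",
}


def normalize_keywords(raw_keywords):
    """Map user-written keywords to canonical names, dropping duplicates."""
    seen = []
    for keyword in raw_keywords:
        key = keyword.strip().lower()
        canonical = CANONICAL.get(key, key)  # unknown words pass through unchanged
        if canonical not in seen:
            seen.append(canonical)
    return seen


print(normalize_keywords(["Insta", "IG", "growth", "Instagram"]))
# prints ['instagram', 'growth']
```

A real system would also need the contextual step to decide which meaning of "Instagram" is intended, which a plain lookup table cannot do.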

I recommend checking out my articles on how to title your YouTube videos and on how to optimize your video titles for more details.

Subtitles analysis

Video transcriptions in the form of subtitles are by far the best source of video metadata for content analysis.

If transcribed accurately, they represent a one-to-one conversion from spoken word to the written word, thus making it accessible for YouTube's machine learning analysis tools.

In essence, YouTube can use the same advanced technologies that its parent company Google uses to analyze billions of websites, pages, and blog posts for SEO.

Initially, YouTube relied on users to upload or transcribe their own subtitles.

Since this was a tedious job, only a handful of extremely committed YouTube users would actually add subtitles to their videos.

Although subtitles were so much better for text analysis, they were of little use if only 1 in 100,000 videos had them.

YouTube resolved this by heavily investing in its own voice recognition software, which automatically transcribes every uploaded video and converts the speech into subtitles.

The problem with YouTube's transcriptions is that they are not always accurate.

It often misunderstands what people say in videos and then uses wrong words in its subtitles, especially if the audio equipment is of lower quality, the background noise is very high, or if someone has an accent.

This has a couple of potentially very dangerous consequences for any YouTube content creator.

In the best case, YouTube doesn't associate an important keyword you want to rank for with your video.

Worst case, a harmless word could be interpreted as a bad slur, resulting in demonetization or an automatic community strike.

If you want the best possible transcripts and subtitles, including proper capitalization and punctuation, I recommend checking out Descript and Otter.

Playlist analysis

YouTube will also use indirect, contextual metadata to understand what any given YouTube video is about.

Is a specific video included in one or more video playlists created by your channel or by others? If yes, what keywords are included in the playlist's title and description?

YouTube might then analyze each playlist by merging all video titles within each playlist and then use pattern recognition algorithms to identify mutual keywords that accurately describe commonalities among all videos.

These commonalities in the playlists can serve as additional contextual information for each embedded video.

If a specific playlist contains incoherent data and noise, for example because someone has thrown together unrelated videos, YouTube will ignore the data.
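To make the playlist idea more tangible, here is a small sketch that looks for words shared by most video titles in a playlist. It is only an illustration under my own assumptions: the `STOP_WORDS` list, the `shared_keywords` helper, and the 60% threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a far larger one.
STOP_WORDS = {"the", "a", "an", "to", "how", "in", "for", "of", "and", "my"}


def shared_keywords(video_titles, min_share=0.6):
    """Return words that appear in at least `min_share` of the playlist's titles."""
    counts = Counter()
    for title in video_titles:
        words = {w.strip(".,!?").lower() for w in title.split()}
        counts.update(words - STOP_WORDS)  # count each word once per title
    threshold = min_share * len(video_titles)
    return sorted(word for word, n in counts.items() if n >= threshold)


playlist = [
    "How to grow tomatoes in pots",
    "Pruning tomatoes for a bigger harvest",
    "My favorite tomatoes for salads",
]
print(shared_keywords(playlist))  # prints ['tomatoes']
```

A playlist of unrelated videos returns an empty list here, which mirrors the case where the playlist adds no useful context.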

You can quickly tidy up existing playlists with TubeBuddy's Playlists Action Tool.

Comment analysis

Another source of video metadata is YouTube comments. This is more relevant for larger channels that consistently get a high volume of comments for each video.

YouTube can search for specific keywords and patterns that indicate descriptive statements.

Example:

"Thanks so much. Your video has really helped me understand how to get more people to subscribe to my YouTube channel".

In this case, one of the search patterns might be "how to …".
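A pattern search like this could be as simple as a regular expression. The sketch below is my own illustration, and both the `HOW_TO` pattern and the `extract_how_to` helper are hypothetical.

```python
import re

# Hypothetical pattern: capture the phrase that follows "how to".
HOW_TO = re.compile(r"\bhow to ([a-z' ]+)", re.IGNORECASE)


def extract_how_to(comment):
    """Return the 'how to ...' phrases found in a comment, lowercased."""
    return [match.strip().lower() for match in HOW_TO.findall(comment)]


comment = (
    "Thanks so much. Your video has really helped me understand "
    "how to get more people to subscribe to my YouTube channel."
)
print(extract_how_to(comment))
# prints ['get more people to subscribe to my youtube channel']
```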

Again, it is important to mention that YouTube doesn't take comments at face value. Otherwise, it would be easy for rogue actors to game the system to get an unfair advantage.

Instead, YouTube uses all data to increase its confidence level by confirming congruence across all data points.

If you want to see a glimpse of what YouTube is seeing, check out the TubeBuddy Comments Word Cloud.

Info cards and end screens analysis

The last video metadata sources are Info Cards and End Screens.

Are any of the embedded cards linking to specific videos, playlists, or external websites?

If so, what text is included in the title and description of each card?

In the case of YouTube videos, what metadata and content clusters are linked to each video? Are there similarities and overlaps between the source and target videos or playlists?

In the case of websites, what is each page's URL, page title, and HTML content?

The TubeBuddy Bulk Processing Tool for Info Cards and End Screens will save you a ton of time.

Visual analysis

YouTube uses an AI-based image analysis tool, based on Google's Cloud Vision AI, to analyze video thumbnails and individual video frames.

The remarkable thing about Google's Cloud Vision AI is that you can openly access it to evaluate your own video thumbnails.

You can check out my article on how to design the perfect YouTube thumbnail for a step-by-step tutorial.

Here are some of the objects YouTube and Cloud Vision can recognize:

  • People
  • Faces
  • Emotions
  • Gestures
  • Clothing
  • Objects and attributes
  • Dominant colors
  • Style
  • Logos
  • Text recognition
  • Safe search rating
    • Adult
    • Spoof
    • Medical
    • Violence
    • Racy

To be honest, it is almost a bit scary how accurate AI has become over the years.

YouTube is using the same technology to analyze each video frame by frame, mostly to identify copyrighted content and any content that would result in a community guideline violation.

As a secondary benefit, YouTube is identifying people and objects within each of your videos.

If your video title was "How to make the perfect tomato salad" it would make a lot of sense to "see" some actual tomatoes in your videos.

This not only helps YouTube fight fake clickbait titles, but it is also a great source of extra metadata that might be difficult to express in text.

Let's say you had a video with the title "3 underrated locations in London" and the three locations that you showed in your video were "Borough Market", "Thames Barrier Park", and "Richmond Park".

If YouTube's AI can recognize these locations purely from the visuals, for example by converting a sign into text via OCR technology, it can then recommend your video to someone who is searching for "best food market in London".

This works in many cases, even if you do not mention the locations in your title, description, tags, or subtitles.

Audio analysis

YouTube is also analyzing every moment of music, sounds, and spoken words of all videos.

The most obvious reason is, again, identifying copyrighted music for YouTube's Content ID system.

The additional audio data also provides valuable insights into what is going on within each video.

Specific songs are linked to specific artists, the genre of music, and other songs that might play well together.

Sounds and sound effects often communicate specific events. For example, a "meow" sound would indicate the presence of a cat.

The spoken words can indicate the presence of a specific person in your video.

Context analysis

YouTube also uses broader contextual data to better understand the topic of each video.

Channel

  • What do we know about the YouTube channel that is uploading a video?
  • What content clusters is the YouTube channel associated with?
  • What do we know about the users who are watching videos from this channel?
  • Which larger audience groups are watching videos from this channel?
  • Which demographic groups are watching videos from this channel?

External websites

  • Has this video been embedded in external sites?
  • If yes, what additional information can we extrapolate from the content of the webpage in which the video was embedded?
  • What else is this website publishing?
  • Who is the author of the specific article?
  • Which topics is this author known for?
  • What is the website authority score?

Viewer analysis

  • How did the viewers interact with this video?
  • Which time ranges have been watched, and how frequently?
  • Which time ranges have been skipped, and how frequently?
  • What is each viewer's click-through rate on video impressions?
  • What is the viewer's watch time in minutes/percentage?
  • What do we know about the collective of viewers with a low watch time percentage?
  • What do we know about the collective of viewers with a high watch time percentage?
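Two of the viewer metrics above, click-through rate and watch time percentage, are simple ratios. Here is a quick sketch with made-up numbers. The function names, the sample viewers, and the 50% cutoff between low and high watch time are my own assumptions.

```python
def click_through_rate(impressions, clicks):
    """Share of impressions that led to a click (0.0 when there were none)."""
    return clicks / impressions if impressions else 0.0


def watch_percentage(seconds_watched, video_length_seconds):
    """Portion of the video a viewer watched, capped at 100%."""
    return min(seconds_watched / video_length_seconds, 1.0) * 100


# Hypothetical sample: seconds watched per viewer for a 600-second video.
views = {"ana": 540, "ben": 90, "cy": 600, "dee": 30}
percentages = {name: watch_percentage(s, 600) for name, s in views.items()}

# Split viewers into low- and high-retention groups at an assumed 50% cutoff.
high = sorted(n for n, p in percentages.items() if p >= 50)
low = sorted(n for n, p in percentages.items() if p < 50)

print(click_through_rate(impressions=200, clicks=10))  # prints 0.05
print(high, low)  # prints ['ana', 'cy'] ['ben', 'dee']
```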

How Does the YouTube Algorithm Find Similar YouTube Channels to Recommend

How does the YouTube algorithm link YouTube channels to individual topics, categories, and content clusters?

When you look at the videos posted on the CNN YouTube channel, you probably understand that they primarily focus on producing news content.

But how can YouTube determine the same for its millions of YouTube channels at scale?

The most straightforward approach would be to ask each YouTube channel to self-classify.

Just select the right category from a long list of channel categories, and everything will be okay?

Well, not so fast...

The problem with self-classification is that it is prone to errors.

A YouTube content creator might not have a clear channel direction or content strategy. Thus they might not know which category to select.

Creators might also be confused about which category to choose if they don't understand the definition or meaning of each category.

Sometimes they also might get overwhelmed, especially if the list of categories is very long.

The solution?

YouTube determines each channel's category algorithmically!

Here is how YouTube can understand what YouTube channels are about.

They look at three different factors.

  • What is the content of each of their videos?
  • What are the most popular topics and themes in all their videos?
  • What traits, characteristics, and interests do viewers of their videos share?

All three pieces of information complement each other.

The more harmonious the data, the higher YouTube's confidence that a particular channel belongs to a certain category and niche, and the more likely YouTube is to recommend your videos in the Suggested Videos section of YouTube channels in the same niche.
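One way to picture such a confidence score is the share of a channel's videos that fall into its most common topic. The sketch below covers only the first of the three factors and is purely my own simplification. The `channel_niche` helper is hypothetical.

```python
from collections import Counter


def channel_niche(video_topics):
    """Return the most common topic and the share of videos that carry it."""
    counts = Counter(video_topics)
    topic, hits = counts.most_common(1)[0]
    return topic, hits / len(video_topics)


focused = ["news"] * 9 + ["cooking"]
scattered = ["news", "cooking", "gaming", "travel"]

print(channel_niche(focused))       # prints ('news', 0.9)
print(channel_niche(scattered)[1])  # prints 0.25
```

The focused channel scores much higher than the scattered one, which is the intuition behind sticking to a clear content strategy.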

That is why it is so important to develop a clear YouTube content strategy for all of your videos. You can learn more about creating a content strategy from scratch in my YouTube content cluster strategy article.

How Does YouTube's Algorithm Determine What Each User Is Interested In

YouTube is tracking everything users do on their website.

Every mouse movement. Every click on a YouTube thumbnail.

What percentage of a thumbnail was visible on each page.

Whether a video is playing in the foreground or the background.

Which video thumbnails and titles were promoted to each user, and how many times?

What was the click-through rate of each video impression?

How many views does each YouTube video have?

What was the video watch time in minutes and percent after clicking on a video? What is the typical percentage of video viewing time by this user? Is the watch time of this video above or below average?

What did someone do after watching a specific video for the first time?

Did they engage? Did they press the like or dislike button? Did they write a comment? Was it a positive or a negative comment based on sentiment analysis?

Did the user expand the video description?

Did a viewer share the video? If yes, on which platform? And what was the most likely reason for sharing?

Did they subscribe to a YouTube channel? If so, on what page? If on the video page, what percentage of the video did they watch?

Did they explore specific YouTube channels? Which video titles were visible to them? Which did they click on?

Did they add videos to specific playlists? What was the title of this playlist? What were the themes and topics of other videos on the same playlist?

Did they press the watch later button?

What are the associated content clusters, themes, and topics of this video? Has this user been interested in any of these content clusters based on past behavior and watch history? Are there typically overlapping interests between two content clusters, for example, Linux and developing software?

YouTube then uses all these data points to feed its machine learning algorithms to find patterns across multiple users.

To summarize.

YouTube first classifies videos to understand what each video is about.

It then groups topics into broader content groups and niches.

If someone watches a specific video, its associated content clusters are linked to the user profile as interests.

The more videos someone watches from the same content cluster, the more likely YouTube is to recommend videos from the same category to this user.
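As a rough illustration of this summary, a user's interests can be thought of as weights that grow with every video watched from a content cluster. This is a minimal sketch under my own assumptions, and the `build_interest_profile` helper is hypothetical.

```python
from collections import Counter


def build_interest_profile(watched_clusters):
    """Turn a list of watched content clusters into normalized interest weights."""
    counts = Counter(watched_clusters)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


# Hypothetical watch history, already mapped from videos to content clusters.
history = ["linux", "linux", "software development", "linux", "cooking"]
profile = build_interest_profile(history)

print(profile["linux"])                   # prints 0.6
print(max(profile, key=profile.get))      # prints linux
```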

This analysis allows YouTube to understand what each user is interested in, but it's not limiting itself to just video analysis.

YouTube also uses co-watch data to further refine the interests of individual users.

The YouTube Algorithm Creates Distinct Demographic and Interest Groups Based on What People Watch

How does the YouTube algorithm link individual viewers to broader content clusters?

YouTube groups users based on common interests and associated content clusters, so it can make recommendations based on the history of other users and new videos that have been added to specific content clusters.

This feature is very similar to Amazon's "frequently bought together" recommendations and Facebook's lookalike audiences.

YouTube is recording the watch history of every logged-in user on its platform.

It then looks at all people's watch history simultaneously and calculates the average distance between any two videos within the same session. The distance is shorter if someone watches both videos back-to-back and longer if they watch a couple of other videos in between.

Once YouTube has calculated the average distance between two videos, it can link together users with similar watch histories and recommend new videos based on users with a common interest.
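To show what linking users with similar watch histories could look like, here is a small sketch. Note that it swaps in a different, simpler technique than the watch distance described above: it compares users by the overlap of the videos they watched (Jaccard similarity). The `jaccard` and `recommend` helpers and the 0.5 threshold are my own assumptions.

```python
def jaccard(a, b):
    """Overlap of two watch histories: shared videos divided by all videos."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(target, histories, min_similarity=0.5):
    """Suggest videos watched by similar users that the target has not seen."""
    seen = set(histories[target])
    suggestions = set()
    for user, videos in histories.items():
        if user != target and jaccard(seen, videos) >= min_similarity:
            suggestions |= set(videos) - seen
    return sorted(suggestions)


# Hypothetical watch histories.
histories = {
    "jane": ["v1", "v2", "v3"],
    "omar": ["v1", "v2", "v3", "v4"],
    "lee": ["v8", "v9"],
}
print(recommend("jane", histories))  # prints ['v4']
```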

A more advanced system can combine even more data sources.

What Is the YouTube Algorithm's Process for Identifying New Topics and Content Clusters

YouTube video categories

In the early days, YouTube tried to manually define a list of possible video categories and asked users to select one of the following topics for each of their videos:

  • Film & Animation
  • Autos & Vehicles
  • Music
  • Pets & Animals
  • Sports
  • Travel & Events
  • Gaming
  • People & Blogs
  • Comedy
  • Entertainment
  • News & Politics
  • How-to & Style
  • Education
  • Science & Technology
  • Nonprofits & Activism

You can still find the video category setting on your video settings page, although it is irrelevant and ignored today.

Self-classification caused numerous problems because video creators did not understand how each category was supposed to work and how categories were supposed to be different from each other.

The result was inconsistent labeling.

Freebase knowledge graph

YouTube quickly realized the limitations of these 15 categories and started working on a more holistic approach based on the Freebase knowledge graph database.

Freebase was a large collaborative knowledge base with more than 39 million structured data entities.

It was organized around "entities", also known as topics. Each entity was linked to one or more "types". Every type had a unique set of "attributes".

For example, a "car" entity was linked, among others, to the "engine" type, which had a "horsepower" attribute.

Similar to Wikipedia, Freebase names of entities, types, and attributes were translated into different languages, which was great news for YouTube's international expansion.

YouTube used the Freebase database as the foundation for developing its proprietary category system.

Google had already developed several taxonomy systems to classify the content of web pages for its search engine and advertising purposes.

YouTube used Google's classifier algorithms to process each entity, type, and attribute of the Freebase database to link it with Google's taxonomy system.

The resulting model was further enriched by linking specific topics to dedicated Wikipedia portal pages.

Going from 15 to more than 39 million topic categories was a huge step forward for YouTube, but it still had countless limitations.

The biggest problem was its reliance on human classification and the hierarchical top-down approach to organizing topics.

This became more evident with the explosive rise of new technologies and ideas.

Manual categories were too inflexible and too slow to adapt to changes.

Today, most ideas and concepts don't have black and white definitions, are often ambiguous and fluid in meaning, and constantly evolve over time.

Is an iPhone a telecommunication device, a mobile computer, a phone, a video camera, or a smartphone? What if we consider apps? Is it a calculator, a text processor, a game console?

What if we have more complex concepts and ideas with 20 levels of hierarchy? How do we organize these?

YouTube decided to retire Freebase in 2015 in favor of a new algorithmic classifier that didn't require any form of human classification and curation.

Algorithmic content clusters generation

How can a computer generate a super precise map of every imaginable topic in the world and then structure this map into clearly defined content clusters?

This can be achieved by advanced machine learning algorithms that look through billions of data points of video metadata and user watch histories.

First, each video is converted into text metadata by extracting the video title, description, tags, and comments and converting the audio track into subtitles with speech recognition algorithms.

Irrelevant information is discarded.

The combined text data of each video are analyzed and grouped based on keywords and phrases.

The identified keywords and phrases are then sorted and weighted by relevancy and frequency.
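I don't know which weighting scheme YouTube uses, but a classic way to weight keywords by relevancy and frequency is TF-IDF: a word scores high if it is frequent in one video's text and rare across all other videos. Here is a minimal sketch of that stand-in technique. The `tf_idf` helper and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score each word per document: frequent in the document, rare elsewhere."""
    tokenized = [doc.lower().split() for doc in documents]
    # Number of documents each word appears in.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (n / len(words)) * math.log(len(documents) / doc_freq[word])
            for word, n in counts.items()
        })
    return scores


# Hypothetical combined text data of three videos.
videos = ["tomato salad recipe", "tomato soup recipe", "linux kernel tutorial"]
scores = tf_idf(videos)

# "salad" is unique to the first video, so it outranks "tomato" and "recipe".
print(max(scores[0], key=scores[0].get))  # prints salad
```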

Keywords are then linked together based on video watch data.

YouTube looks at each keyword or phrase one at a time.

Then it compiles a list of all videos that contain the specific keyword or phrase.

And then identifies all users who have watched at least two different videos with the same keyword or phrase within the same session.

YouTube then analyzes the watch history of all sessions and calculates the average watch distance between all videos that contain the target keyword or phrase.

Let's say we have three videos with the same keyword, A, B, and C.

Say Jane started with A, then watched an unrelated video f, then B, and lastly C.

The distance between A and B would be 2, the distance between A and C would be 3, and so on.

The shorter the distance between two videos, the more relevant the videos, and by proxy, the linked keywords.

Combining millions of videos and user-watch data gives you a pretty good representation of keyword relevancy.
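Here is how the average watch distance could be computed across sessions. This is a sketch of the idea as I described it, not YouTube's code. The `average_watch_distance` helper and the second session are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def average_watch_distance(sessions, keyword_videos):
    """Average number of steps between each pair of keyword videos across sessions."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [sum of distances, count]
    for session in sessions:
        # Position of the first time each keyword video was watched in the session.
        positions = {}
        for index, video in enumerate(session):
            if video in keyword_videos and video not in positions:
                positions[video] = index
        for a, b in combinations(sorted(positions), 2):
            totals[(a, b)][0] += abs(positions[a] - positions[b])
            totals[(a, b)][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


sessions = [
    ["A", "f", "B", "C"],  # Jane's session from the example above
    ["B", "A"],            # a second, made-up session
]
print(average_watch_distance(sessions, {"A", "B", "C"}))
# prints {('A', 'B'): 1.5, ('A', 'C'): 3.0, ('B', 'C'): 1.0}
```

In Jane's session, A and B are 2 steps apart. In the second session they are 1 step apart, so the average for that pair drops to 1.5.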

Now, let us do something crazy.

Let's create a gigantic multidimensional mind map and combine all keywords with the average watch distance.

You end up with a gigantic graph with millions of interconnected data points.

YouTube first preprocesses the graph with an outside-in algorithm to find seed videos and keywords to slice the graph into content clusters.

YouTube is looking for natural borders with minimal overlap and a larger average watch distance.

Once YouTube has identified potential content clusters, it selects two random seed videos from each cluster to start a local, in-depth content cluster analysis.

This algorithm works from the inside out and tries to grow a local content cluster graph with clearly defined edges by identifying the shortest path between the two seed videos and then linking together relevant neighbor videos with similar keywords.

In the final step, YouTube then removes videos with the lowest similarity score, usually from the cluster's edges, for additional clarity.
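The inside-out step can be sketched with a standard shortest-path search. I am assuming Dijkstra's algorithm here, since I can't say which algorithm YouTube actually uses. The graph, the `max_distance` threshold, and both helper functions are hypothetical, and the sketch simply leaves distant neighbors out instead of pruning them afterward.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts graph of watch distances."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, distance in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + distance, neighbor, path + [neighbor]))
    return []


def grow_cluster(graph, seed_a, seed_b, max_distance=1.5):
    """Start from the path between two seeds, then add close neighbors."""
    cluster = set(shortest_path(graph, seed_a, seed_b))
    for node in list(cluster):
        for neighbor, distance in graph.get(node, {}).items():
            if distance <= max_distance:
                cluster.add(neighbor)
    return sorted(cluster)


# Hypothetical average watch distances between videos (symmetric).
graph = {
    "v1": {"v2": 1.0, "v5": 4.0},
    "v2": {"v1": 1.0, "v3": 1.2, "v4": 1.1},
    "v3": {"v2": 1.2},
    "v4": {"v2": 1.1},
    "v5": {"v1": 4.0},
}
print(grow_cluster(graph, "v1", "v3"))  # prints ['v1', 'v2', 'v3', 'v4']
```

Video v5 stays outside the cluster because its watch distance of 4.0 is above the threshold.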

Sometimes, YouTube might realize that the identified content cluster can be further divided into additional subclusters.

The benefit of this algorithmic approach to content cluster generation is that it requires no, or only very minimal, human intervention.

This algorithm constantly identifies new content groups and topics that are often only relevant to a tiny number of users.

All it takes is a few YouTubers who make videos about a new keyword and a bunch of people who watch their videos.

Et voilà, a new content cluster has been created.

And because YouTube knows everyone's watch history, it can now recommend this new content cluster to a lookalike audience of similar people to those who watched these videos first.

YouTube Discovery Features

YouTube homepage feed algorithm

YouTube's homepage has changed a lot over the years.

In the past, the YouTube homepage only displayed video recommendations for channels users had subscribed to.

The homepage feed is now 100% personalized with video recommendations based on each user's watch history.

YouTube uses a mix of videos based on familiar topics that the user has recently watched and new videos from entirely different categories based on lookalike users to keep suggestions fresh and exciting.

Why isn't the homepage feed exclusively focusing on familiar topics? Why risk "offending" my good taste with someone I have never watched before?

This may seem counterintuitive, but it turns out that freshness is a crucial factor in keeping people on the YouTube platform for longer.

People can only watch so many videos about one topic before they feel mentally exhausted. Fresh video topics give a way out and prevent boredom from happening.

To be featured on the homepage feed, you need to improve your click-through rate and audience retention since these will help you reach larger audiences.

YouTube subscription feed algorithm

The subscription feed is pretty self-explanatory. It focuses exclusively on videos from channels that users have already subscribed to.

Although this feed focuses on your subscriptions, it is not a chronological feed.

YouTube is still trying to show you the best possible content that it believes will keep you on the platform longer.

Here is what you will see on the subscription feed.

Recently uploaded videos from channels you subscribed to, with a focus on similar topics to what you've already watched and videos that already have a proven track record in terms of high click-through rate and high watch time.

YouTube suggested videos algorithm

YouTube's "Suggested" feed algorithm, which also includes "Up Next" videos, is a significant factor to consider for creators.

This feature selects videos for the suggested area beneath the current video on mobile devices or in the right sidebar on desktop computers.

What does YouTube consider when deciding whether or not to suggest one of your videos?

The first step is to make sure that the metadata of your videos matches the metadata of the videos you hope to be recommended alongside.

This includes similar titles, keywords, descriptions, and the video itself expressed by its subtitles.

Your content is more likely to be recommended here if it keeps viewers watching instead of leaving YouTube.

The AI also looks for complementary in-depth videos and channels, as well as taste breakers that give viewers something else to watch, so they never get overwhelmed by too many videos about any one topic.

Taste breakers are not random. They are still based on personal recommendations based on each user's watch history and co-watch data of similar users.

YouTube trending algorithm

Most people believe that the YouTube "trending" section contains only videos that are currently popular.

This assumption is wrong.

Trending topics are topics that people are currently talking about in news and social media.

It is all about what is reported in the news, on social media, websites, blogs, and elsewhere.

The trending feature is "geo‐specific", meaning YouTube displays different videos depending on the viewer's location.

YouTube notification feed algorithm

Users also receive tailored video recommendations through YouTube notifications.

For your videos to appear in the notification feed, users first have to subscribe to your YouTube channel and then click on the bell icon.

Afterward, YouTube will notify subscribers in real time about any new videos you upload to your channel.

Subscribers receive the notification through their YouTube app or desktop notifications.

Usually, video notifications appear sequentially without discrimination, meaning that YouTube will display all notifications from all channels, independent of subscriber count.

The only exception is when a user turns on too many notifications for too many channels. In this case, YouTube uses its relevancy algorithm based on what the user is most likely to watch next.

YouTube search results algorithm

YouTube search places a strong emphasis on YouTube SEO, including keyword optimization of title, description, video tags, and keywords found in the subtitles of each video.

Additionally, it takes channel subscriber count and video watch time into account when it comes to deciding which channel will show up in search and which videos will be pushed to the top.

Freshness is another important ranking factor that allows new and updated content to be recommended, giving smaller channels a chance.

To take advantage of this, you have to optimize the titles and thumbnails of your new videos to get a high click-through rate; otherwise, they will get downranked over time if nobody clicks on them.

YouTube has also rolled out Video Chapters, which enable a video to be "sliced up into sections" so viewers can easily find specific answers to specific questions. You can choose to enable this option for your video, or you can create your own video chapters with keywords that align with your video topic. This helps YouTube surface your videos in search results.

Next Steps

Wow, the YouTube algorithm is a remarkable piece of technology.

Now that you understand every aspect of how the YouTube algorithm works under the hood, what else can you do to quickly grow your (successful) YouTube channel?

I recommend reading my very tactical guide to YouTube growth to help you crack the YouTube algorithm.

It leaves out many of the technical details of the YouTube algorithm that we covered in this article in favor of giving practical advice and recommendations on how to implement the key lessons about the YouTube algorithm.

Afterward, you can check out some of my YouTube growth articles.

Build a content business with Tim Queen