As I understand it, the reason is that it is too computationally expensive to tailor a feed for each user from scratch. I remember seeing somewhere on the Facebook developer blog that for any given user they take a batch, say 1 million posts/pieces of content from their wider network, and this batch will likely overlap with, if not be identical to, the batches of many other users. From that it personalizes and prunes down to however many items they elect to show in your feed (say 100).
Out of the millions of pieces of content posted every day, it couldn’t possibly prune the whole set down anew for every user every 24 hours; if 200 million people use the platform at least once every 24 hours, the cost quickly rockets up. So they use restricted pools which go through successive filters. The problem is that if you’re not in the right pool to begin with, or your interests span multiple pools, then there is no hope of getting content that is tailored to you. A sketch of this funnel is below.
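To make the funnel idea concrete, here is a toy sketch in Python. Every name and scoring rule here is made up for illustration; the real pipeline is far more elaborate, and nothing below comes from Facebook itself.

```python
# Toy sketch of a successive-filter feed funnel. All names and rules
# here are hypothetical stand-ins, not anything Facebook has published.

def passes_cheap_filters(user, item):
    # Stand-in for cheap, broadly shared filters (language, region,
    # network distance) that can run over a huge shared pool.
    return item["region"] == user["region"]

def personal_score(user, item):
    # Stand-in for an expensive per-user ranking model.
    return len(user["interests"] & item["topics"])

def build_feed(user, candidate_pool, feed_size=100):
    """Prune a large shared candidate pool down to one user's feed."""
    # Stage 1: cheap filters shrink the pool (say, 1M items -> thousands).
    survivors = [it for it in candidate_pool if passes_cheap_filters(user, it)]
    # Stage 2: the expensive personalized score runs only on survivors,
    # never on the full pool.
    survivors.sort(key=lambda it: personal_score(user, it), reverse=True)
    # Stage 3: keep the top N. If your interests never made it into the
    # pool, no amount of ranking at this stage can recover them.
    return survivors[:feed_size]

pool = [
    {"region": "SK", "topics": {"chess", "math"}},
    {"region": "SK", "topics": {"makeup"}},
    {"region": "US", "topics": {"chess"}},
]
me = {"region": "SK", "interests": {"chess", "math"}}
print(build_feed(me, pool, feed_size=2))
```

The key property is that the expensive per-user work only ever touches a pre-restricted pool, which is exactly why being in the wrong pool is unrecoverable downstream.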
This is a half-remembered explanation of a much more intricate and multi-layered process as described by Facebook, and I don’t know how similar or different YouTube, TikTok, and other platforms are.
Computational costs make sense. I kinda assumed that if Google can calculate PageRank (a numeric value for every URL, depending on the links contained in documents at other URLs), calculating this would be a piece of cake. But I guess it’s the opposite.
PageRank can be calculated iteratively: you assign the same starting value of 1 to all pages and then just keep updating. In theory, if you kept updating infinitely long, you would converge to the theoretically correct value; but if you stop after any finite time, you at least have some value, and quite often it is close enough to the correct one. It scales well.
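A minimal sketch of that power iteration, assuming the classic damped formulation (the graph and iteration count are arbitrary choices, and dangling pages are ignored for simplicity):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping every page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 for p in pages}  # same starting value for all pages
    for _ in range(iterations):
        new_rank = {p: (1 - damping) for p in pages}
        for p, outgoing in links.items():
            if outgoing:
                # Spread this page's current rank along its out-links.
                share = damping * rank[p] / len(outgoing)
                for q in outgoing:
                    new_rank[q] += share
        rank = new_rank  # stopping here at any point gives an approximation
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```

Each round is just one pass over the link graph, which is why the computation scales to the whole web: you never have to compare pages against each other directly.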
On the other hand, similarity between users X and Y probably needs to be calculated individually for each pair of users. (I am not sure about this; maybe there is also a smart way to do it iteratively.)
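A sketch of why the naive pairwise approach blows up: with N users there are N(N−1)/2 pairs, so 200 million users would mean roughly 2×10¹⁶ similarity computations. The vectors and similarity measure below are toy stand-ins.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical per-user vectors, e.g. watch-time per topic.
users = {
    "x": [1.0, 0.0, 2.0],
    "y": [0.5, 0.1, 1.8],
    "z": [0.0, 3.0, 0.0],
}

# Every pair must be touched once -- this loop is quadratic in user count.
for a, b in combinations(users, 2):
    print(a, b, round(cosine(users[a], users[b]), 3))
```

In practice, platforms typically sidestep the quadratic cost by embedding users and items in a shared vector space and using approximate nearest-neighbor search, which may be the kind of "smart way" gestured at above.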
If YouTube just puts me into a generic bag like “people from Slovakia”, or even a narrower one like “people from Slovakia who watch discussion videos”, that would explain a lot.
I think it’s unlikely that computation is the bottleneck here; even if it used large batches, I would still expect it to produce a better feed than what I usually see. I think the problem is more likely that they’re incentivized to give us addictive content rather than what we actually like, and my hope is that something like X or other socials with different incentives could do a lot better.
I don’t believe it’s doing a good job of delivering me ‘addictive’ content. Am I in denial, or am I a fringe case and for most people it is good?
From my experience they’re getting pretty good. It depends on the platform, but IG Reels or YT can keep me entertained with nothing-content for hours.