That said, I’d be surprised if the feed-creation algorithm had as many parameters as GPT-3, considering how many times per day it has to run...
The relevant quantities here are the compute cost of each model usage (inference)—e.g. the cost of compute for choosing the next post to place on a feed—and the impact of such a potential usage on FB’s revenue.
This post by Gwern suggests that OpenAI was able to run a single GPT-3 inference (i.e. generate a single token) at a cost of $0.00006 (6 cents per 1,000 tokens) or less. I’m sure it’s worth much more than $0.00006 to FB to choose well the next post that a random user sees.
OK, but how big is the “context window” for post selection? Probably the algorithm reviews thousands of potential posts rather than just a dozen. That’s 2 OOMs more, so 6 cents per 10 things in your feed… yeah, maybe that’s doable, but it seems like a lot to me. Let’s see: suppose 2 billion people go on FB each day for an average of an hour, seeing an average of 500 things… that’s a trillion things, so six hundred billion cents, or six billion dollars per day… which is probably more than FB makes in ad revenue? Even if I’m wrong and choosing a post only costs as much as GPT-3 generating a single token, that’s still sixty million dollars a day. This feels like a lot to me. Idk. Maybe I should go look it up, haha.
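The arithmetic above can be sketched in a few lines (all the inputs — the per-token cost, the candidate-post multiplier, the user and feed-item counts — are the guesses from this comment, not real figures):

```python
cost_per_token = 0.00006       # Gwern's estimated GPT-3 inference cost per token ($)
candidate_ratio = 100          # "2 OOMs more": ~1,000 candidate posts vs. a dozen
cost_per_item = cost_per_token * candidate_ratio   # $0.006 to choose one feed item

daily_users = 2e9              # guess: 2 billion people on FB per day
items_per_user = 500           # guess: 500 feed items seen per user per day
daily_items = daily_users * items_per_user         # 1e12 items per day

print(daily_items * cost_per_item)    # pessimistic case: ~$6e9 (six billion dollars) per day
print(daily_items * cost_per_token)   # optimistic case: ~$6e7 (sixty million dollars) per day
```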
I didn’t follow this. FB doesn’t need to run a model inference for each possible post that it considers showing (just like OpenAI doesn’t need to run a GPT-3 inference for each possible token that can come next).
(BTW, I think the phrase “context window” would correspond to the model’s input.)
FB’s revenue from advertising in 2019 was $69.7 billion, or $191 million per day. So yeah, it seems possible that in 2019 they used a model with an inference cost similar to GPT-3’s, though not one that is 10x more expensive [EDIT: under this analysis’s assumptions]; so I was overconfident in my previous comment.
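A quick check of that per-day figure:

```python
annual_ad_revenue = 69.7e9          # FB's 2019 advertising revenue ($)
per_day = annual_ad_revenue / 365
print(round(per_day / 1e6))         # 191 (million dollars per day)
```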
Yeah, maybe I was confused. FB does need to read all the posts it’s considering, though, and if it has thousands of posts to choose from, that’s probably a lot more than can fit in GPT-3’s context window, so FB’s algorithm would need to be bigger than GPT-3… at least, that’s what I was thinking. But yeah, that’s not the right way to think about it. Better to just think about how much budget FB could possibly have for model inference, which as you say must be something like $100M per day tops. That means maybe it’s GPT-3 sized, but it can’t be much bigger, and IMO it’s probably smaller.
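One way to sanity-check the “GPT-3 sized but not much bigger” conclusion is to divide that rough daily budget by the number of feed items served (both numbers are this thread’s guesses, not real data):

```python
daily_budget = 100e6                 # guessed ceiling: ~$100M/day for inference
daily_items = 2e9 * 500              # 2 billion users x 500 items each (guesses) = 1e12
budget_per_item = daily_budget / daily_items   # $0.0001 per feed item

gpt3_cost_per_token = 0.00006        # Gwern's per-token estimate
# A GPT-3-token-sized inference per item ($0.00006) fits under the $0.0001 budget,
# but an inference 10x as expensive would not.
print(budget_per_item > gpt3_cost_per_token)        # True
print(budget_per_item > 10 * gpt3_cost_per_token)   # False
```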
(They may spend more on inference compute if doing so would sufficiently increase their revenue. They may train such a more-expensive model just to try it out for a short while, to see whether they’re better off using it.)
Good points, especially the second one.