The frontrunners right now are OpenAI and DeepMind.
I’m not sure about this. Note that not all companies are equally incentivized to publish their ML research; some may have reasons to be secretive about their ML work and capabilities due to competition/regulation dynamics. I don’t see how we can know whether GPT-3 is further along on the route to AGI than FB’s feed-creation algorithm, the most impressive algo-trading system, etc.
The other places have the money, but less talent
I don’t know where the “less talent” estimate is coming from. I wouldn’t be surprised if there are AI teams with a much larger salary budget than any team at OpenAI/DeepMind, and I expect the “amount of talent” to correlate with salary budget (among prestigious AI labs).
and more importantly don’t seem to be acting as if they think short timelines are possible.
I’m not sure how well we can estimate the beliefs and motivations of all well-resourced AI teams in the world.
Also, a team need not be trying to create AGI (or believe they can) in order to create AGI. It’s sufficient that they are incentivized to create systems that model the world as well as possible, which is the case for many teams, including ones working on feed creation in social media services and on algo-trading systems. (The ability to plan and find solutions to arbitrary problems in the real world naturally arises from the ability to model it, in the limit.)
Fair points. I don’t have the expertise to evaluate this myself; my thoughts above were mostly based on what I’d heard other people say. That said, I’d be surprised if the feed-creation algorithm had as many parameters as GPT-3, considering how often it has to be run per day… Not sure about the trading algos… yeah, I wish I knew more about those examples; they’re both good.
That said, I’d be surprised if the feed-creation algorithm had as many parameters as GPT-3, considering how often it has to be run per day…
The relevant quantities here are the compute cost of each model usage (inference)—e.g. the cost of compute for choosing the next post to place on a feed—and the impact of such a potential usage on FB’s revenue.
This post by Gwern suggests that OpenAI was able to run a single GPT-3 inference (i.e. generate a single token) at a cost of $0.00006 (6 cents per 1,000 tokens) or less. I’m sure it’s worth far more than $0.00006 to FB to choose well the next post that a random user sees.
OK, but how big is the “context window” for post selection? Probably the algorithm reviews thousands of potential posts rather than just a dozen. So that’s 2 OOMs more, so 6 cents per 10 things in your feed… yeah maybe that’s doable but that seems like a lot to me. Let’s see, suppose 2 billion people go on FB each day for an average of an hour, seeing an average of 500 things… that’s a trillion things, so six hundred billion cents, or six billion dollars per day… this feels like probably more than FB makes in ad revenue? Even if I’m wrong and it’s only as expensive to choose a post as GPT-3 is to choose a token, then that’s still sixty million dollars a day. This feels like a lot to me. Idk. Maybe I should go look it up, haha.
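Here’s the same back-of-envelope arithmetic spelled out (all inputs are the guesses from the paragraph above, not actual FB figures):

```python
# Back-of-envelope check of the feed-cost guesses above.
# All inputs are guesses from this discussion, not actual FB figures.

gpt3_cost_per_token = 0.06 / 1000            # $0.00006 per token, from Gwern's estimate
cost_per_item = gpt3_cost_per_token * 100    # guess: choosing a post costs 2 OOMs more

users_per_day = 2e9                          # guessed daily FB users
items_per_user = 500                         # guessed feed items seen per user per day
items_per_day = users_per_day * items_per_user   # 1e12 items chosen per day

print(items_per_day * cost_per_item)         # ~6e9: $6 billion/day at the 2-OOM guess
print(items_per_day * gpt3_cost_per_token)   # ~6e7: $60 million/day at GPT-3's per-token cost
```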
I didn’t follow this. FB doesn’t need to run a model inference for each possible post that it considers showing (just like OpenAI doesn’t need to run a GPT-3 inference for each possible token that can come next).
(BTW, I think the phrase “context window” refers to the model’s input.)
FB’s revenue from advertising in 2019 was $69.7 billion, or $191 million per day. So yeah, it seems possible that in 2019 they used a model with an inference cost similar to GPT-3’s, though not one that is 10x more expensive [EDIT: under this analysis’ assumptions]; so I was overconfident in my previous comment.
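For concreteness, the daily figure comes out as follows (the $69.7 billion figure is the one cited above; the comparison reuses the guessed costs from earlier):

```python
# FB's 2019 ad revenue spread over the year.
ad_revenue_2019 = 69.7e9
revenue_per_day = ad_revenue_2019 / 365      # ~1.91e8: ~$191 million per day
print(revenue_per_day)
# Against the guesses above: ~$60M/day (GPT-3-like per-item cost) fits
# under this ceiling; ~$6B/day (the 2-OOM guess) clearly does not.
```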
Yeah, maybe I was confused. FB does need to read all the posts it is considering, though, and if it has thousands of posts to choose from, that’s probably a lot more than can fit in GPT-3’s context window, so FB’s algorithm would need to be bigger than GPT-3… at least, that’s what I was thinking. But yeah, that’s not the right way of thinking about it. Better to just think about how much budget FB can possibly have for model inference, which as you say must be something like $100 million per day, tops. That means maybe it’s GPT-3-sized but can’t be much bigger, and IMO it’s probably smaller.
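As a rough sanity check on that conclusion (reusing the same guessed usage numbers from above, so treat this as illustrative only):

```python
# Implied ceiling on per-item inference cost, given the budget guess above.
inference_budget_per_day = 100e6             # the "$100 million per day, tops" guess
items_per_day = 1e12                         # same guessed usage as before
per_item_ceiling = inference_budget_per_day / items_per_day   # $0.0001 per item
print(per_item_ceiling / (0.06 / 1000))      # ~1.7x GPT-3's per-token cost
```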
(They may spend more on inference compute if doing so would sufficiently increase their revenue. They may train such a more-expensive model just to try it out for a short while, to see whether they’re better off using it.)
Good points, especially the second one.