Your argument as I understand it is: the economic incentive to make the model bigger might disappear if the cost of computing the recommendation outweighs the gain of having “better” recommendations.
I think this is definitely relevant, but I don’t feel like I have enough information to decide if the argument holds or not. Notably, it goes back to the parameter that we discussed in a call: whether increasing the model size/compute/dataset size improves the performance for the real world task until AGI is reached, or whether there’s some interval in which the performance for the real world tasks stays the same. If we’re more in the case of the former, then the economic incentive probably still exists; if we’re in the latter case, then I can see a point where making the model bigger just looks like throwing money out the window.
Meanwhile youtubers make 1-3 cents per ad view according to a quick google, which suggests that even at this level the algo would be costing more than it makes, probably.
We care about what YouTube makes, not what youtubers make, right?
EDIT: I think a similarly large reason to bet against the YouTube algo reaching AGI level is that YouTube isn’t trying to make their algo an AGI. It might happen eventually, but long before it does, some other company that also has loads of money and that is actually trying to get AGI will have beaten them to it.
I agree with that. And similarly, YouTube has less incentive to do private research on AGI if it’s not trying to reach it, as opposed to a company pushing towards AGI.
Yes, we care about what YouTube makes, not what youtubers make. My brief google didn’t turn up anything about what YouTube makes but I assume it’s not more than a few times greater than what youtubers make… but I might be wrong about that!
I agree we don’t have enough information to decide if the argument holds or not. I think that even if bigger models are always qualitatively better, the issue is whether the monetary returns outweigh the increasing costs. I suspect they won’t, at least in the case of the YouTube algo. Here’s my argument, I guess, in more detail:
1. Suppose that currently the cost of compute for the algo is within an OOM of the revenue generated by it. (Seems plausible to me but I don’t actually know)
2. Then to profitably scale up the algo by, say, 2 OOMs, the money generated by the algo would have to go up by, like, 1.5 OOMs.
3. But it’s implausible that a 2-OOM increase in the size of the algo would result in that much increase in revenue. Like, yeah, the ads will be better targeted, people will be spending more, etc. But 1.5 OOMs more? When I imagine a world where YouTube viewers spend 10x more money as a result of YouTube ads, I imagine those ads being so incredibly appealing that people go to YouTube just to see the ads because they are so relevant and interesting. And I feel like that’s possible, but it’s implausible that making the model 2 OOMs bigger would yield that result.
… you know, now that I write it out, I’m no longer so sure! GPT-3 was a lot better than GPT-2, and it was 2 OOMs bigger. Maybe YouTube really could make 1.5 OOMs more revenue by making their model 2 OOMs bigger. And then maybe they could increase revenue even further by making it bigger still, etc. on up to AGI.
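The break-even arithmetic behind steps 1 to 3 can be sketched roughly as follows. This is only an illustration under assumptions that are not in the comment itself: compute cost is taken to scale linearly with model size, "profitable" is read as "revenue still covers compute cost", and the function name `required_revenue_ooms` is made up for the example. Under that reading, the 1.5 OOMs figure corresponds to current cost sitting about half an OOM below current revenue.

```python
def required_revenue_ooms(cost_gap_ooms: float, scale_up_ooms: float) -> float:
    """OOMs by which revenue must grow so that it still covers compute cost.

    cost_gap_ooms: how many OOMs current compute cost sits below current revenue
                   (0 means cost equals revenue, 1 means cost is 10x smaller).
    scale_up_ooms: how many OOMs the model grows; ASSUMPTION: cost grows by
                   the same number of OOMs (linear cost scaling).
    """
    # New cost is 10**(scale_up_ooms - cost_gap_ooms) times current revenue,
    # so revenue must grow by at least that factor to break even.
    return max(0.0, scale_up_ooms - cost_gap_ooms)


# "Within an OOM of revenue": try the two extremes and the midpoint.
for gap in (0.0, 0.5, 1.0):
    print(f"cost gap {gap} OOMs -> revenue must grow {required_revenue_ooms(gap, 2.0)} OOMs")
```

If cost scaled sublinearly with model size, or if revenue only had to cover the incremental cost, the required growth would be smaller, so the numbers above are best read as a rough upper bound for this toy model.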
If the main source of revenue is people buying stuff after seeing an ad on YouTube, then I agree with your point in the middle of the comment, that it seems hardly possible for the revenue to go up by 1.5 OOMs from only a 2-OOM increase in model size. I bet that there would be a big discontinuity here, where you need massive investment to actually see any significant improvement.
On the other hand, if the main source of revenue is money paid for the number of ad views, then I believe a better model could improve that relatively smoothly. In part because just giving people interesting stuff to see makes them look at more ads.
Isn’t there a close connection between money paid for the number of ad views and people buying stuff after seeing an ad on YouTube? I thought that the situation is something like this: People see ads and buy stuff --> Data is collected on how much extra money the ad brought in --> YouTube charges advertisers accordingly. The only way for YouTube to charge advertisers significantly more is first for people to buy significantly more stuff as a result of seeing ads.
Thanks for the feedback.