Thinking about this a bit more, do you have any insight on Tesla? I can believe that it’s outside DM and GB’s culture to run with the scaling hypothesis, but watching Karpathy’s presentations (which I think is the only public information on their AI program?) I get the sense they’re well beyond $10m/run by now. Considering that self-driving is still not there—and once upon a time I’d have expected driving to be easier than Harry Potter parodies—it suggests that language is special in some way. Information density? Rich, diff’able reward signal?
Self-driving is very unforgiving of mistakes. Text generation, on the other hand, doesn’t have similar failure conditions, and bad content can easily be fixed.
Tesla publishes nothing and I only know a little from Karpathy’s occasional talks, which are as much about PR (to keep Tesla owners happy and investing in FSD, presumably) & recruiting as anything else. But their approach seems heavily focused on supervised learning in CNNs and active learning using their fleet to collect new images, and to have nothing to do with AGI plans. They don’t seem to even be using DRL much. It is extremely unlikely that Tesla is going to be relevant to AGI or progress in the field in general given their secrecy and domain-specific work. (I’m not sure how well they’re doing even at self-driving cars—I keep reading about people dying when their Tesla runs into a stationary object on a highway in the middle of the day, which you’d think they’d’ve solved by now...)
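(For anyone unfamiliar with the jargon: “active learning using their fleet” means roughly the loop sketched below. This is a generic illustration of fleet active learning, not anything Tesla has published; every name in it is invented.)

```python
# Generic fleet active-learning loop (illustrative sketch only).

def onboard_trigger(model, frame, threshold=0.5):
    # Run the deployed CNN on a camera frame; if the model is unsure,
    # flag the frame for upload back to the training cluster.
    prediction = model.predict(frame)
    return prediction.confidence < threshold

def fleet_active_learning(fleet_frames, model, label_fn, train_fn):
    # 1. Cars in the fleet flag rare/hard cases the current model fumbles.
    hard_cases = [f for f in fleet_frames if onboard_trigger(model, f)]
    # 2. Humans label the flagged frames.
    labeled = [(f, label_fn(f)) for f in hard_cases]
    # 3. Retrain on the enlarged dataset and ship the update to the fleet.
    return train_fn(model, labeled)
```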
I’m pretty sure I remember hearing they use unsupervised learning to form their 3D model of their local environment, and that’s the most important part, no?
Curious if you have updated on this at all, given AI Day announcements?
They still running into stationary objects? The hardware is cool, sure, but unclear how much good it’s doing them...
I believe that is referring to the baseline driver assistance system, and not the advanced “full self driving” one (that has to be paid for separately). Though it’s hard to tell that level of detail from a mainstream media report.
hey man wanna watch this language model drive my car
I just realized with a start that this is _absolutely_ going to happen. We are going to see, in the not-too-distant future, a GPT-x (or similar) ported to a Tesla and driving it.
It frustrates me that there are not enough people IRL I can excitedly talk about how big of a deal this is.
Can you explain why GPT-x would be well-suited to that modality?
Presumably, because with a big-enough X, we can generate text descriptions of scenes from cameras and feed them in to get driving output more easily than the seemingly slow process of directly training a self-driving system that is safe. And if GPT-X is effectively magic, that’s enough. (A toy sketch of the imagined loop follows below.)
I’m not sure I buy it, though. I think that once people agree that scaling just works, we’ll end up scaling the NNs used for self driving instead, and just feed them much more training data.
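To make the hypothetical above concrete, here is a minimal sketch of the loop being imagined. Every component name (captioner, gpt_x, controller) is invented for illustration; nothing like this exists as a real API.

```python
# Hypothetical GPT-X-drives-a-car loop (pure speculation, toy code).

def drive_step(camera_frame, captioner, gpt_x, controller):
    # 1. Turn the camera frame into a natural-language scene description.
    scene = captioner.describe(camera_frame)
    # e.g. "Two-lane highway, car 40 m ahead braking, exit ramp on the right."

    # 2. Ask the language model what a good driver would do, as a completion.
    prompt = f"Scene: {scene}\nA safe driver would now:"
    action_text = gpt_x.complete(prompt, max_tokens=20)

    # 3. Map the free-text answer back onto actual controls.
    controller.apply(parse_action(action_text))

def parse_action(text):
    # Toy parser; a real system would need something vastly more robust.
    if "brake" in text or "slow" in text:
        return {"throttle": 0.0, "brake": 0.3, "steer": 0.0}
    return {"throttle": 0.1, "brake": 0.0, "steer": 0.0}
```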
There might be some architectures that are more scalable than others. As far as I understand, the present models for self-driving mostly contain a lot of hardcoded elements. That might make them more complicated to scale.
Agreed, but I suspect that replacing those hard-coded elements will get easier over time as well.
Andrej Karpathy talks about exactly that in a recent presentation: https://youtu.be/hx7BXih7zx8?t=1118
My hypothesis: Language models work by being huge. Tesla can’t use huge models because they are limited by the size of the computers on their cars. They could make bigger computers, but then that would cost too much per car and drain the battery too much (e.g. a 10x bigger computer would cut dozens of miles off the range and also add $9,000 to the car price, at least.)
[EDIT: oops, I thought you were talking about the direct power consumption of the computation, not the extra hardware weight. My bad.]
It’s not about the power consumption.
The air conditioner in your car uses 3 kW, and GPT-3 takes 0.4 kWh for 100 pages of output—thus a dedicated computer on AC power could produce about 750 pages per hour, going substantially faster than AI Dungeon (literally and metaphorically). So a model as large as GPT-3 could run on the electricity of a car.
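As a sanity check on that arithmetic (the 3 kW and 0.4 kWh figures are the ones from the comment above):

```python
# Back-of-the-envelope: pages of GPT-3 output per hour on AC-level power.
ac_power_kw = 3.0        # air-conditioner draw, per the comment above
kwh_per_100_pages = 0.4  # claimed GPT-3 inference energy for 100 pages

pages_per_kwh = 100 / kwh_per_100_pages       # 250 pages per kWh
pages_per_hour = ac_power_kw * pages_per_kwh  # 3 * 250 = 750
print(f"{pages_per_hour:.0f} pages/hour")     # -> 750 pages/hour
```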
The hardware would be more expensive, of course. But that’s different.
Huh, thanks—I hadn’t run the numbers myself, so this is a good wake-up call for me. I was going off what Elon said. (He said multiple times that power efficiency was an important design constraint on their hardware because otherwise it would reduce the range of the car too much.) So now I’m just confused. Maybe Elon had the hardware weight in mind, but still...
Maybe the real problem is just that it would add too much to the price of the car?
Yes. GPUs/ASICs in a car will have to sit idle almost all the time, so the cost of running a big model on them will be much higher than in the cloud.
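A rough illustration of the utilization point; the dollar figures and hours below are invented for the example, but the shape of the argument holds for any values:

```python
# Toy amortization: cost per hour of actual use, in-car vs. cloud.
# All numbers are made up for illustration.
hardware_cost = 5000.0                # hypothetical in-car accelerator ($)
lifetime_years = 5
hours_per_year = 365 * 24

car_hours = 400 * lifetime_years                     # ~400 driving h/year
cloud_hours = 0.8 * hours_per_year * lifetime_years  # ~80% utilization

print(f"car:   ${hardware_cost / car_hours:.2f}/hour of use")    # ~$2.50
print(f"cloud: ${hardware_cost / cloud_hours:.2f}/hour of use")  # ~$0.14
# The same chip is ~17x more expensive per useful hour sitting in a car.
```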
Re hardware limit: flagging the implicit assumption here that network speeds are spotty/unreliable enough that you can’t or are unwilling to safely do hybrid on-device/cloud processing for the important parts of self-driving cars.
(FWIW I think the assumption is probably correct).
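(To spell out what “hybrid” would look like: something like the fallback pattern below, which is only acceptable if the on-device path alone is already safe, hence why the assumption matters. All components here are hypothetical.)

```python
import concurrent.futures

DEADLINE_S = 0.05  # e.g. a 50 ms budget per control step

def plan_step(frame, cloud_model, onboard_model, executor):
    # Illustrative hybrid inference: prefer the big cloud model, but fall
    # back to the small on-device model if the network misses the deadline.
    future = executor.submit(cloud_model.infer, frame)
    try:
        return future.result(timeout=DEADLINE_S)  # cloud answer, if in time
    except concurrent.futures.TimeoutError:
        # Spotty network: the car must still act, so decide locally.
        return onboard_model.infer(frame)
```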