Daniel Kokotajlo comments on Are we in an AI overhang?

Daniel Kokotajlo 27 Jul 2020 21:37 UTC
8 points
My hypothesis: Language models work by being huge. Tesla can’t use huge models because they are limited by the size of the computers on their cars. They could make bigger computers, but then that would cost too much per car and drain the battery too much (e.g. a 10x bigger computer would cut dozens of miles off the range and also add $9,000 to the car price, at least.)
- orthonormal 27 Jul 2020 22:58 UTC
  9 points
  [EDIT: oops, I thought you were talking about the direct power consumption of the computation, not the extra hardware weight. My bad.]
  It’s not about the power consumption.
  The air conditioner in your car uses 3 kW, and GPT-3 takes 0.4 kWh for 100 pages of output. So a dedicated computer drawing as much power as the air conditioner could produce over 700 pages per hour (3 kW ÷ 0.4 kWh per 100 pages ≈ 750), going substantially faster than AI Dungeon (literally and metaphorically). So a model as large as GPT-3 could run on the electricity of a car.
  The hardware would be more expensive, of course. But that’s different.
  - Daniel Kokotajlo 28 Jul 2020 12:41 UTC
    6 points
    Huh, thanks—I hadn’t run the numbers myself, so this is a good wake-up call for me. I was going off what Elon said. (He said multiple times that power efficiency was an important design constraint on their hardware because otherwise it would reduce the range of the car too much.) So now I’m just confused. Maybe Elon had the hardware weight in mind, but still...
    Maybe the real problem is just that it would add too much to the price of the car?
    - CarlShulman 29 Jul 2020 15:43 UTC
      4 points
      "Maybe the real problem is just that it would add too much to the price of the car?"
      Yes. GPUs/ASICs in a car will have to sit idle almost all the time, so the cost of running a big model on them will be much higher than in the cloud.
- Linch 8 Aug 2020 9:49 UTC
  3 points
  Re hardware limit: flagging the implicit assumption here that network speeds are spotty/unreliable enough that you can’t, or are unwilling to, safely do hybrid on-device/cloud processing for the important parts of self-driving cars.
  (FWIW I think the assumption is probably correct).
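
For anyone who wants to check orthonormal's power estimate upthread, here is a minimal sketch of the arithmetic. The two input figures (3 kW for the air conditioner, 0.4 kWh per 100 pages for GPT-3) are taken from that comment as given, not independently verified, and the function name `pages_per_hour` is just an illustrative choice.

```python
# Figures as quoted in orthonormal's comment (not independently verified).
AC_POWER_KW = 3.0                # power draw of a car air conditioner, in kW
ENERGY_PER_100_PAGES_KWH = 0.4   # GPT-3 energy per 100 pages of output, in kWh


def pages_per_hour(power_kw: float, kwh_per_100_pages: float) -> float:
    """Pages generated per hour on a constant power budget.

    One hour at `power_kw` kW supplies `power_kw` kWh of energy,
    and each page costs `kwh_per_100_pages / 100` kWh.
    """
    kwh_per_page = kwh_per_100_pages / 100
    return power_kw / kwh_per_page


rate = pages_per_hour(AC_POWER_KW, ENERGY_PER_100_PAGES_KWH)
print(f"{rate:.0f} pages per hour")
```

With those inputs the rate comes out at about 750 pages per hour, consistent with the "700 pages per hour" ballpark in the comment. This only covers the electricity side of the argument; it says nothing about hardware cost or weight.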