The whole strawberry thing is really confusing to me; it just doesn’t map to any natural problem that humans actually care about. And when EY says “and nothing else,” it’s not clear what the actual boundaries to impact are. In order to create a strawberry, the AI must modify other things about the world. At the very least, you have to get the atoms for the strawberry from somewhere. And since the requirement is for the two strawberries to be “identical on the cellular level” (which IMO is not a scientifically grounded concept), the AI would presumably have to invent some advanced technology, likely nanotech, which requires modifying the world too. Even if the AI does nanotech R&D entirely in simulation, there is still some limited impact (due to the energy use of the computers, as well as the need to print DNA or whatever to manifest the nanotech in reality, etc.).
I think the test should be the Wozniak test: a simple robot enters a random home and uses the tools available to make coffee. That’s a much more sensible test: we can easily verify whether there is any impact outside of the home, and if EY is right, it should be just as difficult to do consistently, given the difficulty of alignment.
> The whole strawberry thing is really confusing to me; it just doesn’t map to any natural problem that humans actually care about.
It maps to the pivotal acts I think are most promising; both the order of difficulty and the kind of problem seem similar to me.
> And when EY says “and nothing else,” it’s not clear what the actual boundaries to impact are.
I think EY mainly means ‘without the AI steering or directly impacting the macro-state of the world’. The task obviously isn’t possible if you literally can’t affect any atoms in the universe outside the strawberries themselves. But IMO it would work fine for this challenge if the humans put a bunch of resources into a room (or into a football stadium, if needed), and got the AI to execute the task without having a large direct impact on the macro-state of the world (nor an indirect impact that involves the AI deliberately steering the world toward a new macro-state in some way).
> I think the test should be the Wozniak test: a simple robot enters a random home and uses the tools available to make coffee. [...] EY is right, it should be just as difficult to do consistently, given the difficulty of alignment.
This seems much easier to do right, because (a) the robot can get by with being dramatically less smart, and (b) the task itself is extremely easy for humans to understand, oversee, and verify. (Indeed, the task is so simple that in real life you could just have a human hook up a camera to the robot and steer it by remote control.)
For the Wozniak test, the capabilities of the system can be dramatically weaker, and the alignment is dramatically easier. So this doesn’t obviously capture the things I see as hard about alignment.