“How many bits” isn’t a very well-defined question. Give it 14k words to persuade you to let it out, and it might succeed. Give it 140k chances to predict “rain or no rain, in this location and time?” and it has no chance. The problem is that if it works well for that, you’ll probably want to start using it for more… and more…
It’s this path that’s the concerning thing. So long as we’re sufficiently worried about having our minds changed, we’re really good at ignoring good arguments and deluding ourselves into overlooking the fact that the arguments really ought to be compelling by our own standards of reasoning. When people get their minds changed (against their best interests or not), it tends to be because their guards were not up, or were not up in the right places, or were not up high enough.
The question, so far as I can tell, is whether you recognize that you ought to be scared away from going further before or after you reach the point of no return, where the AI has enough influence over you that it can convince you to give it power faster than you can ensure that power is safe. Drugs make a useful analogy here. How much heroin can you take before you get addicted? Well, it depends on how terrified you are of addiction and how competent you are at managing that risk. “Do you get hooked after one dose?” isn’t really the right question if quitting after the first dose is so easy that you let yourself take more. If you recognize the threat as significant enough, it’s possible to get shaken out of things quite far down the road (the whole “hitting rock bottom” thing sometimes provides enough impetus to scare people sober).
Superhuman AI is a new type of threat that’s likely very easy to underestimate (perhaps to an extent that is also easy to underestimate), but I don’t think the idea of “Give it carefully chosen ways to influence the world, and then stop and reassess” is doomed to failure regardless of the level of care.
Well, I think it can just encode some message in these bits, and you or your colleagues will eventually check it.