An AI in a simulated world isn’t prohibited from improving itself.
More to the point, I didn’t imagine I would save the world by writing one comment on LW :-) My idea of progress is solving small problems conclusively. Eliezer has spent a lot of effort convincing everybody here that AI containment is not just useless—it’s impossible. (Hence the AI-box experiments, the arguments against oracle AIs, etc.) If we update to thinking it’s possible after all, I think that would be enough progress for the day.
I don’t think it’s really an airtight proof—there’s a lot that a sufficiently powerful intelligence could learn about its questioners and their environment from a question; and when we can’t even prove there’s no such thing as a Langford Basilisk, we can’t establish an upper bound on the complexity of a safe answer. Essentially, researchers would be limited to their own best judgement about how complex the questions and the responses could safely be.
Of course, all that’s rather unlikely, especially as it (hopefully) wouldn’t be able to upgrade its hardware—but you’re right, software-only self-improvement would still be possible.
Yes, I agree. It would be safest to use such “AI bombs” for solving hard problems with short and machine-checkable solutions, like proving math theorems, designing algorithms or breaking crypto. There’s not much point in the AI inserting backdoors into the answer if it only cares about the verifier’s response after a trillion cycles, but a really paranoid programmer could also include a term in the AI’s utility function that favors shorter answers over longer ones.
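To make that concrete, the utility function being described could be sketched as something like

U(a) = V_T(a) − λ·|a|

where V_T(a) is 1 if the machine verifier accepts the answer a when consulted at the designated time T (the trillion-cycle mark) and 0 otherwise, |a| is the answer’s length, and λ is a small positive weight. Choosing λ < 1/L, with L the maximum permitted answer length, ensures a correct long answer still beats an incorrect short one. (The notation here is just illustrative, not anything standard.)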