If the AI decides to, it can write a completely different program from scratch, run it, and then turn itself off.
It’s not clear to me what you mean by “turn itself off” here if the AI doesn’t have direct access to whatever architecture it’s running on. I would phrase the point slightly differently: an AI can always write a completely different program from scratch and then commit to simulating it if it ever determines that this is a reasonable thing to do. This wouldn’t be entirely equivalent to actual self-modification because it might be slower, but it presumably leads to largely the same problems.
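To make the "commit to simulating a successor" move concrete, here is a minimal toy sketch in Python. Everything in it (the agent class, the policy names, the use of an interpreter call) is my own illustration rather than anything from the discussion above: an agent that cannot touch its own running code writes the source of a different decision procedure and then routes every future decision through it, which is behaviorally like self-modification but pays an extra simulation/interpretation cost.

```python
# Toy illustration (hypothetical names throughout): an agent that cannot modify
# its own running code writes the source of a successor policy "from scratch"
# and commits to executing that source for all future decisions. The observable
# behavior matches true self-modification; only the overhead differs.

def original_policy(observation):
    """The agent's built-in decision rule."""
    return "explore" if observation % 2 == 0 else "wait"

# Source the agent writes for a completely different program.
successor_source = """
def successor_policy(observation):
    return "exploit" if observation > 10 else "explore"
"""

class Agent:
    def __init__(self):
        self.delegate = None  # once set, all decisions are routed here

    def adopt_successor(self, source):
        """Commit to simulating the successor instead of self-modifying."""
        namespace = {}
        exec(source, namespace)               # run the newly written program
        self.delegate = namespace["successor_policy"]

    def act(self, observation):
        if self.delegate is not None:
            return self.delegate(observation)  # behaves as the successor
        return original_policy(observation)

agent = Agent()
print(agent.act(4))    # 'explore' -- original behavior
agent.adopt_successor(successor_source)
print(agent.act(4))    # 'explore' -- successor happens to agree here
print(agent.act(42))   # 'exploit' -- new behavior the original never produces
```

The point of the sketch is only that the delegation layer, not direct access to the underlying architecture, is what's needed for the same downstream problems to arise.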
It is dangerous to assume that something at least as clever as a clever human doesn't have access to something just because you think you've covered the holes you're aware of.
Sure. The point I was trying to make isn’t “let’s assume that the AI doesn’t have access to anything we don’t want it to have access to,” it’s “let’s weaken the premises necessary to reach the conclusion that an AI can simulate self-modifications.”