Furthermore, I might find myself in a situation where I was highly confident that Quirrell could not produce a self-destruct in fewer than n moves, but was uncertain of the result after that.
This seems to me much like probabilistic reasoning in a game like Chess or Go. There are moves that make things more or less complicated. There are moves that are clearly stable, and equally clearly sufficient or insufficient. Sometimes you take the unstable move, not knowing whether it’s good or bad, because the stable moves are insufficient. Sometimes you simply can’t read far enough into the future, and you have to hope that your opponent left you this option because they couldn’t either.
It seems to me to be relevant not only what Quirrell’s motivations are, but how much smarter he is. When playing Go, you don’t take the unstable moves against a stronger player. You trade your lead from the handicap stones for a more stable board position, and you keep making that trade until the game ends, and you hope you traded well enough—not that you got the good end of the trade, because you didn’t need to.
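As a rough illustration of that rule (a minimal sketch only; the field names and the opponent_is_stronger flag are hypothetical, not anything from the actual game): prefer a stable move you can read out to the end, and fall back to an unstable gamble only when no stable move is sufficient.

```python
# A minimal sketch of the move-selection rule above. All names and numbers
# are hypothetical, chosen only to make the reasoning concrete -- this is
# not a serious model of Go or Chess.

def choose_move(moves, opponent_is_stronger):
    """Each move is a dict:
      'stable':   True if its consequences can be read out to the end
      'margin':   for stable moves, how much winning margin it preserves
      'win_prob': for unstable moves, a rough guess at the chance it works
    """
    stable = [m for m in moves if m['stable']]
    unstable = [m for m in moves if not m['stable']]

    # Against a stronger opponent, never gamble: keep trading whatever lead
    # you have for the stable position that preserves the most margin.
    if opponent_is_stronger and stable:
        return max(stable, key=lambda m: m['margin'])

    # Otherwise, a stable move that still wins (positive margin) is preferred.
    winning = [m for m in stable if m['margin'] > 0]
    if winning:
        return max(winning, key=lambda m: m['margin'])

    # No stable move is sufficient: take the unstable move with the best
    # guess at success, and hope the opponent couldn't read it either.
    return max(unstable, key=lambda m: m['win_prob'])

# e.g. choose_move([{'stable': True, 'margin': -2},
#                   {'stable': False, 'win_prob': 0.4}],
#                  opponent_is_stronger=False)
# returns the unstable move, because no stable move is sufficient.
```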
In the real world… What certainty of safety do you need on a strong AI that will stop all the suffering in the world, when screwing up will end the human race? Is one chance in a thousand good enough? How about one chance in 10^9? Certainly that question has an answer, and the answer isn’t complete certainty.
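To put a toy number on that trade-off (the stakes below are made-up placeholders; the only point is that the acceptable failure probability falls out of the ratio of the stakes, not that it is zero):

```python
# A toy expected-value framing of the question above, with hypothetical stakes.

def worth_deploying(p_catastrophe, value_of_success, cost_of_extinction):
    """Deploy only if the expected value beats doing nothing (valued at 0)."""
    expected = (1 - p_catastrophe) * value_of_success - p_catastrophe * cost_of_extinction
    return expected > 0

# With these (hypothetical) stakes, one chance in a thousand is not good
# enough, but one chance in 10**9 comfortably is:
print(worth_deploying(1e-3, value_of_success=1.0, cost_of_extinction=1e6))  # False
print(worth_deploying(1e-9, value_of_success=1.0, cost_of_extinction=1e6))  # True
```

(This leaves out the cost of never deploying at all, which is part of why the answer isn't complete certainty.)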
That said… I’m not sure allowing any sort of reasoning with uncertainty over future modifications helps anything here. In fact, I think it makes the problem harder.
It seems to me to be relevant not only what Quirrell’s motivations are, but how much smarter he is.
Right, and I’m assuming insanely much smarter.
Certainly that question has an answer, and the answer isn’t complete certainty. That said… I’m not sure allowing any sort of reasoning with uncertainty over future modifications helps anything here. In fact, I think it makes the problem harder.
I agree completely. The real problem we have to solve is much harder than the toy scenario in this post; the point of the toy scenario was to help focus on one particular aspect of the problem.