Some care needs to be taken to define the behavior if T enters an infinite loop: we want to minimize the probability that the human accidentally hangs the terminal, with catastrophic consequences, but we cannot provide a complete safety net without running into unresolvable issues with self-reference.
If I cause an infinite loop in my python shell, my computer does not crash and I can just kill the shell and start what I was doing over from the beginning. I don’t understand why this wouldn’t work in your scenario.
You kill the shell not when it is in an infinite loop, but when it takes more than a few seconds to run. We can set up such a safety net, allowing the human to run anything that takes (say) less than a million years, without risk of crashing. This is the sort of thing I was referring to by “some care.”
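As a minimal sketch of that safety net, assuming the human's commands are executed in a separate child process (the helper name and the timeout value here are just illustrative placeholders, not part of the proposal):

    # Sketch of the "kill it if it runs too long" safety net. The cutoff could
    # be set absurdly high (a "million years" of computation) rather than a
    # few seconds; the surrounding session survives either way.
    import subprocess

    def run_with_safety_net(command, timeout_seconds=5.0):
        try:
            # If the child finishes in time, return its completed result.
            return subprocess.run(command, capture_output=True, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            # The child is killed, but the shell itself is fine, so the human
            # can simply restart the computation from the beginning.
            return None

    # Example: an intentional infinite loop gets killed after two seconds.
    run_with_safety_net(["python3", "-c", "while True: pass"], timeout_seconds=2.0)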
Ultimately we do want the human to be able to run arbitrarily expensive subroutines, which prohibits using any heuristic of the form “stop this computation if it goes on for more than N steps.”
What if we keep this heuristic but also define T to have an instruction that is equivalent to calling a halting-problem oracle (with each call counting as one step)? Of course that makes it harder for the outer AGI to reason about how to maximize its utility, but the increase in difficulty doesn’t seem very large relative to the difficulty in the original proposal.
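To illustrate only the step-accounting convention this suggests (a real halting oracle is uncomputable, so the halt_oracle argument below is an abstract stand-in, and the instruction encoding and dispatcher are hypothetical), a step-limited interpreter could charge one step per oracle call like this:

    # Hypothetical sketch: a step-budgeted dispatcher for T in which a
    # halting-oracle instruction costs exactly one step. `halt_oracle` is an
    # abstract callable used only to show how the counting would work.
    def run_with_step_limit(program, max_steps, halt_oracle):
        steps = 0
        results = []
        for op, *args in program:
            if steps >= max_steps:
                raise RuntimeError("step budget exhausted")
            if op == "HALTS?":
                machine_desc, machine_input = args
                # One oracle query counts as a single step, no matter how
                # expensive the computation it asks about would be.
                results.append(halt_oracle(machine_desc, machine_input))
            elif op == "NOOP":
                results.append(None)
            else:
                raise ValueError("unknown instruction: %r" % op)
            steps += 1
        return results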