Designing an agent which is guaranteed to terminate is not, in itself, a solution to AI safety. Indeed, this desideratum is already satisfied by Minsky’s ultimate machine. At the very least, we have to design an agent which will be powerful enough to permanently defend us against malicious AIs without adverse side effects. So, we can indeed have AIs that are incentivized to complete some task in a short amount of time, but it is not clear how to formulate the task of “defending against malicious AIs” for such an agent. The closest thing is probably Paul Christiano’s approval-directed agents, where the AI generates some output (e.g. a plan of defense against malicious AIs) which a human has to approve. There are problems with this: for one thing, a plan which a human would approve might still be a bad plan (or even a dangerous memetic virus); for another, the module inside the AI responsible for modeling humans is susceptible to acausal attack.
I agree it’s not a complete solution, but it might be a good path towards creating a task-AI, which is a potentially important unsolved sub-problem.