I spoke with Huw about this idea.
I was thinking along similar lines at some point, but only for “safe-shutdown”, e.g. if you had a self-driving car that anticipated encountering a dangerous situation and wanted to either:
- pull over immediately
- cede control to a human operator
It seems intuitive to give it a shutdown policy that triggers in such cases and aims to minimize a combined objective of time-to-shutdown and risk-of-shutdown.
(Of course, this doesn’t deal with interrupting the agent, à la Armstrong and Orseau’s “Safely Interruptible Agents”.)
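To make that objective concrete, here’s a minimal sketch of what the combined shutdown cost might look like; the `ShutdownPlan` interface, the time/risk estimates, and the `risk_weight` trade-off parameter are all illustrative assumptions on my part, not anything from the actual proposal:

```python
from dataclasses import dataclass

@dataclass
class ShutdownPlan:
    """A candidate way to reach a safe shut-down state (hypothetical interface)."""
    expected_time: float  # expected seconds until the agent is fully shut down
    expected_risk: float  # estimated probability of harm while shutting down

def shutdown_cost(plan: ShutdownPlan, risk_weight: float = 100.0) -> float:
    """Combined objective: trade time-to-shutdown off against risk-of-shutdown.

    risk_weight is an assumed tuning knob; raising it makes the policy
    prefer slower but safer shut-downs.
    """
    return plan.expected_time + risk_weight * plan.expected_risk

# E.g. the self-driving car picks between pulling over immediately
# (fast, somewhat risky) and ceding control to a human (slower, safer):
plans = {
    "pull over immediately": ShutdownPlan(expected_time=5.0, expected_risk=0.02),
    "cede control to human": ShutdownPlan(expected_time=30.0, expected_risk=0.001),
}
best = min(plans, key=lambda name: shutdown_cost(plans[name]))
print(best)  # -> "pull over immediately" under these made-up numbers
```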
Huw pointed out that a similar strategy can be used for any “genie”-style goal (i.e., you want the agent to do one thing as efficiently as possible and then shut down until you give it another command), which made me substantially more interested in it.
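In code terms, I read the pattern Huw described as something like the loop below; the `execute`/`shutdown` interface and the command source are my own illustrative assumptions:

```python
from typing import Callable, Optional

class ToyAgent:
    """Stand-in agent with an assumed execute/shutdown interface."""
    def execute(self, command: str) -> None:
        print(f"executing: {command}")

    def shutdown(self) -> None:
        print("shut down; awaiting next command")

def genie_loop(agent: ToyAgent, next_command: Callable[[], Optional[str]]) -> None:
    # Do one thing as efficiently as possible, then shut down by default
    # until the operator issues another command (None means stop entirely).
    while (command := next_command()) is not None:
        agent.execute(command)
        agent.shutdown()

commands = iter(["fetch the coffee", "file the report"])
genie_loop(ToyAgent(), lambda: next(commands, None))
```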
This approach seems similar in spirit to giving your agent a short horizon, but now you also get regular terminations by default, which brings some extra pros and cons.