Very interesting paper, congratulations on the collaboration.
I have a question about theta. When you initially introduce it, theta lies in [0,1]. But it seems that if you choose theta = (0)_n, i.e. theta_n = 0 for every n (just a sequence of 0s), all policies are interruptible. Is there much reason to initially allow such a wide-ranging theta? Why not restrict it to converge to 1 from the very beginning? (Or have I just totally missed the point?)
We’re working on the theta problem at the moment. Currently we define interruptibility in terms of convergence to optimality. That means the agent needs to explore sufficiently, so we can’t set theta = 1. But we want to be able to interrupt the agent in practice, so we want theta to tend to 1.
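To make the tension concrete, here is a minimal sketch under a simplifying assumption of my own (not the paper's formal condition): each step t is interrupted independently with probability theta_t, and "enough exploration" is read as "infinitely many uninterrupted steps". The schedule names below (theta_slow, theta_fast, theta_zero) are hypothetical illustrations.

```python
import math
from typing import Callable

# Hypothetical interruption-probability schedules theta_t, for t = 1, 2, ...
def theta_slow(t: int) -> float:
    # Tends to 1, but 1 - theta_t = 1/sqrt(t) is NOT summable.
    return 1.0 - 1.0 / math.sqrt(t)

def theta_fast(t: int) -> float:
    # Tends to 1, and 1 - theta_t = 1/t**2 IS summable.
    return 1.0 - 1.0 / t**2

def theta_zero(t: int) -> float:
    # The degenerate choice from the question: never interrupt.
    return 0.0

def expected_free_steps(theta: Callable[[int], float], horizon: int) -> float:
    """Expected number of steps in 1..horizon on which the agent is NOT
    interrupted, assuming step t is interrupted independently with prob. theta(t)."""
    return sum(1.0 - theta(t) for t in range(1, horizon + 1))

for name, theta in [("slow", theta_slow), ("fast", theta_fast), ("zero", theta_zero)]:
    print(name, [round(expected_free_steps(theta, n), 2) for n in (10**2, 10**4, 10**5)])
```

Under the independence assumption, the second Borel–Cantelli lemma says a divergent sum of (1 - theta_t) gives infinitely many uninterrupted steps almost surely, so theta_slow tends to 1 while still leaving room to explore. With theta_fast the sum stays below pi^2/6, so only finitely many uninterrupted steps occur and exploration eventually stops. theta_zero explores freely but never tends to 1, which is why it makes every policy trivially "interruptible".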
Yup, I think I understand that, and agree you need to at least tend to 1. I’m just wondering why you initially use the looser definition of theta (where it doesn’t need to tend to 1, and can instead be just 0).
When defining safe interruptibility, we let theta tend to 1. We probably didn’t specify that earlier, when we were just introducing the concept?