Larks comments on Google Deepmind and FHI collaborate to present research at UAI 2016

Larks 10 Jul 2016 3:33 UTC
Very interesting paper, congratulations on the collaboration.

I have a question about theta. When you initially introduce it, theta lies in [0,1]. But it seems that if you choose theta = (0)_n, just a sequence of 0s, all policies are interruptible. Is there much reason to initially allow such a wide-ranging theta? Why not restrict it to converge to 1 from the very beginning? (Or have I just totally missed the point?)
- Stuart_Armstrong 10 Jul 2016 5:09 UTC
  We’re working on the theta problem at the moment. Basically, we’re currently defining interruptibility in terms of convergence to optimality. That means the agent needs to explore sufficiently, so we can’t set theta = 1. But we want to be able to interrupt the agent in practice, so we want theta to tend to 1.
  - Larks 12 Jul 2016 0:18 UTC
    Yup, I think I understand that, and agree you need theta to at least tend to 1. I’m just wondering why you initially use the looser definition of theta (where it doesn’t need to tend to 1, and can instead just be 0).
    - Stuart_Armstrong 12 Jul 2016 13:50 UTC
      When defining safe interruptibility, we let theta tend to 1. We probably didn’t specify that earlier, when we were just introducing the concept.