I assume A gets reward whenever a robot is on the button?
Yes. If A needs to be there in person, then SA can carry it there (after suitably crippling it).
Is the +1 a typo?
Yes, thanks; re-written it to be Ωγk+1.
I’d also like to note that, while implicitly expanding the action space in the way you did (e.g. ”A can issue requests to SA, and also program arbitrary non-Markovian policies into it”) is valid, I want to explicitly point it out.
Yep. That’s a subset of “It can use its arms to manipulate anything in the eight squares around itself.”, but it’s worth pointing it out explicitly.
Yes. If A needs to be there in person, then SA can carry it there (after suitably crippling it).
Yes, thanks; re-written it to be Ωγk+1.
Yep. That’s a subset of “It can use its arms to manipulate anything in the eight squares around itself.”, but it’s worth pointing it out explicitly.