Well, as I understand it, RR usually penalizes wrt a baseline; it doesn't penalize absolute loss in reachability. If the agent could reach states s1–s1000 by following the baseline policy, then RR penalizes it for losing access to (states similar to) those states. That doesn't mean RR will make the agent maximize its options. In fact, if the baseline doesn't involve power-seeking, some variants of RR will penalize the agent for maximizing its available options / power.
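For concreteness, here's a minimal sketch (my own illustration, not code from the RR paper) of how a baseline-relative reachability penalty behaves under two value-difference variants. The names `rr_penalty`, `reach_current`, and `reach_baseline` are hypothetical, and I'm assuming reachability values in [0, 1], one per state:

```python
import numpy as np

def rr_penalty(reach_current: np.ndarray, reach_baseline: np.ndarray,
               penalize_gains: bool = False) -> float:
    """Relative-reachability deviation of the current state from the baseline.

    reach_current[i]  -- reachability of state i from the agent's actual state
    reach_baseline[i] -- reachability of state i from the baseline state
    """
    diff = reach_baseline - reach_current
    if penalize_gains:
        # Absolute-difference variant: gaining reachability relative to the
        # baseline is penalized too, so option-maximizing is discouraged.
        return float(np.abs(diff).mean())
    # Truncated-difference variant: only *losses* of reachability relative
    # to the baseline count; extra options beyond the baseline are free.
    return float(np.maximum(diff, 0.0).mean())

# Toy example: the agent loses access to state 2 but gains access to state 3.
baseline = np.array([1.0, 0.9, 0.8, 0.0])
current  = np.array([1.0, 0.9, 0.0, 0.7])
print(rr_penalty(current, baseline))                       # 0.2: only the loss counts
print(rr_penalty(current, baseline, penalize_gains=True))  # 0.375: the gain counts too
```

The point is that neither variant rewards reaching more states than the baseline, and the absolute-difference variant actively penalizes it.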
I think RR may have interesting interactions with the reachability preservation tendencies exposed by my power-seeking theorems. However, controlling for other design choices, I don’t think the results suggest that RR incentivizes power-seeking. I don’t currently think the power-seeking theorems suggest that either RR or AUP is particularly bad wrt power-seeking behavior.
Thanks for the answer, that makes sense.