Thanks a lot for another great episode! I want to be clear (because you don’t always get that many comments) that I think you’re doing a great service to the field. These podcasts are a great way for beginners to get a better grip on complex topics, and even for people who already know some of the ideas, the length and depth of the conversation usually unravel cool new takes.
This might suggest a mitigation strategy of minimizing the degree to which AI systems have large effects on the world that absolutely necessary for achieving their objective.
Not sure what the actual sentence you wanted to write was. “are not absolutely necessary” maybe?
On relative reachability, maybe it’s because I’m reading Alex Turner’s power-seeking paper at the moment, but I wonder if there is an interesting relationship. The latter focuses on showing when and why actions that “seek power” (which intuitively means “lead to more choices”) are favored by optimal policies, which also seems to study how optimal policies maintain relative reachability. But that’s probably also showing that relative reachability is not what you want, because it is compatible with power-seeking, a pretty big side effect. (Victoria and Alex probably thought about that quite a lot, but maybe it’s interesting for some readers here.)
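For readers who haven’t read the paper: very roughly, and glossing over normalization details, the power of a state is something like the optimal value an agent can expect to attain from it, averaged over a distribution of reward functions,

$$\text{POWER}_D(s) \;\approx\; \mathbb{E}_{R \sim D}\left[V^*_R(s)\right],$$

so states that keep more well-valued options reachable tend to have higher power, which is why the notion feels so close to preserving reachability.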
Reading a bit further, I guess that’s partially addressed by the discussion of the Sushi environment. But not completely: even compared to the inaction baseline, if one of two actions reduces my options less than the other, taking it is both power-seeking and incentivized here, AFAIU.
And Victoria actually points it out at the end:
Daniel Filan: Victoria would also like to mention that if we don’t manage to eliminate the interference incentive, then the agent may have an incentive to gain control of its environment to keep it stable and prevent humans from eliminating options, which is not great for human autonomy.
Well, as I understand it, RR usually penalizes wrt a baseline; it doesn’t penalize absolute loss in reachability. If the agent could reach states s1–s1000 by following the baseline policy, then RR penalizes it for losing access to (states similar to) those states. That doesn’t mean RR will make the agent maximize its options. In fact, if the baseline doesn’t involve power-seeking, some variants of RR will penalize the agent for maximizing its available options / power.
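To make that concrete, here is a minimal toy sketch (not the code from the paper; the function name and the numbers are made up for illustration) of the truncated-difference flavor of RR, which only penalizes reachability lost relative to the baseline:

```python
# Toy sketch of a baseline-relative reachability penalty (simplified; illustrative only).

def rr_penalty(reachability_after_action, reachability_after_baseline):
    """Penalize only the reachability the agent loses relative to the baseline.

    Both arguments map candidate states to a reachability value in [0, 1]
    (e.g. a discounted measure of how easily that state can still be reached).
    """
    states = reachability_after_baseline.keys()
    losses = [
        max(reachability_after_baseline[s] - reachability_after_action.get(s, 0.0), 0.0)
        for s in states
    ]
    return sum(losses) / len(losses)

# The baseline keeps states A, B, C reachable.
baseline = {"A": 1.0, "B": 1.0, "C": 1.0}

# Action 1 cuts off C; action 2 also cuts off C but gains access to a new state D ("more power").
action_1 = {"A": 1.0, "B": 1.0, "C": 0.0}
action_2 = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 1.0}

print(rr_penalty(action_1, baseline))  # ~0.33: penalized for losing C
print(rr_penalty(action_2, baseline))  # also ~0.33: the extra option D earns no credit
```

Note that action_2 opens up an extra option but gets no credit for it, and a variant that penalizes deviation from the baseline in either direction would actively penalize that kind of option-gaining.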
I think RR may have interesting interactions with the reachability preservation tendencies exposed by my power-seeking theorems. However, controlling for other design choices, I don’t think the results suggest that RR incentivizes power-seeking. I don’t currently think the power-seeking theorems suggest that either RR or AUP is particularly bad wrt power-seeking behavior.
Thanks for the answer, that makes sense.
You’re quite right, let me fix that.
And also thanks for your kind words :)