I think there may be another leftover from the old setup:
We are interested in creating agents that robustly do not press the button.
Shouldn’t this be interested in creating agents that robustly do press the button? I.e. then they’re reliably myopic. Or am I misunderstanding something?
I don’t understand—isn’t the opposite true here?
Yep—I switched the setup at some point and forgot to switch this sentence. Thanks.
I think there may be another leftover from the old setup:
Shouldn’t this be interested in creating agents that robustly do press the button? I.e. then they’re reliably myopic. Or am I misunderstanding something?
Yep, thanks. Fixed.