It looks like I have some explaining to do if I want to convince you that O-maximizers aren’t a subset of reward maximizers—in particular, that my argument in appendix B doesn’t apply to O-maximizers.
To recap, my position is that both expected reward maximisers and expected utility maximisers are universal learners—and so can perform practically any series of non-self-destructive actions in a configurable manner in response to inputs. So, I don’t think either system necessarily exhibits the “characteristic behaviour” you describe.