timtyler comments on New FAI paper: ‘Learning What to Value’ by Daniel Dewey

timtyler 6 May 2011 21:17 UTC
0 points

It looks like I have some explaining to do if I want to convince you that O-maximizers aren’t a subset of reward maximizers—in particular, that my argument in appendix B doesn’t apply to O-maximizers.

To recap, my position is that expected reward maximisers and expected utility maximisers are both universal learners, so either kind can be configured to perform practically any series of non-self-destructive actions in response to its inputs. So I don't think either system necessarily exhibits the "characteristic behaviour" you describe.
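
A toy sketch of what "configurable" could mean here, in Python. Everything in it is hypothetical and made up for illustration (`make_reward`, `greedy_reward_maximiser`, `echo_last`, the three-letter action set); none of it comes from the paper. It assumes a deterministic setting where the observation sequence is fixed and does not depend on the agent's actions, so there is no expectation to take. It only shows that, for one chosen target behaviour, a reward function and a utility function can each be written so that maximising them reproduces that behaviour.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

History = Tuple[str, ...]  # observations seen so far (toy assumption)
ACTIONS = ("a", "b", "c")  # hypothetical action set


def make_reward(target: Callable[[History], str]) -> Callable[[History, str], float]:
    """Reward 1 when the action matches the target behaviour, else 0."""
    def reward(history: History, action: str) -> float:
        return 1.0 if action == target(history) else 0.0
    return reward


def greedy_reward_maximiser(reward: Callable[[History, str], float],
                            actions: Sequence[str]) -> Callable[[History], str]:
    """Agent that picks the action with the highest immediate reward."""
    def act(history: History) -> str:
        return max(actions, key=lambda a: reward(history, a))
    return act


def echo_last(history: History) -> str:
    """Example target behaviour: repeat the last observation, or 'a' at the start."""
    return history[-1] if history else "a"


reward = make_reward(echo_last)
agent = greedy_reward_maximiser(reward, ACTIONS)

observations: History = ("b", "c", "a", "b")
steps = range(len(observations) + 1)

# Reward maximiser: act once per step, seeing the observations so far.
chosen = [agent(observations[:t]) for t in steps]


def utility(actions_taken: Sequence[str]) -> float:
    """Utility over a whole action sequence: number of steps matching the target."""
    return sum(reward(observations[:t], a) for t, a in enumerate(actions_taken))


# Utility maximiser: brute-force search over all action sequences.
best = max(product(ACTIONS, repeat=len(observations) + 1), key=utility)

print(chosen)      # action sequence picked by the reward maximiser
print(list(best))  # action sequence picked by the utility maximiser
```

In this toy case both maximisers end up with the same action sequence, the one `echo_last` prescribes. Swapping in a different target function changes the behaviour of both without touching the maximising machinery. The sketch does not address the "non-self-destructive" qualification, or settings where the agent cannot choose its own reward signal.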