Thanks for posting this around! It’s great to see it creating discussion.
I’m working on replies to the points you, Bill Hibbard, and Curt Welch have made. It looks like I have some explaining to do if I want to convince you that O-maximizers aren’t a subset of reward maximizers—in particular, that my argument in appendix B doesn’t apply to O-maximizers.
To recap, my position is that both expected reward maximizers and expected utility maximizers are universal learners—and so can perform practically any series of non-self-destructive actions in a configurable manner in response to inputs. So, I don’t think either system necessarily exhibits the “characteristic behaviour” you describe.