I’ve been thinking about this, and I haven’t found any immediately useful way of applying your idea, but I’ll keep it in the back of my mind… We haven’t found a good way of identifying agency in the abstract sense (“was cosmic phenomenon X caused by an agent, and if so, which one?” kind of stuff), so this might be a useful simpler problem to tackle first.
Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don’t wirehead.
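To make that concrete, here’s a minimal sketch of what “preference learning as supervised learning” can look like: fit a utility function from labelled pairwise preferences with a Bradley–Terry-style logistic model. Everything here (the feature dimensions, the synthetic “operator” labels, the linear utility) is an illustrative assumption of mine, not anyone’s actual system:

```python
# Minimal sketch: learn a linear "utility" u(x) = w . x from labelled
# pairwise preferences (Bradley-Terry style). All data and names here
# are illustrative, not taken from any particular library or agent.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4-dimensional outcome features, plus a hidden
# "true" utility that generates the operator's preference labels.
true_w = np.array([1.0, -2.0, 0.5, 0.0])
X_a = rng.normal(size=(500, 4))   # first outcome of each pair
X_b = rng.normal(size=(500, 4))   # second outcome of each pair
# y = 1 means the operator preferred outcome a over outcome b.
y = (X_a @ true_w > X_b @ true_w).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic loss on the utility difference, plain gradient descent.
w = np.zeros(4)
lr = 0.1
for _ in range(2000):
    diff = (X_a - X_b) @ w            # u(a) - u(b)
    p = sigmoid(diff)                 # model's P(a preferred over b)
    grad = (X_a - X_b).T @ (p - y) / len(y)
    w -= lr * grad

# The learned w recovers the direction of the true utility, so it can
# rank novel outcomes; it is only ever trained to predict the labels
# it was given, which is the sense in which it has no reward channel
# to seize.
print(np.round(w / np.linalg.norm(w), 2))
print(np.round(true_w / np.linalg.norm(true_w), 2))
```

That last point is the “doesn’t wirehead” intuition in miniature: the learner’s objective is prediction error on the operator’s labels, not a reward signal it could manipulate.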
Notably, this fits with our intuition that morality must be “taught” (i.e., via labelled data) to actual human children, lest they simply decide that the Good and the Right consist of eating a whole lot of marshmallows.
And if we put that together with a conservatism heuristic for acting under moral uncertainty (say: optimize for expectedly-moral expected utility, so that more extreme decisions demand higher moral certainty), we might just start to make some headway on constructing utility functions that mathematically reflect what their operators actually intend for them to do.
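To gesture at the math (notation mine, just a sketch, with $M_i$ ranging over the candidate moral theories the learner assigns credence to):

$$a^{*} = \operatorname*{arg\,max}_{a} \; \sum_i P(M_i \mid \text{preference data}) \; \mathbb{E}\!\left[ U_{M_i}(a) \right]$$

An extreme action only wins that maximization if the moral theories endorsing it carry most of the probability mass, so at low moral certainty the agent defaults to milder actions.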
I also have an idea written down in my notebook, which I’ve been refining, that sort of extends from what Luke had written down here. Would it be worth a post?