I’ve been thinking about this, and I haven’t found any immediately useful way of applying your idea, but I’ll keep it in the back of my mind… We haven’t found a good way of identifying agency in the abstract sense (“was cosmic phenomenon X caused by an agent, and if so, which one?” kind of stuff), so this might be a useful simpler problem to tackle first.
Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don’t wirehead.
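To make that concrete, here’s a minimal sketch of what “preference learning as supervised learning” can look like: fit a utility function from labelled pairwise preferences with a Bradley–Terry-style logistic model. Everything here (the feature dimensions, the synthetic “operator” labels, the linear utility) is an illustrative assumption of mine, not anyone’s actual system:

```python
# Minimal sketch: learn a linear "utility" u(x) = w . x from labelled
# pairwise preferences (Bradley-Terry style). All data and names here
# are illustrative, not taken from any particular library or agent.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4-dimensional outcome features, plus a hidden
# "true" utility that generates the operator's preference labels.
true_w = np.array([1.0, -2.0, 0.5, 0.0])
X_a = rng.normal(size=(500, 4))   # first outcome of each pair
X_b = rng.normal(size=(500, 4))   # second outcome of each pair
# y = 1 means the operator preferred outcome a over outcome b.
y = (X_a @ true_w > X_b @ true_w).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic loss on the utility difference, plain gradient descent.
w = np.zeros(4)
lr = 0.1
for _ in range(2000):
    diff = (X_a - X_b) @ w            # u(a) - u(b)
    p = sigmoid(diff)                 # model's P(a preferred over b)
    grad = (X_a - X_b).T @ (p - y) / len(y)
    w -= lr * grad

# The learned w recovers the direction of the true utility, so it can
# rank novel outcomes; it is only ever trained to predict the labels
# it was given, which is the sense in which it has no reward channel
# to seize.
print(np.round(w / np.linalg.norm(w), 2))
print(np.round(true_w / np.linalg.norm(true_w), 2))
```

That last point is the “doesn’t wirehead” intuition in miniature: the learner’s objective is prediction error on the operator’s labels, not a reward signal it could manipulate.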
Notably, this fits with our intuition that morality must be “taught” (i.e., via labelled data) to actual human children, lest they simply decide that the Good and the Right consist of eating a whole lot of marshmallows.
And if we put that together with a conservatism heuristic for acting under moral uncertainty (say: optimize for expectedly-moral expected utility, so that more extreme decisions demand higher moral certainty), we might just start to make some headway on constructing utility functions that mathematically reflect what their operators actually intend for them to do.
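To gesture at the math (notation mine, just a sketch, with $M_i$ ranging over the candidate moral theories the learner assigns credence to):

$$a^{*} = \operatorname*{arg\,max}_{a} \; \sum_i P(M_i \mid \text{preference data}) \; \mathbb{E}\!\left[ U_{M_i}(a) \right]$$

An extreme action only wins that maximization if the moral theories endorsing it carry most of the probability mass, so at low moral certainty the agent defaults to milder actions.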
I also have an idea written down in my notebook, which I’ve been refining, that sort of extends from what Luke had written down here. Would it be worth a post?