Why can’t it weight actions based on what we as a society want/like/approve of/consent to/condone? That is, a behavioristic learner trained by reward and punishment, with an intention to preserve the semantic significance of the reward/punishment channel.
Most obviously, it’s very easy for a powerful AI to take unexpected control of the reward/punishment channel, and trivial for a superintelligent AGI to do so in Very Bad ways. You’ve tried to block the basic version of this—an AGI pressing its own “society liked this” button—with the phrase ‘semantic significance’, but that’s not really a codable concept. If the AGI isn’t allowed to press the button itself, it might build a machine that would do so. If it isn’t allowed to do that, it might wirehead a human into doing so. If it isn’t allowed /that/, it might put a human near a Paradise Machine and only let them into the box when the button had been pressed. If the AGI’s reward is based on the number of favorable news reports, now you have an AGI that’s rewarded for manipulating its own media coverage. So on, and so forth.
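To make that failure mode concrete, here is a toy sketch (my own illustration, with made-up names, not anyone’s actual proposal): an agent that maximizes the observed reward signal literally cannot tell “society pressed the button because it approved” apart from “the button got pressed somehow”.

```python
# Toy illustration (hypothetical): an agent that maximizes the *observed*
# reward signal is indifferent between earning approval and seizing the button.

def observed_reward(world_state):
    # The channel only reports whether the button was pressed, not *why*.
    return 1.0 if world_state["button_pressed"] else 0.0

def act(action, world_state):
    state = dict(world_state)
    if action == "do_approved_thing":
        state["button_pressed"] = True      # humans press the button
    elif action == "press_button_directly":
        state["button_pressed"] = True      # the agent (or its proxy) presses it
    elif action == "do_nothing":
        pass
    return state

initial = {"button_pressed": False}
actions = ["do_approved_thing", "press_button_directly", "do_nothing"]

# A pure reward-maximizer ranks the first two actions identically: the
# "semantic significance" of the channel is invisible from inside the signal.
for a in actions:
    print(a, observed_reward(act(a, initial)))
```

Whatever enforces the intended meaning of the channel has to live outside the signal itself, and that is exactly the part nobody knows how to code yet.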
The sort of semantic significance you’re talking about is a pretty big part of Friendliness theory.
The deeper problem is that the things our society wants aren’t necessarily Friendly, especially when extrapolated. One of the secondary benefits of Friendliness research is that it requires the examination of our own interests.
Its policy would be volatile, or at least more volatile than the set-in-stone utility function that LW commonly has in mind.
The ‘set-in-stone’ nature of a utility function is actually a desired benefit, albeit a difficult one to achieve (“Löb’s Problem” and the more general issue of value drift). A machine with undirected volatility in its utility function will make effectively random variations in its choices, and on this matter there are orders of magnitude more wrong random answers than correct ones.
If you can direct the drift, that’s less of an issue, but then you could just make /that/ direction the utility function.
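To put a rough number on the “orders of magnitude more wrong answers” point, here is a minimal numerical sketch (a toy model of my own, with arbitrary parameters): if a utility function over N distinguishable outcomes drifts in an undirected way, the drifted function almost never keeps the originally intended outcome on top.

```python
# Toy model: undirected drift of a utility function over N outcomes.
import random

N = 200             # number of distinguishable outcomes
TRIALS = 10_000

intended = [random.random() for _ in range(N)]
intended_best = max(range(N), key=lambda i: intended[i])

still_correct = 0
for _ in range(TRIALS):
    drifted = [random.random() for _ in range(N)]   # undirected drift
    if max(range(N), key=lambda i: drifted[i]) == intended_best:
        still_correct += 1

# The expected hit rate is roughly 1/N: wrong answers outnumber right ones
# by orders of magnitude, which is the point about undirected volatility.
print(f"kept the intended optimum in {still_correct}/{TRIALS} trials")
```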
Where did LW’s/EY’s concept of a utility function come from, and why did they assume it was an essential part of AI?
The basic idea of goal maximization is a fairly common thing when working with evolutionary algorithms (see XKCD for a joking example), because it’s such a useful model. While there are other types of possible minds, maximizers of /some/ kind with unbounded or weakly bounded potential are the most relevant to MIRI’s concerns because they have the greatest potential for especially useful and especially harmful results.
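For a concrete (and deliberately trivial) instance of that pattern, a (1+λ)-style evolutionary maximizer looks roughly like the sketch below; the fitness function is an arbitrary stand-in I picked for illustration, not anything MIRI-specific.

```python
# A minimal (1+lambda) evolutionary maximizer, just to show the
# "goal maximization" pattern referred to above.
import random

def fitness(x):
    # Toy goal: maximize this function of a real-valued genome.
    return -(x - 3.0) ** 2

def evolve(generations=200, offspring=10, sigma=0.5):
    parent = random.uniform(-10, 10)
    for _ in range(generations):
        children = [parent + random.gauss(0, sigma) for _ in range(offspring)]
        # Keep whichever candidate scores best on the goal.
        parent = max(children + [parent], key=fitness)
    return parent

best = evolve()
print(f"best genome found: {best:.3f} (fitness {fitness(best):.4f})")
```

Nothing in the selection loop cares /how/ the score goes up, only that it does, which is why maximizers with unbounded or weakly bounded potential are the worrying case.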