I do pretty much mean wireheading, but also similar situations where the AI doesn’t go as far as wireheading, like making us eat chocolate forever.
I feel like these scenarios can be broken down into two categories: scenarios where the AI succeeds in “making us happy”, but through unorthodox means, and scenarios where the AI tries but fails to “make us happy”, which can quickly go into S-risk territory.
The main reason I wondered whether the chance of these kinds of outcomes might be fairly high is that “make people happy” seems like the kind of goal a lot of people would give an AGI, either because they don’t believe or understand the risks, or because they think it is aligned well enough to be safe and not, for example, wirehead people.
Perhaps, as another question in this thread discusses, making a wireheading AGI might be an easier target than the more commonly touted alignment goals, and maybe it would be decided that it is preferable to extinction or disempowerment or whatever.
Making a wireheading AGI probably would be easier than getting a properly aligned one, because maximisers are generally simpler than properly aligned AGIs, since there are fewer things they need to get right (I’m being very vague here—sorry).
That being said, having a coherent target is a different problem from being able to aim at it in the first place. Both are very important, but being able to tell an AI to do something and be reasonably confident it will do it (with the ability to correct it in case of problems) seems like the harder part.
I’m cynical, but I reckon a goal like “make people happy” is less likely to be given than “make me rich” or “make me powerful”.