How likely is the “Everyone ends up hooked up to morphine machines and kept alive forever” scenario? Is it considered less likely than extinction for example?
Obviously it doesn’t have to be specifically that, but something to that effect.
Also, is this scenario included as an existential risk in the overall X-risk estimates that people make?
Do you mean something more specific than general wireheading?
I’d reckon it a lot less likely, just because it’s a lot simpler to kill everyone than to keep them alive and hooked up to a machine. That is, there are lots and lots of scenarios where everyone is dead, but a lot fewer where people end up wireheaded.
Wireheading is often given as a specific example of outer misalignment, where the agent was told to make people happy and does it in a very unorthodox manner.
I do pretty much mean wireheading, but also similar situations where the AI doesn’t go as far as wireheading, like making us eat chocolate forever.
I feel like these scenarios can be broken down into two categories: scenarios where the AI succeeds in “making us happy”, but through unorthodox means, and scenarios where the AI tries but fails to “make us happy”, which can quickly go into S-risk territory.
The main reason I wondered whether the chance of these kinds of outcomes might be fairly high is that “make people happy” seems like the kind of goal a lot of people would give an AGI, either because they don’t believe or understand the risks, or because they assume it has been aligned well enough to be safe and not, for example, wirehead people.
Perhaps, as another question in this thread discusses, making a wireheading AGI might be an easier target than more commonly touted alignment goals, and maybe it would be decided that it is preferable to extinction, disempowerment, or whatever.
Making a wireheading AGI probably would be easier than getting a properly aligned one, because maximisers are generally simpler than properly aligned AGIs, since they have fewer things they need to get right (I’m being very vague here, sorry).
That being said, having a coherent target is a different problem from being able to aim the AI at it in the first place. Both are very important, but being able to tell an AI to do something and be quite confident it will actually do so (with the ability to correct it in case of problems) seems like the harder part.
I’m cynical, but I reckon that giving a goal like “make people happy” is less likely than “make me rich” or “make me powerful”.