Whatever the ultimate relevance of such ideas, it is clearly possible to divorce the notion of Friendly AI from all of them.
Take Pascal’s mugging, for example: if you can’t solve it, then you need to implement a hack that is largely based on human intuition. Therefore, in order to estimate the possibility of solving friendly AI, or to distinguish it from the implementation of fail-safe mechanisms, one needs to account for the difficulty of solving all those sub-problems you mentioned.
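To make the worry concrete, here is a minimal toy sketch of why a naive expected-utility maximizer is vulnerable to Pascal’s mugging. All of the numbers are invented for illustration and are not from the original discussion; the point is only that a sufficiently extravagant promised payoff can dominate the calculation no matter how small the probability assigned to it.

```python
# Toy illustration of Pascal's mugging for a naive expected-utility maximizer.
# All numbers are invented for illustration; only the asymmetry matters.

def expected_utility(probability: float, payoff: float) -> float:
    """Naive expected utility: probability of an outcome times its payoff."""
    return probability * payoff

# The mugger: "Give me $5 or I will cause an astronomical amount of harm."
credence_in_threat = 1e-30   # vanishingly small probability the threat is real
claimed_harm = 1e40          # astronomically large claimed disutility
cost_of_paying = 5.0         # utility lost by handing over $5

eu_refuse = -expected_utility(credence_in_threat, claimed_harm)  # -1e10
eu_pay = -cost_of_paying                                         # -5

# The huge claim swamps the tiny probability, so the naive maximizer pays up.
print(f"EU(refuse) = {eu_refuse:.3e}, EU(pay) = {eu_pay:.3e}")
print("Pay the mugger:", eu_pay > eu_refuse)  # True
```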
As multifoliaterose wrote, we don’t even know “how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings.”
...it is simply an exercise in creating artificial intelligence that does the right thing.
Yes, but solving metaethics, figuring out what we mean when we use the word “right”, already seems to be ridiculously difficult.
What you need to show is that there is a possibility of solving friendly AI before someone stumbles upon AGI, a possibility that would outweigh the (in my opinion) vastly less effective but easier option of creating fail-safe mechanisms that might prevent a full-scale extinction scenario or help us employ an AGI to solve friendly AI.
Take Pascal’s mugging, for example: if you can’t solve it, then you need to implement a hack that is largely based on human intuition.
What’s the problem? If someone tells you about huge utility, they are probably trying to manipulate you. Tell them to show you the utility. That does not seem to be much of a hack.
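As a rough sketch of that heuristic, assuming “show me the utility” amounts to discounting an unverified claim in proportion to the payoff it promises: the specific penalty rule below is my own illustrative choice, not something taken from the discussion, and it is not offered as a worked-out solution to Pascal’s mugging.

```python
# Toy version of the "show me the utility" response: discount an unverified
# claim in proportion to the payoff it promises, so that bigger promises do
# not automatically win the expected-utility comparison. The penalty rule is
# an illustrative assumption, not a worked-out solution to Pascal's mugging.

def discounted_expected_utility(prior: float, claimed_payoff: float,
                                verified: bool) -> float:
    """Expected utility with an assumed penalty on unverified huge claims."""
    if verified:
        return prior * claimed_payoff
    # The bigger the promise, the less credible it is without evidence.
    penalized_prior = prior / (1.0 + abs(claimed_payoff))
    return penalized_prior * claimed_payoff

credence = 1e-30
claimed_harm = -1e40  # the mugger's threatened disutility
print(discounted_expected_utility(credence, claimed_harm, verified=False))
# Roughly -1e-30: the threat no longer outweighs the -5 utility of paying,
# so the agent refuses until the mugger actually "shows the utility".
```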