Modest proposal for Friendly AI research:
Create a moral framework that incentivizes assholes to cooperate.
Specifically, create a set of laws for a “community”, with the laws applying only to members, that would attract finance guys, successful “unicorn” startup owners, politicians, drug dealers at the “regional manager” level, and other assholes.
Win condition: a “trust app” that everyone uses, that tells users how trustworthy every single person they meet is.
Lose condition: startup fund assholes end up with majority ownership of the first smarter-than-human general AI, and no one has given smart people an incentive not to hurt dumb people.
If you can’t incentivize smart selfish people to “cooperate” instead of “defect”, then why do you think you can incentivize an AI to be friendly? What’s to stop a troll from deleting the “Friendly” part the second the AI source code hits the Internet? Keep in mind that the 4chan community has a similar ethos to LW: namely “anything that can be destroyed by a basement dweller should be”.
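To make the cooperate/defect framing concrete, here is a toy iterated prisoner’s dilemma in Python. The payoffs and strategies are the textbook ones, not anything from the proposal itself; the point is that repetition, not altruism, is what makes cooperation pay for a purely selfish agent:

```python
# Toy iterated prisoner's dilemma. Payoffs use the standard ordering
# (temptation > reward > punishment > sucker): a selfish agent defects
# in a one-shot game, but can profit from cooperating in repeated play.
PAYOFF = {  # (my move, their move) -> my score
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def always_defect(their_history):
    return "D"

def tit_for_tat(their_history):
    # Cooperate first; afterwards, copy the partner's previous move.
    return their_history[-1] if their_history else "C"

def play(strat_a, strat_b, rounds=100):
    hist_a, hist_b = [], []   # moves played so far by a and by b
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_b)  # each side reacts to the other's past
        move_b = strat_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(always_defect, always_defect))  # (100, 100): mutual punishment
print(play(tit_for_tat, tit_for_tat))      # (300, 300): mutual cooperation
print(play(always_defect, tit_for_tat))    # (104, 99): defection barely pays
```

In a one-shot game the selfish move is to defect; with memory and repeated interaction, conditional cooperators end up richer, which is the whole bet behind laws that apply only to members of a community you can be expelled from.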
So, capitalism?
That seems like a horrible idea.
We can, of course; just not unconditionally, and not all the time. Creatures that always cooperate are social insects.
Unrelated to AI:
Making the “trust app” would be a great thing. I spent some time thinking about it, but my sad conclusion is that as soon as the app became popular, it would fail somehow. For example, if it is not anonymous, people could use real-world pressure to force others to give them positive ratings. The psychopaths would threaten to sue people who label them as psychopaths, or even use violence against them directly. On the other hand, if the ratings are anonymous, a charming psychopath could sic their followers on an enemy, flooding them with negative ratings. In the end, the ratings of a psychopath who hurt many people could look pretty similar to the ratings of a decent person who pissed off a vengeful psychopath.
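A toy numerical sketch of that last point (all rating counts below are made up for illustration): the app only sees rating totals, so genuine negatives and brigade negatives collapse into the same number:

```python
def trust_score(positive, negative):
    # The naive aggregate a trust app might display:
    # the fraction of all ratings that are positive.
    return positive / (positive + negative)

# Psychopath who hurt many people: honest negatives from victims,
# positives from charmed acquaintances.
abuser = trust_score(positive=60, negative=40)

# Decent person who pissed off a vengeful psychopath: a couple of
# genuine negatives plus an anonymous brigade of followers.
brigaded = trust_score(positive=60, negative=2 + 38)

print(abuser, brigaded)  # 0.6 0.6 -- indistinguishable from the outside
```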
Not sure what to do here. Maybe the very fact that you use the “trust app” should be information you share only with trusted friends; and maybe you should create a different persona for each group of friends. But then the whole network becomes sparse, so you will not be able to get information on most of the people you care about. Also, there is still a risk that once the app becomes popular, there will be social pressure to create an official persona, which will in turn be pressured to give socially acceptable ratings. (Your friends will still know your secret persona, but because of the sparse network, it will be mostly useless to them anyway.)
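A rough back-of-the-envelope on the sparseness problem (the population size, circle size, and ratings-per-user below are arbitrary assumptions): if ratings are visible only inside small friend circles, the chance you can look up a random stranger collapses:

```python
import random

random.seed(42)

N_USERS = 10_000
CIRCLE_SIZE = 15       # assumed: each persona is shared with ~15 friends
RATINGS_PER_USER = 30  # assumed: each user has rated 30 other people

# Who each user trusts with their persona, and whom each user has rated.
circles = {u: random.sample(range(N_USERS), CIRCLE_SIZE)
           for u in range(N_USERS)}
rated_by = {u: set(random.sample(range(N_USERS), RATINGS_PER_USER))
            for u in range(N_USERS)}

def can_look_up(me, stranger):
    # I learn about a stranger only if someone in my circle rated them.
    return any(stranger in rated_by[friend] for friend in circles[me])

me = 0
sample = random.sample(range(N_USERS), 1_000)
coverage = sum(can_look_up(me, s) for s in sample) / len(sample)
print(f"coverage: {coverage:.1%}")  # roughly 1-(1-30/10000)**15, i.e. ~4%
```

With those made-up numbers you can get a rating for only a few percent of the strangers you meet, which is the “mostly useless” point above in miniature.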
A trust app is going to end up with all the same issues credit ratings have.