I think CEV is approximately the right framework. The real correct framework would be something like PAC CEV.
I’m using “human alignment problem” to mean making people love their neighbor as themselves. Again, PAC is the best you’re ever going to get.
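For context, “PAC” here is borrowed from PAC (probably approximately correct) learning, and CEV is coherent extrapolated volition; the shape of a PAC guarantee is approximate, high-probability satisfaction rather than exact satisfaction. Below is a minimal statement of the standard agnostic-PAC guarantee, just to show the kind of claim being gestured at (applying it to CEV is this commenter’s analogy, not an established result):

```latex
% Standard (agnostic) PAC guarantee: with enough data, the learned
% hypothesis h is, with probability at least 1 - \delta, within \epsilon
% of the best achievable error in the hypothesis class \mathcal{H}.
\Pr\Big[\, \operatorname{err}(h) \;\le\; \min_{h' \in \mathcal{H}} \operatorname{err}(h') + \epsilon \,\Big] \;\ge\; 1 - \delta
```

Read through the analogy, the claim is that an aggregated target like CEV can only ever be hit approximately and with some residual chance of failure, never exactly and for everyone.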
“human alignment” as you put it seems undesirable to me — i want people to get their values satisfied and then conflicts resolved in some reasonable manner, i don’t want to change people’s values so they’re easier to satisfy-all-at-once. changing other people’s values is very rude and, almost always, a violation of their current values.
any idea how you’d envision “making people love their neighbor as themselves”? sounds like modifying everyone on earth like that would be much more difficult than, say, changing the minds of the people who would make the AIs that are gonna kill everyone.
I agree with this. It’s super undesirable. On the other hand, so are wars and famines and what have you. Tradeoffs exist.
Think of it like the financial system. Some people are going for a high score in the money economy, and that powers both good and bad things. If we built coordination markets, some people would hyperfixate on them in a very unhealthy way, become fabulously wealthy in reputation terms, and then be exposed as child molesters or what have you. Again, tradeoffs exist.
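The thread never pins down what a coordination market’s scoring would actually look like, so treat the following as a purely hypothetical sketch of the “high score in reputation terms” idea: a ledger where kept pledges add reputation and broken ones subtract more than a kept one adds. None of the names or numbers come from the conversation.

```python
# Hypothetical sketch of a reputation ledger for a coordination market.
# Kept pledges earn reputation; broken pledges cost more than a kept one
# earns, so the only way to farm a high score is to actually follow through.
from dataclasses import dataclass, field


@dataclass
class ReputationLedger:
    scores: dict[str, float] = field(default_factory=dict)

    def record_pledge(self, person: str, kept: bool, stake: float = 1.0) -> None:
        delta = stake if kept else -2.0 * stake   # asymmetric penalty (assumed)
        self.scores[person] = self.scores.get(person, 0.0) + delta


ledger = ReputationLedger()
ledger.record_pledge("alice", kept=True)
ledger.record_pledge("alice", kept=True)
ledger.record_pledge("bob", kept=False)
print(ledger.scores)  # {'alice': 2.0, 'bob': -2.0}
```

The asymmetric penalty is the point of the sketch: chasing the high score only works by keeping commitments, which is the tradeoff the comment above is accepting.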
oh, so this is a temporary before-AI-inevitably-either-kills-everyone-or-solves-everything thing, not a plan for making the AI-that-solves-everything-including-X-risk?
It’s an adjunct to the AI that solves everything, maybe? It can coexist with everything else in human society, and I would argue that it will improve those things along all the axes that any of us have a right to care about.
And like, the only way you can get people to stop building the AI that’s gonna kill everyone is some sort of massive labor strike against the companies building that stuff. Another enormous coordination problem—it’s not in any one capabilities researcher’s self-interest to stop the train, but if the train doesn’t slow down, then we all die.
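The standard mechanism for exactly this kind of strike problem is an assurance contract: each person commits only conditionally (“I’ll walk out if at least N others do”), and nothing triggers until the conditions are simultaneously satisfiable. A minimal sketch of that trigger check follows; the thresholds and the function name are made up for illustration, not taken from any real proposal.

```python
# Minimal assurance-contract trigger check. Each pledge means
# "I'll walk out if at least `threshold` people walk out in total."
# The strike activates only for the largest coalition in which
# every member's threshold is simultaneously met.
def triggered_pledges(thresholds: list[int]) -> int:
    """Return the size of the largest self-consistent coalition (0 if none)."""
    ordered = sorted(thresholds)               # easiest conditions first
    for size in range(len(ordered), 0, -1):    # try big coalitions first
        coalition = ordered[:size]             # the `size` easiest pledgers
        if all(t <= size for t in coalition):  # everyone's condition holds
            return size
    return 0


# Hypothetical example: three researchers will strike if at least 3 do,
# a fourth only if at least 5 do. A coalition of 3 is self-consistent.
print(triggered_pledges([3, 3, 3, 5]))  # -> 3
print(triggered_pledges([5, 5, 5, 5]))  # -> 0 (nobody's condition can be met)
```

The appeal for a lone capabilities researcher is that a conditional pledge costs nothing until the coalition is already large enough to matter, which is what makes it a plausible answer to “not in any one researcher’s self-interest.”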