Our alignment philosophy is simple: we cannot align AIs to human values until we know approximately what human values actually are, and we cannot know that until we solve the human alignment problem.
what do you make of coherent extrapolated volition, which is the usual proposal for solving alignment without having a full understanding of our values?
what do you mean by “human alignment problem”? here it seems that you mean “understanding the values of humans”, but many people use that term to mean a variety of things (usually they use it to mean “making humans aligned with one another”)
I think CEV is approximately the right framework. The real correct framework would be something like PAC CEV.
I’m using “human alignment problem” to mean making people love their neighbor as themselves. Again, PAC is the best you’re ever going to get.
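(Assuming “PAC” is meant in the probably-approximately-correct sense from learning theory, the shape of the guarantee is, very roughly,

$$\Pr\Big[\operatorname{err}(\hat{h}) \;\le\; \min_{h \in \mathcal{H}} \operatorname{err}(h) + \varepsilon\Big] \;\ge\; 1 - \delta,$$

achievable for a finite hypothesis class $\mathcal{H}$ with on the order of $\tfrac{1}{\varepsilon^{2}}\big(\ln|\mathcal{H}| + \ln\tfrac{1}{\delta}\big)$ samples. Reading that against CEV is a gloss rather than a worked-out claim: you never extrapolate the volition exactly, you only get within some $\varepsilon$ of it with probability at least $1-\delta$, and that is the sense in which “approximately” is the best on offer.)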
“human alignment” as you put it seems undesirable to me — i want people to get their values satisfied and then conflicts resolved in some reasonable manner, i don’t want to change people’s values so they’re easier to satisfy-all-at-once. changing other people’s values is very rude and, almost always, a violation of their current values.
any idea how you’d envision “making people love their neighbor as themselves”? sounds like modifying everyone on earth like that would be much more difficult than, say, changing the minds of the people who would make the AIs that are gonna kill everyone.
I agree with this. It’s super undesirable. On the other hand, so are wars and famines and what have you. Tradeoffs exist.
Think of it like the financial system. Some people are going for a high score in the money economy, and that powers both good and bad things. If we built coordination markets, some people would hyperfixate on them in a very unhealthy way, become fabulously wealthy in reputation terms, and then be exposed as child molesters or what have you. Again, tradeoffs exist.
oh, so this is a temporary before-AI-inevitably-either-kills-everyone-or-solves-everything thing, not a plan for making the AI-that-solves-everything-including-X-risk?
It’s an adjunct to the AI that solves everything, maybe? It can coexist with everything else in human society, and I would argue that it will improve those things along all the axes that any of us have a right to care about.
And like, the only way you can get people to stop building the AI that’s gonna kill everyone is some sort of massive labor strike against the companies building that stuff. Another enormous coordination problem—it’s not in any one capabilities researcher’s self-interest to stop the train, but if the train doesn’t slow down, then we all die.
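To make that incentive structure concrete, here is a toy payoff sketch; the numbers are invented and only their ordering matters, but it shows the classic shape where continuing to race dominates individually while the everyone-races outcome is the worst one collectively:

```python
# Toy model of the capabilities-race coordination problem described above.
# All payoff numbers are invented for illustration; only their ordering matters.

def researcher_payoff(i_race: bool, others_race: bool) -> float:
    """Payoff to a single researcher given their own choice and what the rest do."""
    DOOM = -1000.0       # the train doesn't slow down: catastrophe shared by all
    SAFE = 100.0         # everyone stops: shared good outcome
    CAREER_EDGE = 10.0   # private benefit of racing (salary, status) while it lasts

    base = DOOM if others_race else SAFE
    return base + (CAREER_EDGE if i_race else 0.0)

# Racing is individually better no matter what the others do...
assert researcher_payoff(True, True) > researcher_payoff(False, True)
assert researcher_payoff(True, False) > researcher_payoff(False, False)
# ...yet everyone racing is far worse than everyone stopping together.
assert researcher_payoff(True, True) < researcher_payoff(False, False)
```

A strike, in these terms, is just a mechanism for moving everyone from the all-race equilibrium into the all-stop cell at once, since no one will move there alone.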