On (1):
It seems incredibly unlikely to me that your organization is going to make it no longer true that people have incompatible values.
If “AI alignment” is taken to mean “the AI wants exactly the same things that humans want”, and hence to imply “all humans want the same things”, then, sure, mutually incompatible human values ⇒ no AI alignment. But I don’t think that’s what any reasonable person takes “AI alignment” to mean. I would consider that we’d done a pretty good job of “AI alignment” if, say, the state of the world 20 years after the first superhuman AI were such that, for all times between now and then, (a) ≥ 75% of living humans (would) consider the post-AI state better than the pre-AI state and (b) ≤ 10% of living humans (would) consider the post-AI state much worse than the pre-AI state. (Or something along those lines.) And I don’t see why anything along these lines requires humans never to have incompatible values.
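For concreteness, that criterion could be written roughly as follows (my notation, a sketch rather than anything canonical): take $t_{\text{AI}}$ to be the arrival of the first superhuman AI, and let $p_{\text{better}}(t)$ and $p_{\text{much-worse}}(t)$ be the fractions of humans alive at time $t$ who (would) consider the post-AI state better, respectively much worse, than the pre-AI state. Then the condition is

$$\forall\, t \in [\,t_{\text{now}},\ t_{\text{AI}} + 20\ \text{years}\,]:\qquad p_{\text{better}}(t) \ge 0.75 \;\;\wedge\;\; p_{\text{much-worse}}(t) \le 0.10 .$$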
But never mind that: I still don’t see how your coordination-market system could possibly make it no longer true that humans sometimes have incompatible values.
On (2):
I still don’t see how your proposed political-reform organization would be in any way suited to issuing “AI alignment certification”, if that were a thing. And, since you say “hire a world-class mechanistic interpretability team from elsewhere”, it sounds as if you don’t either. So I don’t understand why any of that stuff is in your post; it seems entirely irrelevant to the organization you’re actually hoping to build.
Well, fair enough I suppose. I was personally excited about the AI alignment piece, and thought that coordination markets would help with that.
Humans have always held incompatible values and always will. That’s why we feel the need to murder each other as often as we do. But, as Steven Pinker argues, even as the absolute number of murders grows with the population, the rate at which we commit them keeps falling. Maybe that converges to a world in which a superhuman AI knows approximately what is expected of it. Maybe it doesn’t; I don’t know.