Could you explain what exactly your organization is intending to do?
This post envisions a world where various kinds of decision-making work differently from how they work at present, and involve various mechanisms that don’t currently exist. But it’s not clear to what extent you’re proposing (1) to try to bring that world into existence, (2) to build the mechanisms that it requires, or (3) to provide services that would be useful in that world.
It also gestures towards some sort of connection between the political reforms you propose and AI alignment, but I don’t really understand what the connection is supposed to be. It seems like you hope (1) to contribute to AI alignment by “solving the human alignment problem” and thus discovering “what human values actually are”, and (2) for your organization to offer “AI alignment certification”. But I don’t understand (1) how even a very smoothly functioning coordination-markets-and-liquid-democracy system would tell us “what human values actually are” in any sense sufficient to be relevant to AI alignment, nor (2) what “AI alignment certification” is supposed to mean or why an organization mostly dedicated to political reform would be either competent to offer it or trusted to do so.
All very good questions. Let me try to answer them in order:
I plan to pursue (1), (2), and (3) in parallel. I will start small (maybe the city of Berkeley, where I live) and increase the scope as the technology is proven to work.
1. Human values are what people think they are. To the extent that different humans have incompatible values, the AI alignment problem is strongly unsolvable.
2.A. AI alignment certification would be sort of like a driver’s license for AIs. An AI needs a valid license in order to be hooked up to the internet and switched on. Every time the AI does something dishonorable, its license is revoked until mechanistic interpretability researchers can debug the issue. (A rough sketch of this lifecycle follows below.)
2.B. I don’t think the organization would be trusted to certify AIs right away. It would have to build up a lot of goodwill from doing good works, turn that goodwill into capital somehow, and then hire a world-class mechanistic interpretability team from elsewhere. Or it could simply contract the mechanistic interpretability work out to other, more experienced labs.
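For concreteness, here is a minimal sketch of what that license lifecycle might look like. Everything in it (the class, the state names, the idea that a deployment environment calls `may_run()` before switching the AI on) is my own illustration of the proposal above, not an existing standard or API.

```python
from enum import Enum, auto


class LicenseState(Enum):
    UNCERTIFIED = auto()   # never certified, or certification lapsed
    VALID = auto()         # may be hooked up to the internet and run
    REVOKED = auto()       # suspended pending interpretability review


class AILicense:
    """Toy model of the proposed certification lifecycle (hypothetical)."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.state = LicenseState.UNCERTIFIED
        self.incident_log: list[str] = []

    def certify(self) -> None:
        # Issued only after an (external) interpretability audit passes.
        self.state = LicenseState.VALID

    def report_incident(self, description: str) -> None:
        # Any "dishonorable" behavior immediately revokes the license.
        self.incident_log.append(description)
        self.state = LicenseState.REVOKED

    def close_incident_review(self, audit_passed: bool) -> None:
        # Re-certify only once interpretability researchers have debugged the issue.
        self.state = LicenseState.VALID if audit_passed else LicenseState.REVOKED

    def may_run(self) -> bool:
        # The deployment environment would check this before switching the AI on.
        return self.state == LicenseState.VALID
```

The point of the sketch is just that certification is a revocable gate enforced at deployment time, not a one-off stamp of approval.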
There are whole realms of theory about how to reconcile orthogonal values.
On (1):
It seems incredibly unlikely to me that your organization is going to make it no longer true that people have incompatible values.
If “AI alignment” is taken to mean “the AI wants exactly the same things that humans want” and hence to imply “all humans want the same things” then, sure, mutually incompatible human values ⇒ no AI alignment. But I don’t think that’s what any reasonable person takes “AI alignment” to mean. I would consider that we’d done a pretty good job of “AI alignment” if, say, the state of the world 20 years after the first superhuman AI was such that for all times between now and then, (1) ≥ 75% of living humans (would) consider the post-AI state better than the pre-AI state and (2) ≤ 10% of living humans (would) consider the post-AI state much worse than the pre-AI state. (Or something along those lines.) And I don’t see why anything along these lines requires humans never to have incompatible values.
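To make that bar concrete, here is a tiny sketch that checks it against hypothetical survey snapshots; the function name and the data format are invented purely for illustration.

```python
def meets_alignment_bar(surveys: list[dict]) -> bool:
    """Check the rough criterion above against hypothetical survey snapshots.

    Each snapshot records, at one point in time between now and 20 years
    post-AI, the fraction of living humans who consider the post-AI state
    better than the pre-AI state, and the fraction who consider it much worse.
    The bar must hold at every point in time, not just on average.
    """
    return all(
        s["frac_better"] >= 0.75 and s["frac_much_worse"] <= 0.10
        for s in surveys
    )


# Example: three hypothetical snapshots over the 20-year window.
snapshots = [
    {"frac_better": 0.80, "frac_much_worse": 0.05},
    {"frac_better": 0.78, "frac_much_worse": 0.08},
    {"frac_better": 0.76, "frac_much_worse": 0.09},
]
print(meets_alignment_bar(snapshots))  # True: both thresholds hold at every time
```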
But never mind that: I still don’t see how your coordination-market system could possibly make it no longer true that humans sometimes have incompatible values.
On (2):
I still don’t see how your proposed political-reform organization would be in any way suited to issuing “AI alignment certification”, if that were a thing. And, since you say “hire a world-class mechanistic interpretability team from elsewhere”, it sounds as if you don’t either. So I don’t understand why any of that stuff is in your post; it seems entirely irrelevant to the organization you’re actually hoping to build.
Well, fair enough, I suppose. I was personally excited about the AI alignment piece, and thought that coordination markets would help with it.
Humans have always held, and will always hold, incompatible values. That’s why we feel the need to murder each other with such frequency. But, as Steven Pinker argues, while we may murder each other in greater absolute numbers than ever, we do it at an ever-declining per-capita rate. Maybe this will converge to a world in which a superhuman AI knows approximately what is expected of it. Maybe it won’t; I don’t know.