The fundamental problem is that any effective AI alignment technique is also a censorship technique, and thus you can’t advance AI alignment very much without also allowing people to censor an AI effectively, because a lot of alignment work aims to make AIs censored in particular ways.
I disagree with the use of “any”. In principle, an effective alignment technique could create an AI that isn’t censored but does have certain values/preferences over the world. You could call that censorship, but that doesn’t seem like the right or common usage. I agree that, in practice, many or most things currently purporting to be effective alignment techniques fit the word better, though.
I admit this is possible, so my “any” is almost certainly overconfident (which matters a little), though I believe a lot of common methods that do work for alignment also let you censor an AI.
If you take Eliezer’s early writing, the idea is that AI should be aligned with Coherent Extrapolated Volition. That’s a different goal from aligning AI with the views of credentialed experts or the leadership of AI companies.
“How do you regulate AI companies so that they aren’t enforcing Californian values on the rest of the United States and the world?” is an alignment question. If you have a good answer to that question, it is easier to convince someone to support regulating AI companies when their worry is that those companies, having already enforced Californian values through the censorship-industrial complex, will do the same thing with AI.
If you ignore the alignment questions that people like David Sacks care about, it’s hard to convince them that you are sincere about the other alignment questions.
A crux here is that I basically don’t think alignment strategies of the “Coherent Extrapolated Volition of humanity” type work, and I also think it is irrelevant that we can’t align an AI to the CEV of humanity.