I’d say “mainstream opinion” (whether in ML broadly, in “safety” or “ethics,” or in AI policy) is generally focused on misuse relative to alignment, even without conditioning on a “competitive alignment solution.” I normally disagree with this mainstream opinion, and I didn’t mean to endorse it in virtue of its mainstream-ness, but to identify it as the mainstream opinion. If you don’t like the word “mainstream” or view the characterization as contentious, feel free to ignore it; I think it’s pretty tangential to my post.
Thanks, that clarifies things. I did misunderstand that sentence as referring to something like the “AI Alignment mainstream”, which feels like a confusing abstraction to me, though I feel like I could have figured it out if I had thought a bit harder before commenting.
For the record, my current model is that “AI ethics” or “AI policy” doesn’t really have a consistent position here, so I am not sure whether I agree with you that this is indeed the opinion of most of the AI ethics or AI policy community. E.g. I can easily imagine an AI ethics article saying that if we have really powerful AI, the most important thing is not misuse risk but the moral personhood of the AIs, or the “broader societal impact of the AIs”, both of which feel more misalignment-shaped. But I really don’t know (my model of AI ethics people is that they think whether the AI is misaligned affects whether it “deserves” moral personhood).
I do expect the AI policy community to be more focused on misuse, because it has a lot of influence from national security, which is generally focused on misuse and on “weapons” as an abstraction, but I again don’t really trust my models here. During the Cold War a lot of the policy community ended up in a weird virtue-signaling arms race that produced a strong consensus in favor of a weird flavor of cosmopolitanism, which I really didn’t expect when I first started looking into this. So I don’t really trust my models of what the actual consensus will be when it comes to transformative AI (and don’t really trust current local opinions on AI to be good proxies for it).