Maybe instead of focusing on a number (10x vs. 1.1x), the focus should be on other factors, like "How large and diverse is the group of non-CoI'd people who thought carefully about this decision?" and "How much consensus is there among that group that this is better for humanity, vs. controversy?"
In the case where, e.g., the situation and safety cases have been made public, and the public is aware that the US AGI project is currently stalled because it lacks a solution for deceptive alignment that we know will work, while China is proceeding because they just don't think deceptive alignment is a thing at all; and moreover the academic ML community, not just in the USA but around the world, has looked at the safety case, the literature, the model organisms, etc., and is generally like "yeah, probably deceptive alignment won't be an issue so long as we do X, Y, and Z, but we can't rule it out even then," with the tiny minority that thinks otherwise seeming pretty unreasonable: then I'd feel pretty happy with the decision to proceed with AGI capabilities advancements in the USA, subject to doing X, Y, and Z. (Though even then I'd also be like: let's at least try to come to some sort of deal with China.)
Whereas if, e.g., the safety case and situation haven't been made public, and the only technical alignment experts who've thought deeply about the situation and safety case are (a) corporate employees and (b) ~10 hand-picked advisors with security clearances brought in by the government… OR if there's still tons of controversy, with large serious factions saying "X, Y, and Z are not enough; deceptive alignment is a likely outcome even so"… then if we proceed anyway I'd be thinking 'are we the baddies?'
Yeah, for the record, this all seems reasonable to me, though I think any proposal of norms of this sort needs to handle the fact that public discourse is sometimes very insane.