Principle 2: Don’t take actions which impose significant risks to others without overwhelming evidence of net benefit
[...]
Significant margin of benefits over costs, accounting for the possibility that your calculations are incorrect (1.1x benefits over costs doesn’t justify the action; maybe 10x benefits over costs could justify it, if you’re confident you aren’t making 10x errors; ideally you’d have even higher standards).
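As a minimal sketch of how this margin rule might be operationalized (the 10x threshold and the single multiplicative error adjustment are illustrative assumptions, not part of the principle itself):

```python
def clears_margin(est_benefit: float, est_cost: float,
                  required_margin: float = 10.0,
                  error_factor: float = 1.0) -> bool:
    """Act only if estimated benefits exceed estimated costs by the
    required margin, after discounting the benefit estimate by how far
    off it might plausibly be (error_factor = 3.0 means "my benefit
    estimate could be ~3x too optimistic")."""
    if est_cost <= 0:
        return True  # no imposed cost, so the margin question doesn't arise
    adjusted_benefit = est_benefit / error_factor
    return adjusted_benefit / est_cost >= required_margin

print(clears_margin(1.1, 1.0))                     # False: 1.1x never clears a 10x bar
print(clears_margin(15.0, 1.0))                    # True: 15x clears it if estimates are trusted
print(clears_margin(15.0, 1.0, error_factor=2.0))  # False: a possible 2x error drops it to 7.5x
```

The point of the error adjustment is just that the required margin should grow with how badly your estimates might be off; the exact functional form here is a placeholder.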
With a 10x margin, this seems likely to be crippling for many actors. My guess is that a 10x margin is too high, though I’m not confident. (Note that it is possible for a policy to be crippling and also a good policy.)
Another way to put this is that, in the case of AI, most honest and responsible actors holding themselves to a 10x margin would never take actions that impose large harms.
Examples of things which seem crippling:
From my understanding based on public knowledge, AI labs right now don’t robustly secure algorithmic secrets. So their ongoing activities impose large harms, to the extent that an actor stealing these secrets would be very harmful (as I think is likely). [Low confidence] I think a 10x benefit on top of this is unlikely even for a pretty responsible AI lab, short of devoting a crippling amount of resources to algorithmic security. If responsible actors followed this policy, they likely wouldn’t exist.
Suppose that China is developing AI in a strictly less safe way (with respect to misalignment) than a US government project. Suppose the USG project thinks it would impose an 8% chance of AI takeover with its current plans, while it estimates the Chinese project would impose a 50% chance of AI takeover (suppose that, e.g., the Chinese project recently saw a prior version of its AI attempt to escape, model organisms indicate its training process often results in egregious misalignment, and the project is using approximately no precautions). Suppose the safety estimates are basically reasonable, e.g., they come from third parties without a COI who have access to both projects. A 10x risk margin would prevent the US project from proceeding in this case (see the worked arithmetic below).
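A rough worked version of the arithmetic in this example, treating “benefit” as the takeover risk averted and “cost” as the risk the US project itself imposes (this framing is my own gloss on the scenario, not something the margin rule specifies):

```python
# Numbers from the hypothetical above.
p_takeover_us = 0.08  # estimated P(AI takeover) if the USG project proceeds
p_takeover_cn = 0.50  # estimated P(AI takeover) if only the Chinese project proceeds

# Gloss: "benefit" = takeover risk averted by the US project displacing the
# riskier Chinese one; "cost" = takeover risk the US project itself imposes.
benefit = p_takeover_cn - p_takeover_us  # 0.42
cost = p_takeover_us                     # 0.08

print(f"benefit/cost ratio: {benefit / cost:.2f}")  # 5.25, well short of a 10x margin
```

So even on the favorable framing where the US project fully displaces the Chinese one, the apparent margin is only ~5x, which is why a 10x rule blocks proceeding here.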
(I think this maybe requires some clarification of what we mean by “harm” and “risk”. I assume we mean a deontological notion of harm, such that we consider your actions in isolation. For instance, if you shoot a man in the head while two other people also shoot him in the head simultaneously, you’re responsible for a high fraction of that harm. I don’t count harms from failing to prevent something, or harms from your actions causing a bad outcome via a long and unpredictable causal chain (e.g. you pass some not-directly-harmful policy that ultimately makes AI takeover more likely, though you thought it would help in expectation).)
Maybe instead of focusing on a number (10x vs. 1.1x), the focus should be on other factors, like “How large and diverse is the group of non-CoI’d people who thought carefully about this decision?” and “Is there consensus among that group that this is better for humanity, or is it controversial?”
In the case where, e.g., the situation and safety cases have been made public; the public is aware that the US AGI project is currently stalled because it lacks a solution to deceptive alignment that we know will work, while China is proceeding because they just don’t think deceptive alignment is a thing at all; and moreover the academic ML community, not just in the USA but around the world, has looked at the safety case, the literature, the model organisms, etc., and generally says “yeah, probably deceptive alignment won’t be an issue so long as we do X, Y, and Z, but we can’t rule it out even then”, while the tiny minority that thinks otherwise seems pretty unreasonable, then I’d feel pretty happy with the decision to proceed with AGI capabilities advancements in the USA, subject to doing X, Y, and Z. (Though even then I’d also be like: let’s at least try to come to some sort of deal with China.)
Whereas if, e.g., the safety case and situation haven’t been made public, and the only technical alignment experts who’ve thought deeply about the situation and safety case are (a) corporate employees and (b) ~10 hand-picked advisors with security clearances brought in by the government… or if there’s still tons of controversy, with large serious factions saying “X, Y, and Z are not enough; deceptive alignment is a likely outcome even so”… then if we proceed anyway I’d be thinking ‘are we the baddies?’
Yeah, for the record this all seems reasonable to me, though I think any proposal of norms of this sort needs to handle the fact that public discourse is sometimes very insane.