Does any specific human or group of humans currently have “control” in the sense of “that which is lost in a loss-of-control scenario”? If not, that indicates to me that it may be useful to frame the risk as “failure to gain control”.
It may be better to think about it that way, yes—in some cases, at least.
Probably it makes sense to throw in some more variables.
Something like:
To stand x chance of property p applying to system s, we’d need to apply resources r.
In these terms, [loss of control] is something like [ensuring important properties becomes much more expensive (or impossible)].
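A minimal sketch of the same idea in symbols (the names $C$ and $R_{\text{we}}$ are mine, introduced only for illustration):

$$C(p, s, x) \;=\; \min\{\, r \;:\; \Pr[\,\text{property } p \text{ holds of system } s \mid \text{resources } r \text{ applied}\,] \ge x \,\}$$

Read this way, [loss of control] corresponds to $C(p, s, x) > R_{\text{we}}$ for the properties that matter, where $R_{\text{we}}$ is whatever resources the relevant “we” can actually deploy (or to $C(p, s, x) = \infty$, i.e. no amount of resources suffices). The “failure to gain control” framing from upthread would then say that no coordinated “we” with $R_{\text{we}} \ge C(p, s, x)$ has ever existed in the first place.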
I think the most important part of your “To stand x chance of property p applying to system s, we’d need to apply resources r” model is the word “we”.
Currently, there exists no “we” that can ensure that nobody in the world does some form of research, or at least none that can do so in a non-cataclysmic way. The International Atomic Energy Agency comes the closest of any group I’m aware of, but its scope is limited, and it operates mainly by controlling access to specific physical resources rather than by trying to prevent people from doing a thing with resources they already possess.
If “gain a decisive strategic advantage (DSA), or cause some other trusted group to gain one, over everyone who could plausibly gain a DSA in the future” is a required part of your threat mitigation strategy, I am not optimistic about the chances of success, but I’m even less optimistic about the chances of that working if you don’t realize that’s the game you’re trying to play.
I don’t think [gain a DSA] is the central path here.
It’s much closer to [persuade some broad group that already has a lot of power collectively].
I.e. the likely mechanism is not: [add the property [has DSA] to [group that will do the right thing]].
But closer to: [add the property [will do the right thing] to [group that has DSA]].