Same reason that knowing how many nukes your opponent has reduces racing. If you are conservative, uncertainty about how far ahead your opponent is causes escalating races, even if you would both rather not escalate (and even if your mean estimate is well-calibrated).
E.g. let’s assume you and your opponent are de facto equally matched in the capabilities of your systems, but both have substantial uncertainty, e.g. each assigns 30% probability to the opponent being substantially ahead. Then, if you think those 30% of worlds are really bad, you will probably invest a bunch more into developing your systems (which your opponent will of course observe, prompting them to increase their own investment, and then you repeat).
However, if you can both verify how many nukes you have, you can reach a more stable equilibrium even under more conservative assumptions.
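The loop described above can be sketched as a toy simulation (my own illustration, not from the discussion; the 30% tail probability, the `lead_factor` gap, and the update rule are all assumed parameters):

```python
# Toy model of the escalation loop: two actually equally matched actors
# each assign 30% probability to the opponent being substantially ahead.
# A conservative actor hedges by investing against that bad tail, the
# opponent observes and matches, and investment ratchets up every round.
# With verifiable capabilities the feared gap collapses to zero.

def escalate(p_ahead=0.3, lead_factor=1.5, rounds=6, verified=False):
    """Return (actor_a, actor_b) investment levels after each round."""
    a = b = 1.0                      # true capabilities: exactly equal
    history = [(a, b)]
    for _ in range(rounds):
        # Expected capability gap in the feared 30% of worlds, where the
        # opponent is lead_factor ahead; zero if levels are verifiable.
        gap_a = 0.0 if verified else max(0.0, p_ahead * (lead_factor * b - a))
        gap_b = 0.0 if verified else max(0.0, p_ahead * (lead_factor * a - b))
        a, b = a + gap_a, b + gap_b  # simultaneous conservative hedging
        history.append((round(a, 2), round(b, 2)))
    return history

print(escalate(verified=False))  # levels grow ~15% per round: escalation
print(escalate(verified=True))   # levels stay flat at 1.0: stable
```

The point is only qualitative: with calibrated means but substantial uncertainty, conservative best responses compound round over round, while removing the uncertainty (verification) removes the ratchet.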
Gotcha. A few disanalogies, though. The first two specifically relate to the model-theft/shared-access point; the last holds even if you only had verifiable API access:
Me verifying how many nukes you have doesn’t mean I suddenly have that many nukes, unlike AI model capabilities (though, due to compute differences, having the weights still doesn’t mean we suddenly have the same time distance to superintelligence).
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI, which can automate a lot of AI R&D.
This model seems to assume you have an imprecise but unbiased estimate of how many nukes I have, but companies will probably underestimate each other’s proximity to superintelligence, for the same reason that they’re underestimating their own proximity, until it’s way more salient/obvious.
Me verifying how many nukes you have doesn’t mean I suddenly have that many nukes, unlike AI model capabilities (though, due to compute differences, having the weights still doesn’t mean we suddenly have the same time distance to superintelligence).
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
I do also currently think that the compute component will likely be a bigger deal than the algorithmic/weights dimension, which makes the situation more analogous to nukes, but I do think there is a lot of uncertainty on this dimension.
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI, which can automate a lot of AI R&D.
Yeah, I totally agree that this is an argument against proliferation, and an important one. While you might not end up with additional racing dynamics, the fact that more global resources can now use the cutting-edge AI system to do AI R&D is very scary.
This model seems to assume you have an imprecise but unbiased estimate of how many nukes I have, but companies will probably underestimate each other’s proximity to superintelligence, for the same reason that they’re underestimating their own proximity, until it’s way more salient/obvious.
In general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
Importantly though, once you have several thousand nukes the strategic returns to more nukes drop pretty close to zero, regardless of how many your opponents have, while if you get the scary model’s weights and then don’t use them to push capabilities even further, your opponent may get a huge strategic advantage over you. I think this is probably true, but the important thing is whether the actors think it might be true.
In general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
Yeah, good point.