Rob Bensinger comments on Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger 7 Jun 2024 22:44 UTC
5 points
Leopold’s scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since they otherwise don’t have a hope of building and aligning a superintelligence), but then needs to choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad science initiative, when it could simply work with its allies to block the tech’s development and maintain the status quo at minimal risk.
Success in this scenario requires a weird combination of USG prescience with self-destructiveness: enough foresight to see what’s coming, but paired with a weird compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.
- O O 8 Jun 2024 8:22 UTC
  14 points
  Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years
  Interesting. Do you have a link to these safety predictions? I wasn't aware of this.
  - Mo Putera 8 Jun 2024 11:55 UTC
    6 points
    I’m guessing Rob is referring to footnote 54 in What do XPT forecasts tell us about AI risk?:
    And while capabilities have been increasing very rapidly, research into AI safety, does not seem to be keeping pace, even if it has perhaps sped-up in the last two years. An isolated, but illustrative, data point of this can be seen in the results of the 2022 section of a Hypermind forecasting tournament: on most benchmarks, forecasters underpredicted progress, but they overpredicted progress on the single benchmark somewhat related to AI safety.
    That last link is to Jacob Steinhardt’s tweet linking to his 2022 post AI Forecasting: One Year In, on the results of their 2021 forecasting contest. Quote:
    Progress on a robustness benchmark was slower than expected, and was the only benchmark to fall short of forecaster predictions. This is somewhat worrying, as it suggests that machine learning capabilities are progressing quickly, while safety properties are progressing slowly. …
    As a reminder, the four benchmarks were:
    MATH, a mathematics problem-solving dataset;
    MMLU, a test of specialized subject knowledge using high school, college, and professional multiple choice exams;
    Something Something v2, a video recognition dataset; and
    CIFAR-10 robust accuracy, a measure of adversarially robust vision performance.
    ...
    Here are the actual results, as of today:
    MATH: 50.3% (vs. 12.7% predicted)
    MMLU: 67.5% (vs. 57.1% predicted)
    Adversarial CIFAR-10: 66.6% (vs. 70.4% predicted)
    Something Something v2: 75.3% (vs. 73.0% predicted)
    That’s all I got, no other predictions.
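    As a quick check on the quoted numbers, here is a minimal Python sketch that computes each benchmark's gap between the actual result and the forecast, using only the four figures quoted above (the variable names are illustrative, not from the post). Adversarial CIFAR-10 comes out as the only negative gap, which matches Steinhardt's point that the robustness benchmark was the one to fall short of predictions.

```python
# Figures quoted from Steinhardt's "AI Forecasting: One Year In" (2022):
# benchmark -> (actual %, forecast %).
results = {
    "MATH": (50.3, 12.7),
    "MMLU": (67.5, 57.1),
    "Adversarial CIFAR-10": (66.6, 70.4),
    "Something Something v2": (75.3, 73.0),
}

# Positive gap: progress beat the forecast. Negative gap: it fell short.
# Rounding to one decimal removes floating-point noise from the subtraction.
gaps = {
    name: round(actual - predicted, 1)
    for name, (actual, predicted) in results.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points")

# Benchmarks where actual progress came in below the forecast.
shortfalls = [name for name, gap in gaps.items() if gap < 0]
print("Fell short of forecasts:", shortfalls)
```

    It prints gaps of +37.6 (MATH), +10.4 (MMLU), -3.8 (Adversarial CIFAR-10) and +2.3 (Something Something v2).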
    - Rob Bensinger 8 Jun 2024 18:52 UTC
      2 points
      Yep, I had in mind AI Forecasting: One Year In.
- Akash 7 Jun 2024 23:07 UTC
  12 points
  when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.
  I would be interested in reading more about the methods that could be used to prohibit the proliferation of this technology (you can assume a “wake-up” from the USG).
  I think one of the biggest fears would be that any sort of international alliance would not have perfect/robust detection capabilities, so you always run the risk that someone is running a rogue AGI project.
  Also, separately, there’s the issue of “at some point, doesn’t it become so trivially easy to develop AGI that we still need the International Community Good Guys to develop AGI [or do something else] that gets us out of the acute risk period?” When you say “prohibit this technology”, do you mean “prohibit this technology from being developed outside of the International Community Good Guys Cluster” or do you mean “prohibit this technology in its entirety?”
- O O 8 Jun 2024 8:28 UTC
  1 point
  Leopold’s scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since they otherwise don’t have a hope of building and aligning a superintelligence), but then needs to choose to gamble its hegemony, its very existence
  Alternatively, they either don’t buy the perils, or they believe there’s a chance the other side may not? I think there is an assumption built into this statement and into a lot of the strategies proposed in this thread: if not everyone is cooperative and buys the high p(doom) arguments, then this all falls apart. Nuclear war essentially has a localized p(doom) of 1, yet both superpowers still built nuclear weapons. I am highly skeptical of any proposed solution to any of this, since it requires everyone (not just, say, half) to buy the arguments to begin with.
  - Rob Bensinger 8 Jun 2024 22:22 UTC
    6 points
    Alternatively, they either don’t buy the perils, or they believe there’s a chance the other side may not?
    If they “don’t buy the perils”, and the perils are real, then Leopold’s scenario is falsified and we shouldn’t be pushing for the USG to build ASI.
    If there are no perils at all, then sure, Leopold’s scenario and mine are both false. I didn’t mean to imply that our two views are the only options.
    Separately, Leopold’s model of “what are the dangers?” is different from mine. But I don’t think the dangers Leopold is worried about are dramatically easier to understand than the dangers I’m worried about (in the respective worlds where our worries are correct). Just the opposite: the level of understanding you need to literally solve alignment for superintelligences vastly exceeds the level you need to just be spooked by ASI and not want it to be built. Which is the point I was making; not “ASI is axiomatically dangerous”, but “this doesn’t count as a strike against my plan relative to Leopold’s, and in fact Leopold is making a far bigger ask of government than I am on this front”.
    Nuclear war essentially has a localized p(doom) of 1
    I don’t know what this means. If you’re saying “nuclear weapons kill the people they hit”, I don’t see the relevance; guns also kill the people they hit, but that doesn’t make a gun strategically similar to a smarter-than-human AI system.
    - O O 9 Jun 2024 6:11 UTC
      2 points
      I don’t know what this means. If you’re saying “nuclear weapons kill the people they hit”, I don’t see the relevance; guns also kill the people they hit, but that doesn’t make a gun strategically similar to a smarter-than-human AI system.
      It is well known that nuclear weapons lead to MAD, or localized annihilation, yet they were still built. But my more important point is that this sort of thinking requires most people to be convinced that there is a high p(doom) and, more importantly, also convinced that the other side believes there is a high p(doom). If either of those is false, then not building doesn’t work. If the other side is building it, then you have to build it anyway, just in case your theoretical p(doom) arguments are wrong. Again, this is just arguing your way around a pretty basic prisoner’s dilemma.
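      The "basic prisoner's dilemma" invoked here can be made concrete. The following is a minimal Python sketch under loudly stated assumptions: the payoff numbers are hypothetical ordinal rankings chosen only to have prisoner's-dilemma structure (temptation > mutual restraint > mutual race > being the sucker). They are not estimates anyone in this thread gave, and `PAYOFF`, `best_response`, and `is_dominant` are illustrative names. Under that ordering, building is a best response whatever the rival does, so (build, build) is the only equilibrium even though both sides would prefer mutual abstention.

```python
from itertools import product

# HYPOTHETICAL ordinal payoffs (higher is better) for a symmetric
# build/abstain game. Illustrative assumptions only, chosen to have
# prisoner's-dilemma ordering: T=4 > R=3 > P=2 > S=1.
# Key: (my_move, their_move) -> my payoff
PAYOFF = {
    ("abstain", "abstain"): 3,  # R: status quo maintained
    ("build", "abstain"): 4,    # T: I gain the advantage
    ("abstain", "build"): 1,    # S: rival gains the advantage
    ("build", "build"): 2,      # P: risky race for both
}
MOVES = ("abstain", "build")


def best_response(their_move: str) -> str:
    """Return the move that maximizes my payoff given the rival's move."""
    return max(MOVES, key=lambda mine: PAYOFF[(mine, their_move)])


def is_dominant(move: str) -> bool:
    """True if `move` is my best response to every possible rival move."""
    return all(best_response(theirs) == move for theirs in MOVES)


# Pure-strategy Nash equilibria: each player's move is a best
# response to the other's (the game is symmetric, so one table serves both).
equilibria = [
    (a, b)
    for a, b in product(MOVES, MOVES)
    if best_response(b) == a and best_response(a) == b
]

print("build is dominant:", is_dominant("build"))
print("equilibria:", equilibria)
print("mutual abstain beats mutual build:",
      PAYOFF[("abstain", "abstain")] > PAYOFF[("build", "build")])
```

      Whether the real situation has this payoff ordering is exactly what the thread disputes: if building carries a large enough chance of destroying the builder, the payoffs for building drop, building may stop being dominant, and the game is no longer a prisoner's dilemma.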
      Also consider that we will develop AGIs (note: not ASI) anyway, and alignment (or at least control) will almost certainly work for them.^[1] The prisoner’s dilemma means you have to match the other side’s drone warfare capabilities regardless of p(doom).
      In the world where the USG understands there are risks but thinks they have decent odds of being solvable, we build it anyway. The gameboard is: 20% chance of dying, 80% chance of handing the light cone to your enemy if the other side builds it and you do not. I think this is the most probable outcome, which dooms all Pause efforts. High p(doom) folks can’t even convince low p(doom) folks on LessWrong, the subset of optimists most likely to be receptive to their arguments, that they are wrong. There is no chance you won’t simply end up as one more faction in the USG, like environmentalists.
      But let’s pretend for a moment that the USG buys the high-risk doomer argument for superintelligence. The USG and CCP are both rushing to build AGIs regardless, since AGI can be controlled and not having a drone swarm means you lose military relevance. Because the line between AGI and ASI will be so fuzzy in this world, I think it’s very plausible that enough people will be convinced the CCP isn’t convinced alignment is too hard and will build it anyway.
      Even people with high p(doom)s might have a nagging part of their mind asking: what if alignment just works? If alignment just works (again, this is impossible to disprove: if we could prove or disprove it, we wouldn’t need to consider pausing to begin with, because it would be self-evident), then great, you just handed your entire nation’s future to the enemy.
      We have some time to solve alignment, but a long-term pause will be downright impossible. What we need to do is tackle the technical problem ASAP instead of trying to pause. The race conditions are set; the prisoner’s dilemma is locked in.
      [1] I think they will certainly work. We have a long history of controlling humans and forcing them to do things they don’t want to do. Practically every argument about p(doom) relies on the AI being smarter than us. If it’s not, then it’s just an insanely useful tool. All the solutions that sound “dumb” with ASI, like having an off switch, air gapping, etc., work with weak enough but still useful systems.