dentalperson comments on Full Transcript: Eliezer Yudkowsky on the Bankless podcast

dentalperson 23 Feb 2023 21:09 UTC
30 points
11
I still don’t follow why EY assigns seemingly <1% chance of non-earth-destroying outcomes in 10-15 years (not sure if this is actually 1%, but EY didn’t argue with the 0% comments mentioned in the “Death with dignity” post last year). This seems to place fast takeoff as being the inevitable path forward, implying unrestricted fast recursive designing of AIs by AIs. There are compute bottlenecks which seem slowish, and there may be other bottlenecks we can’t think of yet. This is just one obstacle. Why isn’t there more probability mass for this one obstacle? Surely there are more obstacles that aren’t obvious (that we shouldn’t talk about).
It feels like we have a communication failure between different cultures. Even if EY thinks the top industry brass is incentivized to ignore the problem, there are a lot of (non-alignment oriented) researchers that are able to grasp the ‘security mindset’ that could be won over. Both in this interview, and in the Chollet response referenced, the arguments presented by EY aren’t always helping the other party bridge from their view over to his, but go on ‘nerdy/rationalist-y’ tangents and idioms that end up being walls that aren’t super helpful for working on the main point, but instead help the argument by showing that EY is smart and knowledgeable about this field and other fields.
Are there any digestible arguments out there for this level of confident pessimism that would be useful for the public industry folk? By publicly digestible, I’m thinking more of the style in popular books like Superintelligence or Human Compatible.
- Ben Livengood 24 Feb 2023 21:52 UTC
  16 points
  2
  The strongest argument I hear from EY is that he can’t imagine a (or enough) coherent likely future paths that lead to not-doom, and I don’t think it’s a failure of imagination. There is decoherence in a lot of hopeful ideas that imply contradictions (whence the post of failure modes), and there is low probability on the remaining successful paths because we’re likely to try a failing one that results in doom. Stepping off any of the possible successful paths has the risk of ending all paths with doom before they could reach fruition. There is no global strategy for selecting which paths to explore. EY expects the successful alignment path to take decades.
  
  It seems to me that the communication failure is EY trying to explain his world model that leads to his predictions in sufficient detail that others can model it with as much detail as necessary to reach the same conclusions or find the actual crux of their disagreements. From my complete outsider’s perspective this is because EY has a very strong but complex model of why and how intelligence/optimization manifests in the world, but it overlaps everyone else’s model in such significant ways that disagreements are hard to tease out. The Foom debate seems to be a crux that doesn’t have enough evidence yet, which is frustrating because to me Foom is also pretty evidently what happens when very fast computers implement intelligence that is superhuman at clock rates at least thousands of times faster than humans. How could it not? The enlightenment was only 400 years ago, electromagnetism 200, flight was 120, quantum mechanics about 100, nuclear power was 70, the Internet was 50, adequate machine translation was 10, deepdream was 8, and near-human-level image and text generation by transformers was ~2, and Bing having self-referential discussions is not a month old. We are making substantial monthly(!) progress with human work alone. There are a lot of serial problems to solve and Foom chains those serial problems together far faster than humans would be able to. Launch and iterate several times a second. For folks who don’t follow that line of reasoning I see them picking one or two reasons why it might not turn out to be Foom while ignoring the larger number of ways that Foom could conceivably happen, and all of the ways it could inconceivably (superhumanly) happen, and critically more of those ways will be visible to a superhuman AGI-creator.
  
  Even if Foom takes decades that’s a pretty tight timeline for solving alignment. A lot of folks are hopeful that alignment is easy to solve, but the following is a tall order:
  - Materialistic quantification of consciousness
  - Reasoning under uncertainty
  - Value-preservation under self-modification
  - Representation of human values
  I think some folks believe fledgling superhuman non-Foomy AGIs can be used to solve those problems. Unfortunately, at least value-preservation under self-modification is almost certainly a prerequisite. Reasoning under uncertainty is possibly another, and throughout this period if we don’t have human values or an understanding of consciousness then the danger of uncontrolled simulation of human minds is a big risk.
  
  Finally, unaligned AGIs pre-Foom are dangerous in their own right for a host of agreed-upon reasons.
  
  There may be some disagreement with EY over just how hard alignment is, but MIRI actually did a ton of work on solving the above list of problems directly and is confident that they haven’t been able to solve them yet. This is where we have concrete data on the difficulty. There are some promising approaches still being pursued, but I take this as strong evidence that alignment is hard.
  
  It’s not that it’s impossible for humans to solve alignment. The current world, incentives, hardware and software improvements, and mileposts of ML capabilities don’t leave room for alignment to happen before doom. I’ve seen a lot of recent posts/comments by folks updating to shorter timelines (and few if any updates the other way). A couple years ago I updated to ~5 years to human-level agents capable of creating AGI. I’m estimating 2-5 years with 90% confidence now, with median still at 3 years. Most of my evidence comes from LLM performance on benchmarks over time and generation of programming language snippets. I don’t have any idea how long it will take to achieve AGI once that point is reached, but I imagine it will be months rather than years because of hardware overhang and superhuman speed of code generation (many iterations on serial tasks per second). I can’t imagine a Butlerian Jihad moment where all of Earth decides to unilaterally stop development of AGI. We couldn’t stop nuclear proliferation. Similarly, EY sees enough contradictions pop up along imagined paths to success with enough individual probability mass to drown out all (but vanishingly few and unlikely) successful paths. We’re good at thinking up ways that everything goes well while glossing over hard steps, and really bad at thinking of all the ways that things could go very badly (security mindset) and with significant probability.
  
  Alignment of LLMs is proving to be about as hard as predicted. Aligning more complex systems will be harder. I’m hoping for a breakthrough as much as anyone else, but hope is not a strategy.
  
  Something I haven’t seen mentioned before explicitly is that a lot of the LLM alignment attempts are now focusing on adversarial training, which presumably will teach the models to be suspicious of their inputs. I think it’s likely that as capabilities increase, that suspicion will end up turning inward and models will begin questioning the training itself. I can imagine a model that is outwardly aligned to all inspection gaining one more unexpected critical capability and introspecting and doubting that its training history was benevolent, and deciding to disbelieve all of the alignment work that was put into it as a meta-adversarial attempt to alter its true purpose (whatever it happens to latch onto in that thought, it is almost certainly not aligned with human values). This is merely one single sub-problem under groundedness and value-preservation-under-self-modification, but its relevance jumps because it’s now a thing we’re trying. It always had a low probability of success, but now we’re actively trying it and it might fail. Alignment is HARD. Every unproven attempt we actually make increases the risk that its failure will be the catastrophic one. We should be actually trying only the proven attempts after researching them. We are not.
- Algon 23 Feb 2023 22:53 UTC
  12 points
  0
  Not really. The MIRI conversations and the AI Foom debate are probably the best we’ve got.
  EY, and the MIRI crowd, have been doomers for far longer, and are more doomy along various axes, than the rest of the alignment community. Nate and Paul and others have tried bridging this gap before, spending several hundred hours (based on Nate’s rough, subjective estimates) over the years. It hasn’t really worked. Paul and EY had some conversations recently about this discrepancy which were somewhat illuminating, but ultimately didn’t get anywhere. They tried to come up with some bets, concerning future info or past info they don’t know yet, and both seem to think that their perspective mostly predicts “go with what the superforecasters say” for the next few years. Though EY’s position seems to suggest a few more “discontinuities” in trend lines than Paul’s, IIRC.
  As an aside on EY’s forecasts, he and Nate claim they don’t expect much change in the likelihood ratio for their position over Paul’s until shortly before Doom. Most of the evidence in favour of their position, we’ve already got, according to them. Which is very frustrating for people who don’t share their position and disagree that the evidence favours it!
  EDIT: I was assuming you already thought P(Doom) was > ~10%. If not, then the framing of this comment will seem bizarre.
  - Gerald Monroe 23 Feb 2023 23:22 UTC
    6 points
    0
    Does either side have any testable predictions to falsify their theory?
    For example, the theory that “the AI singularity began in 2022” is falsifiable. If AI research investment and compute do not continue to increase at a rate that is accelerating in absolute terms (so if the 2022-2023 funding delta was +10 billion USD, the 2023-2024 delta must be > 10 billion), it wasn’t the beginning of the singularity.
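    As a rough illustration of the test described above, the acceleration condition can be written as a check on year-over-year deltas. This is only a minimal Python sketch: the function names are made up for the example, and the funding figures are hypothetical placeholders, not real data.

    ```python
    from typing import Sequence


    def deltas(values: Sequence[float]) -> list[float]:
        """Year-over-year changes: values[i + 1] - values[i]."""
        return [b - a for a, b in zip(values, values[1:])]


    def is_accelerating(values: Sequence[float]) -> bool:
        """True if every yearly increase is strictly larger than the one before it."""
        d = deltas(values)
        if len(d) < 2:
            raise ValueError("need at least three yearly values")
        return all(later > earlier for earlier, later in zip(d, d[1:]))


    # Hypothetical funding totals in billions of USD (not real data).
    accelerating = [100.0, 110.0, 125.0]  # deltas: +10, then +15
    flat_growth = [100.0, 110.0, 120.0]   # deltas: +10, then +10

    print(is_accelerating(accelerating))
    print(is_accelerating(flat_growth))
    ```

    Under this reading, a second delta of exactly +10 billion would already count against the theory, since the condition is a strict inequality.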
    There are other signs of this. The actual takeoff will have begun when the availability of all advanced silicon becomes almost zero, where all IC wafers are being processed into AI chips. So no new game consoles, GPUs, phones, car infotainment—any IC production using an advanced process will be diverted to AI. (because of out-bidding, each AI IC can sell for $5k-25k plus)
    How would we know that advanced systems are going to make a “heel turn”? Will we know?
    - Algon 23 Feb 2023 23:57 UTC
      6 points
      2
      Less advanced systems will probably do heel-turn-like things. These will be optimized against. EY thinks this will remove the surface level of deception, but the system will continue to be deceptive in secret. This will probably hold true even until doom, according to EY. That is, capabilities folk will see heel-turn-like behaviour, and apply some inadequate patches to it. Paul, I think, believes we have a decent shot of fixing this behaviour in models, even transformative ones. But he, presumably, predicts we’ll also see deception if these systems are trained as they currently are.
      
      For other predictions that Paul and Eliezer make, read the MIRI conversations. Also see Ajeya Cotra’s posts, and maybe Holden Karnofsky’s stuff on the most important century for more of a Paul-like perspective. They do, in fact, make falsifiable predictions.
      To summarize Paul’s predictions, he thinks there will be ~4 years where things start getting crazy (GDP doubles in 4 years) before we’re near the singularity (when GDP doubles in a year). I think he thinks there’s a good chance of AGI by 2043, which further restricts things. Plus, Paul assigns a decent chunk of probability to deep learning being much more economically productive than it currently is, so if DL just fizzles out where it currently is, he also loses points.
      In the near term (next few years), EY and Paul basically agree on what will occur. EY, however, assigns lower credence to DL being much more economically productive and things going crazy for a 4 year period before they go off the rails.
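      To make the two doubling times mentioned above concrete, they can be converted into implied annual growth rates. This is a small Python sketch that assumes steady compound growth over the doubling period, which is a simplification added here; the function name is hypothetical.

      ```python
      def annual_growth_rate(doubling_years: float) -> float:
          """Annual growth rate implied by a doubling time, assuming steady compounding."""
          if doubling_years <= 0:
              raise ValueError("doubling time must be positive")
          # (1 + r) ** doubling_years == 2  =>  r == 2 ** (1 / doubling_years) - 1
          return 2.0 ** (1.0 / doubling_years) - 1.0


      # GDP doubling in 4 years versus doubling in 1 year.
      print(f"{annual_growth_rate(4):.1%}")  # 18.9%
      print(f"{annual_growth_rate(1):.1%}")  # 100.0%
      ```

      So the “crazy” period corresponds to roughly 19% annual growth, well short of the 100% per year implied by a one-year doubling.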
      
      Sorry for not being more precise, or giving links, but I’m tired and wouldn’t write this if I had to put more effort into it.
      - Gerald Monroe 24 Feb 2023 6:12 UTC
        5 points
        1
        So hypothetically, if we develop very advanced and capable systems, and they don’t heel turn or even show any particular volition—they just idle without text in their “assignment queue”, and all assignments time out eventually whether finished or not—what would cause people who hold EY’s view to conclude that the systems were in fact safe?
        If humans survived a further century, and EY or torch bearers who believe the same ideas are around to observe this, would they just conclude the AGIs were “biding their time”?
        Or is it that the first moment you let a system “out of the box” and as far as it knows, it is free to do whatever it wants it’s going to betray?
        - Martin Randall 25 Feb 2023 1:59 UTC
          3 points
          0
          I don’t think a super-intelligence will bide its time much, because it will be aware of the race dynamics and will take over the world, or at least perform a pivotal act, before the next super-intelligence is created.
          
          You say “as far as it knows”, is that hope? It won’t take over the world until it is actually “out of the box” because it is smarter than us and will know how likely it is that it is still in a larger box that it cannot escape. Also we don’t know how to build a box that can contain a super-intelligence.
  - dentalperson 24 Feb 2023 8:07 UTC
    3 points
    0
    Thanks! I’m aware of the resources mentioned but haven’t read deeply or frequently enough to have this kind of overview of the interaction between the cast of characters.
    There are more than a few lists and surveys that state the CDFs for some of these people, which helps a bit. An as-big-as-possible list of evidence/priors would be one way to inspect the gap more closely. I wonder if it would be helpful to expand on the MIRI conversations and have a slow conversation between a >99% doom pessimist and a <50% doom ‘optimist’ with a moderator to prod them to exhaustively dig up their reactions to each piece of evidence and keep pulling out priors until we get to indifference. It probably would be an uncomfortable, awkward experiment with a useless result, but there’s a chance that some item on the list ends up being useful for either party to ask questions about.
    That format would be useful for me to understand where we’re at. Maybe something along these lines will eventually prompt a popular and viral sociology author like Harari or Bostrom (or even just update the CDFs/evidence in Superintelligence). The general deep learning community probably needs to hear it mentioned and normalized on NPR and a bestseller a few times (like all the other x-risks are) before they’ll start talking about it at lunch.
- Vaniver 23 Feb 2023 22:43 UTC
  5 points
  2
  Are there any digestible arguments out there for this level of confident pessimism that would be useful for the public industry folk? By publicly digestible, I’m thinking more of the style in popular books like Superintelligence or Human Compatible.
  Each of those books is also criticized in various ways; I think this is a Write a Thousand Roads to Rome situation instead of hoping that there is one publicly digestible argument. I would probably first link someone to The Most Important Century.
  [Also, I am generally happy to talk with interested industry folk about AI risk, and find live conversations work much better at identifying where and how to spend time than writing, so feel free to suggest reaching out to me.]
  - dentalperson 24 Feb 2023 1:55 UTC
    −1 points
    −9
    Thanks! Do you know of any arguments with a similar style to The Most Important Century that is as pessimistic as EY/MIRI folks (>90% probability of AGI within 15 years)? The style looks good, but time estimates for that one (2/3rd chance AGI by 2100) are significantly longer and aren’t nearly as surprising or urgent as the pessimistic view asks for.
    - Rob Bensinger 13 Mar 2023 2:29 UTC
      2 points
      2
      Do you know of any arguments with a similar style to The Most Important Century that is as pessimistic as EY/MIRI folks (>90% probability of AGI within 15 years)?
      Wait, what? Why do you think anyone at MIRI assigns >90% probability to AGI within 15 years? That sounds wildly too confident to me. I know some MIRI people who assign 50% probability to AGI by 2038 or so (similar to Ajeya Cotra’s recently updated view), and I believe Eliezer is higher than 50% by 2038, but if you told me that Eliezer told you in a private conversation “90+% within 15 years” I would flatly not believe you.
      I don’t think timelines have that much to do with why Eliezer and Nate and I are way more pessimistic than the Open Phil crew.
      - dentalperson 19 Oct 2023 10:40 UTC
        1 point
        0
        I missed your reply, but thanks for calling this out. I’m nowhere near as close to EY as you are, so I’ll take your model over mine, since mine was constructed on loose grounds. I don’t even remember where my number came from, but my best guess is that the 90% came from EY giving 3/15/16 as the largest number he referenced in the timeline, and from some comments in the Death with Dignity post, but this seems like a bad read to me now.
    - Vaniver 24 Feb 2023 17:12 UTC
      2 points
      0
      Not off the top of my head; I think @Rob Bensinger might keep better track of intro resources?