Entropy and AI Forecasting
Until relatively recently (2018-2019?) I did not seriously entertain the possibility of AGI in our lifetime. This was a mistake, an epistemic error. A rational observer calmly and objectively considering the evidence for AI progress over the prior decades, especially in light of rapid progress in deep learning, should have come to the reasonable position that AGI within 50 years was a serious possibility (>10%).
AGI plausibly arriving in our lifetime was a reasonable position. Yet this possibility was almost universally ridiculed or ignored by academics and domain experts. One can find quite funny interviews with AI experts on LessWrong from 15 years ago. The only AI expert agreeing with the Yudkowskian view of AGI in our lifetime was Jürgen Schmidhuber. The other dozen or so AI experts dismissed the question as unknowable, or even denied the hypothetical possibility of AGI.
Yudkowsky earns a ton of Bayes points for anticipating the likely arrival of AGI in our lifetime long before deep learning took off.
**************************
We are currently experiencing a rapid AI takeoff, plausibly culminating in superintelligence by the end of this decade. I know of only two people who anticipated something like what we are seeing far ahead of time: Hans Moravec and ~~Jan Leike~~ Shane Legg*. Both forecast fairly precise dates decades before it happened, and the reasons why they thought it would happen are basically the reasons it did (i.e. Moravec very early on realized the primacy of compute). Moreover, they didn’t forecast a whole lot of things that didn’t happen (like Kurzweil).

Did I make an epistemic error by not believing them earlier? Well, for starters I wasn’t really plugged in to the AI scene, so I hadn’t heard of them or their views. But suppose I had; should I have believed them? I’d argue I shouldn’t have given their view back then more than a little bit of credence.
Entropy is a mysterious physics word for irreducible uncertainty: the uncertainty that remains about the future even after accounting for all the data. In hindsight, we can say that massive GPU training on next-token prediction of all internet text data was (almost**) all you need for AGI. But was this forecastable?
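To pin down the sense of "irreducible uncertainty" used here (my gloss, not something spelled out in the post): if Y is the outcome being forecast and X is everything knowable at forecast time, the leftover uncertainty is the conditional entropy H(Y|X), which can stay large no matter how skilled the forecaster is.

```latex
% Minimal sketch of "irreducible uncertainty" as conditional Shannon entropy.
% Y = the outcome being forecast, X = all data available at forecast time.
% H(Y|X) = 0 only if Y is a deterministic function of X; whatever remains
% is uncertainty that no amount of better forecasting can remove.
H(Y \mid X) \;=\; -\sum_{x} p(x) \sum_{y} p(y \mid x)\,\log p(y \mid x)
\;\;\le\;\; H(Y) \;=\; -\sum_{y} p(y)\,\log p(y)
```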
For every Moravec and ~~Leike~~ Legg who turns out to be extraordinarily right in forecasting the future, there are dozens who weren’t. Even in 2018, when the first evidence for strong scaling laws on text data was being published by Baidu, I’d argue that an impartial observer should have only updated a moderate amount. Actually, even OpenAI itself wasn’t sold on unsupervised learning on text data until early GPT showed signs of life; they thought (like many other players in the field, e.g. DeepMind) that RL (in diverse environments) was the way to go.

To me the takeaway is that explicit forecasting can be useful, but it is exactly the black-swan events that are irreducibly uncertain (high entropy) that move history.
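As a reference point for the scaling-law evidence mentioned above: the Baidu results and the later scaling-law papers reported empirical power-law fits of roughly the form below. The constants and exponents are fitted per dataset and architecture; nothing here is taken from the post itself.

```latex
% Generic form of the empirical scaling laws: test loss falls as a power law
% in model parameters N, dataset size D, and training compute C.
% N_c, D_c, C_c and the exponents \alpha_* are fitted constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```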
*The story is that ~~Leike~~ Legg’s timelines have been 2030 for the past two decades.

**Regular readers will know my beef with the pure scaling hypothesis.
I didn’t know about Jan’s AI timelines. Shane Legg also had some decently early predictions of AI around 2030 (~2007 was the earliest I knew about).
Shane Legg had a 2028 median back in 2008; see e.g. https://e-discoveryteam.com/2023/11/17/shane-leggs-vision-agi-is-likely-by-2028-as-soon-as-we-overcome-ais-senior-moments/
That’s probably the one I was thinking of.
Oh no uh-oh, I think I might have confused Shane Legg with Jan Leike.
Fwiw, in 2016 I would have put something like 20% probability on what became known as ‘the scaling hypothesis’. I still had past-2035 median timelines, though.
What did you mean exactly in 2016 by the scaling hypothesis?

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don’t believe in, for reasons I have explained elsewhere) are in direct conflict, so I’d be curious if you could detail your beliefs back then more exactly.
Something like ‘we could have AGI just by scaling up deep learning / deep RL, without any need for major algorithmic breakthroughs’.
I’m not sure the claim that these are in direct conflict is strictly true, though I agree with the ‘vibe’. I think there were probably a couple of things in play:
I still only had something like 20% on scaling, and I expected much more compute would likely be needed, especially in that scenario, but also more broadly (e.g. maybe something like the median in ‘bioanchors’, 35 OOMs of pretraining-equivalent compute, if I don’t misremember; though I definitely hadn’t thought very explicitly about how many OOMs of compute at that time), so I thought it would probably take decades to get to the required amount of compute.
I very likely hadn’t thought hard and long enough to necessarily integrate/make coherent my various beliefs.
Probably at least partly because there seemed to be a lot of social pressure from academic peers against even something like ’20% on scaling’, and even against taking AGI and AGI safety seriously at all. This likely made it harder to ‘viscerally feel’ what some of my beliefs might imply, and especially that it might happen very soon (which also had consequences in delaying when I’d go full-time into working on AI safety; along with thinking I’d have more time to prepare for it, before going all in).
Yeah, I do think that Moravec and Leike got the AI situation most correct, and yeah people were wrong to dismiss Yudkowsky for having short timelines.
This was the thing they got most correct, which is interesting because, unfortunately, Yudkowsky got almost everything else incorrect about how superhuman AIs would work, and also got the alignment situation very wrong, which is important to take note of.
LW in general got short timelines, and the idea that AI will probably be the biggest deal in history, correct, but went wrong in assuming it knew how AI would eventually work (remember when Eliezer Yudkowsky dismissed neural networks, in favor of legible logic, as the path to capabilities?), and also got the alignment situation very wrong: it way overcomplexified human values, leaned far too heavily on the evopsych frame for human values, and didn’t notice that the differences between humans and evolution that mattered for capabilities also mattered for alignment.
I believe a lot of the issue comes down to incorrectly conflating the logical possibility of misalignment with the probability of misalignment being high enough that we should take serious action. The interlocutors they talked with often denied that misalignment could happen at all, but LWers didn’t realize that reality doesn’t grade on a curve: though their arguments were better than their interlocutors’, that didn’t mean they were right.
Yudkowsky didn’t dismiss neural networks, iirc. He just said that there were a lot of different approaches to AI and that from the Outside View it didn’t seem clear which was promising; and plausibly even on an Inside View it wasn’t very clear that artificial neural networks were going to work, let alone work so well.

Re: alignment, I don’t follow. We don’t know who will ultimately be proved right on alignment, so I’m not sure how you can make such strong statements about whether Yudkowsky was right or wrong on this aspect.

We haven’t really gained that many bits on this question, and plausibly will not gain many until later (by which time it might be too late, if Yudkowsky is right).

I do agree that Yudkowsky’s statements occasionally feel too confidently and dogmatically pessimistic on the question of Doom. But I would argue that the problem is that we simply don’t know, because of irreducible uncertainty; not that Doom is unlikely.
Mostly, I’m annoyed by how much his argumentation around alignment matches the pattern of dismissing various approaches to alignment using similar reasoning to how he dismissed neural networks:
Even if it was correct to dismiss neural networks years ago, it isn’t now, so it’s not a good sign that his arguments still rely on this style of reasoning:
https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#HpPcxG9bPDFTB4i6a
I am going to argue that we do have quite a lot of bits on alignment, and the basic argument can be summarized like this:
Human values are much less complicated, and much more influenced by data, than people thought 15-20 years ago, and thus much, much easier to specify than people thought back then.
That’s the takeaway I have from current LLMs handling human values, and I basically agree with Linch’s summary of Matthew Barnett’s post on the historical value misspecification argument of what that means in practice for alignment:
https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument#N9ManBfJ7ahhnqmu7
It’s not about LLM safety properties, but about what has been revealed about our values.
Another way to say it is that we don’t need to reverse-engineer social instincts for alignment, contra @Steven Byrnes, because we can massively simplify, in code, what the social-instinct parts of our brain that contribute to alignment are doing. While the mechanisms by which humans acquire their morality and avoid becoming psychopaths are complicated, that doesn’t matter, because we can replicate their function with much simpler code and data, and go to a more blank-slate design for AIs:
https://www.lesswrong.com/posts/PTkd8nazvH9HQpwP8/building-brain-inspired-agi-is-infinitely-easier-than#If_some_circuit_in_the_brain_is_doing_something_useful__then_it_s_humanly_feasible_to_understand_what_that_thing_is_and_why_it_s_useful__and_to_write_our_own_CPU_code_that_does_the_same_useful_thing_
(A similar trick is one path to solving robotics for AIs, but note this is only one part; it might be that the solution routes through a different mechanism.)
Really, I’m not mad about his original ideas; they weren’t obviously incorrect and might have turned out right. I’m mad that he didn’t realize he had to update to reality more radically than he did, and that he seems to conflate the bad argument ‘AI will understand our values, therefore it’s safe’ with the better argument that LLMs show it’s easier to specify values without drastically wrong results, which isn’t a complete solution to alignment, but is a big advance on outer alignment in the usual dichotomy.
It’s a plausible argument imho. Time will tell.
To my mind an important dimension, perhaps the most important dimension, is how values evolve under reflection.
It’s quite plausible to me that an AI that starts out with pretty aligned values will self-reflect into evil. This is certainly not unheard of in the real world (let alone fiction!). Of course, it’s a question about the basin of attraction around helpfulness and harmlessness. I guess I have only weak priors on what this might look like under reflection, although plausibly friendliness is magic.
I disagree, but this could be a difference in definition of what “perfectly aligned values” means. E.g. if the AI is dumb (for an AGI) and in a rush, sure. If it’s a superintelligence already, even in a rush, it seems unlikely. [edit:] If we have found an SAE feature which seems to light up for good stuff and down for bad stuff 100% of the time, and then we clamp it, then yeah, that could go away on reflection.
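To make the “[edit:]” scenario a bit more concrete, here is a minimal sketch of what “clamping an SAE feature” means mechanically. Everything in it (the toy SAE, the dimensions, the feature index) is a hypothetical illustration, not code for any real model or interpretability library:

```python
import torch
import torch.nn as nn

# Toy sparse autoencoder (SAE) over a residual-stream-like activation.
# Purely illustrative: dimensions, feature index, and the SAE itself are
# made up for this sketch, not taken from any real trained system.
class ToySAE(nn.Module):
    def __init__(self, d_model: int = 64, d_features: int = 512):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        features = torch.relu(self.encoder(activation))  # sparse feature activations
        return self.decoder(features)                    # reconstructed activation


def clamp_feature(sae: ToySAE, activation: torch.Tensor,
                  feature_idx: int, value: float = 10.0) -> torch.Tensor:
    """Encode the activation, pin one feature to a fixed value, decode.

    This is the intervention described above: force the "lights up for good
    stuff" feature on, regardless of what the model's own computation
    would have set it to.
    """
    features = torch.relu(sae.encoder(activation))
    features[..., feature_idx] = value  # the clamp
    return sae.decoder(features)


# Usage sketch: compute a clamped reconstruction and patch it back into the
# model's forward pass (e.g. via a forward hook on the relevant layer).
sae = ToySAE()
resid = torch.randn(1, 64)  # stand-in for a residual-stream activation
with torch.no_grad():
    patched_resid = clamp_feature(sae, resid, feature_idx=123)
```

Nothing in this intervention touches the rest of the model’s computation, which is why an effect produced this way could plausibly “go away on reflection”, as the comment notes.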
Another way to say it is to ask how values evolve in OOD (out-of-distribution) situations.
My general prior, albeit a reasonably weak one, is that the single best way to predict how values evolve is to look at their data sources, as well as what data they have received up to now; the second best way is to look at what their algorithms are, especially for social situations; and most of the other factors don’t matter nearly as much.
I think this statement (“Yudkowsky got almost everything else incorrect about how superhuman AIs would work”) is incredibly overconfident, because literally nobody knows how superhuman AI would work.
And, I think, this is the general shape of the problem: an incredible number of people got incredibly overindexed on how LLMs worked in 2022-2023 and drew conclusions which seem plausible, but are not as probable as these people think.
Okay, I talked more about what conclusions we can draw from LLMs that actually generalize to superhuman AI here, so go check that out:
https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform#mPaBbsfpwgdvoK2Z2
The really short summary is that human values are less complicated and more dependent on data than people thought, and that we can specify our values rather easily without it going drastically wrong:
This is not a property of LLMs, but of us.
Is that supposed to be a link?
I rewrote the comment to put the link immediately below the first sentence.
The link is at the very bottom of the comment.