gpt4o is not literally equivalent to a 115 IQ human.
Use whatever word you want for the concept “score produced when an LLM takes an IQ test”.
But is this comparable to G? Is it what we want to measure?
I have no idea what you want to measure.
I only know that LLMs are continuing to steadily increase in some quality (which you are free to call “fake machine IQ” or whatever you want), and that if they continue to make progress at the current rate there will be consequences, and we should prepare to deal with those consequences.
I think there’s a world where AIs continue to saturate benchmarks and the only consequence is that the companies get to say they saturated those benchmarks.
Especially at the tails of those benchmarks, I imagine the gains won’t translate into the things we actually care about, like general reasoning, the ability to act autonomously, etc.
On a metaphysical level I am completely on board with “there is no such thing as IQ. Different abilities are completely uncorrelated. Optimizing for metric X is uncorrelated with desired quality Y...”
On a practical level, however, I notice that every time OpenAI announces they have a newer shinier model, it both scores higher on whatever benchmark and is better at a bunch of practical things I care about.
Imagine there were a theoretically correct metric called the_thing_logan_actually_cares_about. I notice in my own experience a strong correlation between “fake machine IQ” and the_thing_logan_actually_cares_about. I further note that if one makes a linear fit against:
Progress_over_time + log(training flops) + log(inference flops)
then it nicely predicts both the_thing_logan_actually_cares_about and “fake machine IQ”.
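For concreteness, here is a minimal sketch of that fit in Python. Every number in it is invented (the flop counts, the noise levels, the scores are all placeholders), so this is a toy illustration of the regression's shape, not a claim about real models:

```python
import numpy as np

# All numbers below are made up for illustration; the point is the shape
# of the fit, not the data. Eight hypothetical model generations.
rng = np.random.default_rng(0)
n = 8
progress_over_time = np.arange(n, dtype=float)  # release index, 0..7

# Hypothetical compute figures, jittered so the columns aren't collinear.
log_training_flops = np.log(10.0) * (23 + progress_over_time + rng.normal(0, 0.3, n))
log_inference_flops = np.log(10.0) * (9 + 0.5 * progress_over_time + rng.normal(0, 0.2, n))

# Design matrix: intercept plus the three predictors named above.
X = np.column_stack([np.ones(n), progress_over_time,
                     log_training_flops, log_inference_flops])

# Two synthetic outcomes, both (noisily) improving across generations.
fake_machine_iq = 90 + 3.0 * progress_over_time + rng.normal(0, 1.5, n)
the_thing_logan_actually_cares_about = (
    10 + 2.5 * progress_over_time + rng.normal(0, 1.5, n))

for name, y in [("fake machine IQ", fake_machine_iq),
                ("the_thing_logan_actually_cares_about",
                 the_thing_logan_actually_cares_about)]:
    # Ordinary least squares fit of y against the design matrix.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    print(f"{name}: R^2 of the linear fit = {r2:.3f}")
```

On this toy data both fits come out with R² near 1, which is just the claim restated: anything roughly linear in release date and log-compute predicts both quantities about equally well.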
This reminds me of the LessWrong post “If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics”: https://www.lesswrong.com/posts/9Tw5RqnEzqEtaoEkq/if-it-s-worth-doing-it-s-worth-doing-with-made-up-statistics