Daniel Kokotajlo comments on Updating my AI timelines

Daniel Kokotajlo 8 Dec 2022 17:26 UTC
5 points
0
A year and a half ago I wrote this detailed story of how the next five years would go. Which parts of it do you disagree with?
- Vitor 9 Dec 2022 0:46 UTC
  15 points
  4
  Sure, let me do this as an exercise (ep stat: babble mode). Your predictions are pretty sane overall, but I’d say you handwave away problems (like integration over a variety of domains, long-term coherent behavior, and so on) that I see as (potentially) hard barriers to progress.
  
  2022
  - 2022 is basically over and I can’t get a GPT instance to order me a USB stick online.
  2023
  - basically agree, this is where we’re at right now (perhaps with the intensity turned down a notch)
  2024
  - you’re postulating that “It’s easy to make a bureaucracy and fine-tune it and get it to do some pretty impressive stuff, but for most tasks it’s not yet possible to get it to do OK all the time.” I have a fundamental disagreement here. I don’t think these tools will be effective at doing any task autonomously (fooling other humans doesn’t count, neither does forcing humans to only interact with a company through one of these). Currently (2022) ChatGPT is arguably useful as a babbling tool that stimulates human creativity and makes templating easier (this includes things like easy coding tasks). I don’t see anything in your post that justifies the implicit jump in capabilities you’ve snuck in here.
  - broadly agree with your ideas on propaganda, from the production side (i.e. that lots of companies/governments will be doing lots of this stuff). But I think that general attitudes in the population will shift (cynicism etc) and provide some amount of herd immunity. Note that the influence of the woke movement is already fading, shortly after it went truly mainstream and started having visible influence in average people’s lives. This is not a coincidence.
  2025
  - Doing well at diplomacy is not very related to general reasoning skills. I broadly agree with Zvi’s take and also left some of my thoughts there.
  - I’m very skeptical that bureaucracies will be the way forward. They work for trivial tasks but reliably get lost in the weeds and start talking to themselves in circles for anything requiring a non-trivial amount of context.
  - disagree on orders of magnitude improvements in hardware. You’re proposing a 100x decrease in costs compared to 2020, when it’s not even clear our civilization is capable of keeping hardware at current levels generally available, let alone cope with a significant increase in demand. Semiconductor production is much more centralized/fragile than people think, so even though billions of these things are produced per year, the efficient market hypothesis does not apply to this domain.
  2026
  - Here you’re again postulating jumps in capabilities that I don’t see justified. You talk about the “general understanding and knowledge of pretrained transformers”, when understanding is definitely not there, and knowledge keeps getting corrupted by the AI’s tendency to synthesize falsities as confidently as truths. Insofar as the AI can be said to be intelligent at all, it’s all symbol manipulation at a high simulacrum level. Integration with real-world tasks keeps mysteriously failing as the AI flounders around in a way that is simultaneously very sophisticated, but oh so very reminiscent of 2022.
  - disagree about your thoughts on propaganda, which is just an obvious extension of my 2024 thoughts above. I also notice that social changes this large take orders of magnitude longer to percolate through society than what you predict, so I disagree with your predictions even conditioned on your views of the raw effectiveness of these systems.
  - “chatbots quickly learn about themselves” etc. Here you’re conflating the regurgitation of desirable phrases with actual understanding. I notice that as you write your timeline, your language morphs to make your AIs more and more conscious, but you’re not justifying this in any way other than… something something self-referential, something something trained on their own arxiv papers. I don’t mean to be overly harsh, but here you seem to be sneaking in the very thing that’s under debate!
  - Daniel Kokotajlo 9 Dec 2022 5:38 UTC
    7 points
    4
    Excellent, thanks for this detailed critique! I think this might be the best that post has gotten thus far, I’ll probably link to it in the future.
    
    Point by point reply, in case you are interested:
    
    2022-2023: Agree. Note that I didn’t forecast that an AI could buy you a USB stick by 2022; I said people were dreaming of such things but that they didn’t actually work yet.
    
    2024: We definitely have a real disagreement about AI capabilities here; I do expect fine-tuned bureaucracies to be useful for some fairly autonomous things by 2024. (For example, the USB stick thing I expect to work fine by 2024). Not just babbling and fooling humans and forcing people to interact with a company through them.
    
    Re propaganda/persuasion: I am not sure we disagree here, but insofar as we disagree I think you are correct. We agree about what various political actors will be doing with their models: propaganda, censorship, etc. We disagree about how big an effect this will have on the populace. Or at least, 2021-me disagrees with 2022-you. I think 2022-me has probably come around to your position as well; like you say, it just takes time for these sorts of things to influence the public + there’ll probably be a backlash / immunity effect. Idk.
    
    2025: I admit I overestimated how hard diplomacy would turn out to be. In my defense, Cicero only won because the humans didn’t know they were up against a bot. Moreover it’s a hyper-specialized architecture trained extensively on Diplomacy, so it indeed doesn’t have general reasoning skills at all.
    
    We continue to disagree about the potential effectiveness of fine-tuned bureaucracies. To be clear I’m not confident, but it’s my median prediction.
    
    I projected a 10x decrease in hardware costs, and also a 10x improvement in algorithms/software, from 2020 to 2025. I stand by that prediction.
    
    2026:
    
    We disagree about whether understanding is (or will be) there. I think yes, you think no. I don’t think that these AIs will be “merely symbol manipulators” etc. I don’t think the data-poisoning effect will be strong enough to prevent this.
    
    As mentioned above, I do take the point that society takes a long time to change and probably I shouldn’t expect the propaganda etc. to make that much of a difference in just a few years. Idk.
    
    I’m not conflating those things, I know they are different. I am and was asserting that the chatbots would actually have understanding, at least in all the behaviorally relevant senses (though I’d argue also in the philosophical senses as well). You are correct that I didn’t argue for this in the text, but that wasn’t the point of the text; the text was stating my predictions, not attempting to argue for them.
    
    ETA: I almost forgot, it sounds like you mostly agree with my predictions, but think AGI still won’t be nigh even in my 2026 world? Or do you instead think that the various capabilities demonstrated in the story won’t occur in real life by 2026? This is important because if 2026 comes around and things look more or less like I said they would, I will be saying that AGI is very near. Your original claim was that in the next few years the median LW timeline would start visibly clashing with reality; so you must think that things in real-life 2026 won’t look very much like my story at all. I’m guessing the main way it’ll be visibly different, according to you, is that AI still won’t be able to do autonomous things like go buy USB sticks? Also they won’t have true understanding—but what will that look like? Anything else?
    - Vitor 9 Dec 2022 23:12 UTC
      5 points
      −1
      I do roughly agree with your predictions, except that I rate the economic impact in general to be lower. Many headlines, much handwringing, but large changes won’t materialize in a way that matters.
      
      To put my main objection succinctly, I simply don’t see why AGI would follow soon from your 2026 world. Can you walk me through it?
      - Daniel Kokotajlo 9 Dec 2022 23:30 UTC
        8 points
        1
        OK, well, you should retract your claim that the median LW timeline will soon start to clash with reality then! It sounds like you think reality will look basically as I predicted! (I can’t speak for all of LW of course but I actually have shorter timelines than the median LWer, I think.)
        
        Re AGI happening in 2027 in my world: Yep good question. I wish I had had the nerve to publish my 2027 story. A thorough answer to your question will take hours (days?) to write, and so I beg pardon for instead giving this hasty and incomplete answer:
        
        --For R&D, when I break down the process that happens at AI labs, the process that produces a steady stream of better algorithms, it sure seems like there are large chunks of that loop that can be automated by the kinds of coding-and-research-assistant-bots that exist by 2026 in my story. Plus a few wild cards besides, that could accelerate R&D still further. I actually think completely automating the process is likely, but even if that doesn’t happen, a substantial speedup would be enough to reach the next tier of improvements which would then get us to the tier after that etc.
        --For takeover, the story is similar. I think about what sorts of skills/abilities an AI would need to take over the world, e.g. it would need to be APS-AI as defined in the Carlsmith report on existential risk from power-seeking AI. Then I think about whether the chatbots of 2026 will have all of those skills, and it seems like the answer is yes.
        --Separately, I struggle to think of any important skill/ability that isn’t likely to happen by 2026 in this story. Long-horizon agency? True understanding? General reasoning ability? The strongest candidate is ability to control robots in messy real-world environments, but alas that’s not a blocker, even if AIs can’t do that, they can still accelerate R&D and take over the world.
        
        What do you think the blockers are—the important skills/abilities that no AI will have by 2026?
        Vitor 13 Dec 2022 1:10 UTC
        5 points
        0
        
        > OK, well, you should retract your claim that the median LW timeline will soon start to clash with reality then! It sounds like you think reality will look basically as I predicted! (I can’t speak for all of LW of course but I actually have shorter timelines than the median LWer, I think.)
        
        I retract the claim in the sense that it was a vague statement that I didn’t expect to be taken literally, which I should have made clearer! But it’s you who operationalized “a few years” as 2026 and “the median less wrong view” as your view.
        
        Anyway, I think I see the outline of our disagreement now, but it’s still kind of hard to pin down.
        
        First, I don’t think that AIs will be put to unsupervised use in any domain where correctness matters, i.e., given fully automated access to valuable resources, like money or compute infrastructure. The algorithms that currently do this have a very constrained set of actions they can take (e.g. an AI chooses an ad to show out of a database of possible ads), and this will remain so.
        
        Second, perhaps I didn’t make clear enough that I think all of the applications will remain in this twilight of almost working, showing some promise, etc, but not actually deployed (that’s what I meant by the economic impact remaining small). So, more thinkpieces about what could happen (with isolated, splashy examples), rather than things actually happening.
        
        Third, I don’t think AIs will be capable of performing tasks that require long attention spans, or that trade off multiple complicated objectives against each other. With current technology, I see AIs constrained to be used for short, self-contained tasks only, with a separate session for each task.
        
        Does that make the disagreement clearer?
        Daniel Kokotajlo 13 Dec 2022 4:22 UTC
        3 points
        0
        I stand by my decision to operationalize “a few years” as 2026, and I stand by my decision to use my view as a proxy for the median LW view. You were claiming that the median LW view was too short-timelinesy and would soon clash with reality; I have even shorter timelines than the median LW view, and yet (you backtrack-claim) my view won’t soon clash with reality.
        
        Thank you for the clarification of your predictions! It definitely helps, but unfortunately I predict that goalpost-moving will still be a problem. What counts as “domain where correctness matters?” What counts as “very constrained set of actions?” Would e.g. a language-model-based assistant that can browse the internet and buy things for you on Amazon (with your permission of course) be in line with what you expect, or violate your expectations?
        
        What about the applications that I discuss in the story, e.g. the aforementioned smart buyer assistant, the video-game-companion-chatbot, etc.? Do they not count as fully working? Are you predicting that there’ll be prototypes but no such chatbot with more than, say, 100,000 daily paying users?
        
        (Also, what about Copilot? Isn’t it already an example of an application that genuinely works, and isn’t just in the twilight zone?)
        
        What counts as a long attention span? 1000 forward passes? A million? What counts as trading off multiple complicated objectives against each other, and why doesn’t ChatGPT already qualify?
        
        Vitor 13 Dec 2022 15:20 UTC
        3 points
        1
        Mmm, I would say the general shape of your view won’t clash with reality, but the magnitude of the impact will.
        
        It’s plausible to me that a smart buyer will go and find the best deal for you when you tell it to buy laptop model X. It’s not plausible to me that you’ll be able to instruct it “buy an updated laptop for me whenever a new model comes out that is good value and sufficiently better than what I already have,” and then let it do its thing completely unsupervised (with direct access to your bank account). That’s what I mean by multiple complicated objectives.
        
        > What counts as “domain where correctness matters?” What counts as “very constrained set of actions?” Would e.g. a language-model-based assistant that can browse the internet and buy things for you on Amazon (with your permission of course) be in line with what you expect, or violate your expectations?
        
        Something that goes beyond current widespread use of AI such as spam-filtering. Spam-filtering (or selecting ads on facebook, or flagging hate speech etc) is a domain where the AI is doing a huge number of identical tasks, and a certain % of wrong decisions is acceptable. One wrong decision won’t tank the business. Each copy of the task is done in an independent session (no memory).
        
        An example application where that doesn’t hold is putting the AI in charge of ordering all the material inputs for your factory. Here, a single stupid mistake (didn’t buy something because the price will go down in the future, replaced one product with another, misinterpreted seasonal cycles) will lead to a catastrophic stop of the entire operation.
        
        > (Also, what about Copilot? Isn’t it already an example of an application that genuinely works, and isn’t just in the twilight zone?)
        
        Copilot is not autonomous. There’s a human tightly integrated into everything it’s doing. The jury is still out on whether it works, i.e., do we have anything more than some programmers’ self-reports to substantiate that it increases productivity? Even if it does work, it’s just a productivity tool for humans, not something that replaces humans at their tasks directly.
        gwern 13 Dec 2022 17:33 UTC
        10 points
        1
        
        > Copilot is not autonomous.
        
        A distinction which makes no difference. Copilot-like models are already being used in autonomous code-writing ways, such as AlphaCode which executes generated code to check against test cases, or evolving code, or LaMDA calling out to a calculator to run expressions, or ChatGPT writing and then ‘executing’ its own code (or writing code like SVG which can be interpreted by the browser as an image), or Adept running large Transformers which generate & execute code in response to user commands, or the dozens of people hooking up the OA API to a shell, or… Tool AIs want to be agent AIs.
        Daniel Kokotajlo 9 Dec 2022 23:39 UTC
        2 points
        0
        (Oh, also: When I wrote the 2026 story, I did it using my timelines which were something like median 2029. And I had trends to extrapolate, underlying models, etc. And also: Bio-anchors style models, when corrected to have better settings of the various inputs, yield something like 2029 median also. In fact that’s why my median was what it was. So I’d say that multiple lines of evidence are converging.)
        Nathan Helm-Burger 12 Dec 2022 2:10 UTC
        1 point
        0
        My expectations are more focused around the parallel paths of Reflective General Reasoning and Recursive Self-Improvement. I think that both of these paths have thresholds beyond which there is a mode shift to a much faster (and accelerating) development pace, and that we are pretty close to both of these thresholds.