They’re not going to produce stellar scientific discoveries where they autonomously invent whole new fields or revolutionize technology.
I disagree with this, and I think you should too, even considering your own views. For example, DeepMind recently discovered 2.2 million new crystals, increasing the number of stable crystals we know about by an order of magnitude. Perhaps you don’t think this is revolutionary, but 5, 10, 15, 50 more papers like it? One of them is bound to be revolutionary.
Maybe you don’t think this is autonomous enough for you. After all, it’s people writing the paper, people who will come up with ideas for what to use the materials for, and people who built this very particular ML setup in the first place. But then your prediction becomes that these tasks will not be automatable by LLMs without making them dangerous. To me these tasks seem pretty basic, likely beyond current LLM abilities, but GPT-5 or 6? Not out of the question given no major architecture or training changes.
Maybe you don’t think this is autonomous enough for you
Yep. The core thing here is iteration. If an AI can execute a whole research loop on its own – run into a problem it doesn’t know how to solve, figure out what it needs to learn to solve it, construct a research procedure for figuring that out, carry out that procedure, apply the findings, repeat – then research-as-a-whole begins to move at AI speeds. It doesn’t need to wait for a human to understand the findings and figure out where to point it next – it can go off and invent whole new fields at inhuman speeds.
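To make the shape of that loop concrete, here’s a minimal sketch in pseudocode-style Python. Every method on `agent` is a hypothetical capability, not an existing API; the only point is that no step hands control back to a human:

```python
# A minimal sketch of the closed research loop described above (pseudocode-style Python).
# Every method on `agent` is a hypothetical capability, not an existing API.
def autonomous_research_loop(goal, agent):
    state = agent.attempt(goal)
    while not state.solved:
        problem = agent.identify_blocking_problem(state)   # hits a problem it doesn't know how to solve
        plan = agent.design_research_procedure(problem)    # figures out what it needs to learn, and how
        findings = agent.carry_out(plan)                    # experiments, derivations, etc.
        agent.integrate(findings)                           # applies the findings
        state = agent.attempt(goal)                         # ...and repeats, with no human in the loop
    return state
```

If a human has to perform any of those steps, the whole loop runs at human speed.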
Which means it can take off; we can meaningfully lose control of it. (Especially if it starts doing AI research itself.)
Conversely, if there’s a human in the loop, that’s a major bottleneck. As I’d mentioned in the post, I think LLMs and such AIs are a powerful technology, and greatly boosting human research speeds is something where they could contribute. But without a fully closed autonomous loop, that’s IMO not an omnicide risk.
To me these tasks seem pretty basic, likely beyond current LLM abilities, but GPT-5 or 6? Not out of the question given no major architecture or training changes.
That’s a point of disagreement: I don’t think GPT-N would be able to do it. I think this post by Nate Soares mostly covers the why. Oh, or this post by janus.
I don’t expect LLMs to be able to keep themselves “on-target” while choosing novel research topics or properly integrating their findings. That’s something you need proper context-aware consequentialist-y cognition for. It may seem trivial – see janus’ post pointing out that “steering” cognition basically amounts to just injecting a couple decision-bits at the right points – but that triviality is deceptive.
Ok, so if I get a future LLM to write the code to use standard genai tricks to generate novel designs in <area>, write a paper about the results, and the paper is seen as a major revolution in <area>, and this seems to not violate the assumptions Nora and Quintin are making during doom arguments, would this update you? What constraints do you want to put on <area>?
Nope, because of the “if I get a future LLM to [do the thing]” step. The relevant benchmark is the AI being able to do it on its own. Note also how your setup doesn’t involve the LLM autonomously iterating on its discovery, which I’d pointed out as the important part.
To expand on that:
Consider an algorithm that generates purely random text. If you have a system consisting of trillions of human uploads using it, each hitting “rerun” a million times per second, and then selectively publishing only the randomly-generated outputs that are papers containing important mathematical proofs – well, that’s going to generate novel discoveries sooner or later. But the load-bearing part isn’t the random-text algorithm, it’s the humans selectively amplifying those of its outputs that make sense.
LLM-based discoveries as you’ve proposed, I claim, would be broadly similar. LLMs have a better prior on important texts than a literal uniform distribution, and they can be prompted to make useful outputs even more likely, which is why it won’t take trillions of uploads and millions of tries. But the load-bearing part isn’t the LLM, it’s the human deciding where to point its cognition and which result to amplify.
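As a toy illustration of where the information comes from in that setup (the generator and the filter below are stand-ins, not claims about any real system):

```python
import random
import string

def random_text(length=40):
    """The 'dumb' generator: uniform random characters."""
    return "".join(random.choice(string.ascii_lowercase + " ") for _ in range(length))

def looks_like_a_proof(text):
    """Stand-in for the human filter deciding which outputs to amplify.
    Essentially all of the selection pressure in the thought experiment lives here."""
    return "theorem" in text and "qed" in text

# The generator supplies raw samples; the filter supplies the bits that distinguish
# "important paper" from noise. A better generator (an LLM) just means fewer samples
# are needed before the filter accepts one; the filter is still load-bearing.
hits = [t for t in (random_text() for _ in range(100_000)) if looks_like_a_proof(t)]
print(len(hits))  # almost certainly 0 for a uniform generator, hence "trillions of uploads"
```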
Paragraph intended as a costly signal that I am in fact invested in this conversation, no need to actually read it: Sorry for the low-effort replies, but by its nature the info I want from you is more costly for you to give than for me to ask for. Thanks for the response, and hopefully thanks also for future responses.
I feel like I’d always be getting an LLM to do something. Like, if I get an LLM to do the field selection for me, does this work?
Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?
Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?
Oh, nice way to elicit the response you’re looking for!
The baseline proof-of-concept would go as follows:
You give the AI some goal, such as writing a piece of analytical software intended to solve some task.
The AI, over the course of writing the codebase, runs into some non-trivial, previously unsolved mathematical problem. Some formulas need to be tweaked to work in the new context, or there’s some missing math theory that needs to be derived.
The AI doesn’t hallucinate solutions or swap in the closest (and invalid) analogue. Instead, it correctly identifies that a problem exists, figures out how it can approach solving it, and goes about doing this.
As it’s deriving new theory, it sometimes runs into new sub-problems. Likewise, it doesn’t hallucinate solutions, but spins off some subtasks, and solves sub-problems in them.
Ideally, it even defines experiments or rigorous test procedures for fault-checking its theory empirically.
In the end, it derives a whole bunch of novel abstractions/functions/terminology, with layers of novel abstractions building up on the preceding layers of novel abstractions, and all of that is coherently optimized to fit into the broader software-engineering task it’s been given.
The software works. It doesn’t need to be bug-free, the theory doesn’t need to be perfect, but it needs to be about as good as a human programmer would’ve managed, and actually based on some novel derivations.
This seems like something an LLM, e. g. in an AutoGPT wrapper, should be able to do, if its base model is generally intelligent.
I am a bit wary of reality Goodharting on this test, though. E. g., I can totally imagine some specific niche field in which an LLM, for some reason, can do this, but can’t do it anywhere else. Or some fuzziness around what counts as “novel math” being exploited – e. g., if the AI happens to hit upon re-applying extant math theory to a different field? Or, even more specifically, that there’s some specific research-engineering task that some LLM somewhere manages to ace, but in a one-off manner?
So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
It’s fine if they can’t do that for literally every field. But it should be a “blacklist” of fields, not a “whitelist” of fields.
So if we get an AI model that can do this, and it’s based on something relevantly similar to the current paradigm, and it doesn’t violate the LLM-style safety guarantees, I think that would be significant evidence against my model.
Maybe a more relevant concern I have with this is that it feels like a “Can you write a symphony”-type test to me. Like, there are very few people alive right now who could do the process you outline without any outside help, guidance, or prompting.

Yeah, it’s necessarily a high bar. See justification here.

I’m not happy about only being able to provide high-bar predictions like this, but it currently seems to me to be a territory-level problem.

It really seems like there should be a lower bar to update though. Like, you say to consider humans as an existence proof of AGI, so likely your theory says something about humans. There must be some testable part of everyday human cognition which relies on this general algorithm, right?
Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans? You would probably expect there to be lots of similarity, compared to, say, Jacob Cannell’s or Quintin Pope’s predictions. Right?
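To be concrete about “some similarity metric”, here’s one possible operationalization, assuming the scans have already been aligned and reduced to same-shape activation arrays (all the real fMRI preprocessing is hand-waved away):

```python
import numpy as np

def scan_similarity(scan_a, scan_b):
    """Pearson correlation between two flattened activation maps."""
    a, b = np.ravel(scan_a), np.ravel(scan_b)
    return float(np.corrcoef(a, b)[0, 1])

# Toy usage on random stand-in volumes (real scans would need registration, denoising, etc.):
rng = np.random.default_rng(0)
task_a = rng.normal(size=(64, 64, 32))
task_b = rng.normal(size=(64, 64, 32))
print(scan_similarity(task_a, task_b))  # ~0 for unrelated random volumes
```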
Even if you don’t think one similarity metric could cover it, you should still be able to come up with some difference in predictions, even if not immediately right now.
Edit: Also I hope you forgive me for not asking for a prediction of this form earlier. It didn’t occur to me.
There must be some testable part of everyday human cognition which relies on this general algorithm, right?
Well, yes, but they’re of a hard-to-verify “this is how human cognition seems to work from the inside” format. E. g., I sometimes talk about how humans seem to be able to navigate unfamiliar environments without experience, in a way that seems to disagree with baseline shard-theory predictions. But I don’t think that’s been persuading people not already inclined to this view.

The magical number 7±2 and the associated weirdness is also of the relevant genre.
Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans?
Hm, I guess something like this might work? Not sure regarding the precise operationalization, though.
You willing to do a dialogue about predictions here with @jacob_cannell or @Quintin Pope or @Nora Belrose or others (also a question to those pinged)?

If any of the others are particularly enthusiastic about this and expect it to be high-value, sure!
That said, I personally don’t expect it to be particularly productive.
These sorts of long-standing disagreements haven’t historically been resolvable via debate (the failure of Hanson vs. Yudkowsky is kind of foundational to the field).
I think there’s great value in having a public discussion nonetheless, but I think its value is in informing the readers’ models of what the different sides believe.
Thus, inasmuch as we’re having a public discussion, I think it should be optimized for thoroughly laying out one’s points to the audience.
However, dialogues-as-a-feature seem to be more valuable to the participants, and are actually harder for readers to grok.
Thus, my preferred method for discussing this sort of stuff is to exchange top-level posts trying to refute each other (the way this post is, to a significant extent, a response to the AI is easy to control article), and then maybe argue a bit in the comments. But not to have a giant tedious top-level argument.
I’d actually been planning to make a post about the difficulties the “classical alignment views” have with making empirical predictions, and I guess I can prioritize it more?
But I’m overall pretty burned out on this sort of arguing. (And arguing about “what would count as empirical evidence for you?” generally feels like too-meta fake work, compared to just going out and trying to directly dredge up some evidence.)
Not entirely sure what @Thane Ruthenis’ position is, but this feels like a maybe relevant piece of information: https://www.science.org/content/article/formerly-blind-children-shed-light-centuries-old-puzzle

Not sure what the relevance is? I don’t believe that “we possess innate (and presumably God-given) concepts that are independent of the senses”, to be clear. “Children won’t be able to instantly understand how to parse a new sense and map its feedback to the sensory modalities they’ve previously been familiar with, but they’ll grok it really fast with just a few examples” was my instant prediction upon reading the titular question.
I’m also not sure of the relevance and not following the thread fully, but the summary of that experiment is that it takes some time (measured in nights of sleep, which are a rough equivalent of big batch training updates) for the newly sighted to develop vision, but less time than infants—presumably because the newly sighted already have fully functioning sensory-inference world models in another modality that can speed up learning through dense top-down priors.
But it’s way, way more than “grok it really fast with just a few examples”—training their new visual systems still takes non-trivial training data and time.
Though, admittedly, the prompt was to modify the original situation I presented, which had an output currently very difficult for any human to produce to begin with. So I don’t quite fault you for responding in kind.
Well, for what it’s worth, I can write a symphony (following the traditional tonal rules), as this is actually mandated in order to pass some advanced composition classes. I think that letting the AI write a symphony without supervision and then having a composition professor evaluate it could actually be a very good test, because there’s no way a stochastic parrot could follow all the traditional rules correctly for more than a few seconds (an even better test would be to ask it to write a fugue on a given subject, whose rules are even more precise).
So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
I think sticking to this would make it difficult for you to update sooner. We should expect small approaches before large approaches here, and private solutions before publicly disclosed solutions.
Relatedly, would DeepMind’s recent LLM mathematical proof paper count, if it were more general? They give LLMs feedback via an evaluator function, exploiting the NP-hard nature of a problem in combinatorics and bin packing (note: I have not read this paper in full).
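My rough understanding of that setup (a generic sketch of the idea, not their actual code) is that the LLM proposes candidate heuristics as code, and a hand-written evaluator scores them, e.g. by how few bins a proposed packing heuristic uses on random instances:

```python
import random

def evaluate_heuristic(priority, n_instances=20, n_items=50, capacity=1.0):
    """Score a candidate bin-packing heuristic: fewer bins used is better.
    `priority(item, remaining_space)` ranks the open bins an item could go into."""
    rng = random.Random(0)
    total_bins = 0
    for _ in range(n_instances):
        items = [rng.uniform(0.1, 0.7) for _ in range(n_items)]
        bins = []  # remaining capacity of each open bin
        for item in items:
            feasible = [i for i, space in enumerate(bins) if space >= item]
            if feasible:
                best = max(feasible, key=lambda i: priority(item, bins[i]))
                bins[best] -= item
            else:
                bins.append(capacity - item)
        total_bins += len(bins)
    return -total_bins  # higher score = fewer bins

# The outer loop would repeatedly ask an LLM for new `priority` functions and keep the
# best scorers; the evaluator, not the LLM, is what grounds the search.
print(evaluate_heuristic(lambda item, space: -space))  # a best-fit-style baseline
```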
They’re not going to produce stellar scientific discoveries where they autonomously invent whole new fields or revolutionize technology.
You say it yourself: “DeepMind recently discovered 2.2 million new crystals.” Because a human organization used the tool.
Though maybe this hints at a risk category the OP didn’t mention: That a combination of humans and advanced AI tools (that themselves are not ASI) together could be effectively an unopposable ASI.
Maybe you don’t think this is autonomous enough for you. After all, it’s people writing the paper, people who will come up with ideas for what to use the materials for, and people who built this very particular ML setup in the first place. But then your prediction becomes that these tasks will not be automatable by LLMs without making them dangerous. To me these tasks seem pretty basic, likely beyond current LLM abilities, but GPT-5 or 6? Not out of the question given no major architecture or training changes.
a combination of humans and advanced AI tools (that themselves are not ASI) together could be effectively an unopposable ASI
Yeah, I’m not unworried about eternal-dystopia scenarios enabled by this sort of stuff. I’d alluded to it some, when mentioning scaled-up LLMs potentially allowing “perfect-surveillance dirt-cheap totalitarianism”.
But it’s not quite an AGI killing everyone. Fairly different threat model, deserving of its own analysis.
I also thought this. Then we run a facility full of robots and have them synthesize and measure the material properties of all 2.2 million crystals. Replication is cheap and would be done automatically, so we don’t waste time on materials that seem good due to an error.
Then a human scientist writes a formula that takes into account several properties for suitability to a given task, sorts the spreadsheet of results by the formula, orders a new device built using the top-scoring materials, writes a paper with the help of a GPT, publishes, and collects the rewards for this amazing new discovery.
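That last step is basically a weighted score over a results table; a sketch with invented property names and weights, purely for illustration:

```python
import pandas as pd

# Hypothetical measured properties per candidate material (columns and weights invented).
results = pd.DataFrame({
    "material": ["A", "B", "C"],
    "conductivity": [0.9, 0.4, 0.7],
    "stability": [0.8, 0.95, 0.6],
    "cost": [0.3, 0.2, 0.9],  # lower is better
})

# "Writes a formula that takes into account several properties":
results["score"] = 0.5 * results["conductivity"] + 0.4 * results["stability"] - 0.1 * results["cost"]

# "Sorts the spreadsheet of results by the formula":
print(results.sort_values("score", ascending=False))
```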
So I think the OP is thinking that the last 1 percent or 0.1 percent contributed by the humans means the model isn’t fully autonomous? And I have seen a kind of bias on LessWrong where many posters went to elite schools and do elite work, and they don’t realize all the other people who are needed for anything to get done. For example, every cluster of a million GPUs requires a large crew of technicians, plus all the factory workers and engineers who designed and built the hardware.
In terms of human labor hours, 10 AI researchers using a large cluster are greatly outnumbered by the other people involved whom they don’t see. Possibly thousands of other people working full time, once you start considering billion-dollar clusters, if just 20 percent of that cost went to human labor at an Asia-weighted average salary.
This means AI-driven autonomy can be transformational even if the labor of the most elite workers can’t be done by AI.
In numbers: if just 1 of those AI researchers can be automated, but 90 percent of the factory workers and mine workers can be, and the total crew was 1000 people including all the invisible contributors in Asia, then the task of AI research needs roughly 109 people instead of 1000.
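Working those numbers through explicitly (the 10-researcher / 990-support split of the 1000-person crew is my assumption about how the figure breaks down):

```python
researchers, support = 10, 990        # assumed split of the 1000-person crew
automated_researchers = 1             # "just 1 of those AI researchers can be automated"
support_automation_rate = 0.90        # "90 percent of the factory workers and mine workers"
still_needed = (researchers - automated_researchers) + round(support * (1 - support_automation_rate))
print(still_needed)  # 108, i.e. roughly the ~109-people figure above
```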
But from the OP’s perspective, the model hasn’t automated much: you need 9 elite researchers instead of 10. And actually the next generation of AI is more complex, so you hire more people, and fewer new ideas work as the low-hanging fruit is plucked. If you focus on just the elite contributors, only the most powerful AI can be transformational. I have noticed this bias from several prominent LessWrong posters.
I am confused. I agree with the above scenario, but disagree that the focus is a bias. Sure, for human society the linear speed-up scale is important, but for the dynamics of the intelligence explosion the log scale seems more important. By your own account, we would rapidly move to a situation where the most capable humans/institutions are in fact the bottleneck, since anyone who is not able to keep up with the speed of their job being automated away is not going to contribute a lot on the margin of intelligence self-improvement. For example, OpenAI/Microsoft/DeepMind/Anthropic/Meta deciding in the future to design and manufacture their chips in-house because NVIDIA can’t keep up, etc.
I don’t know if I expect this would make NVIDIA’s stock tank before the world ends. I expect everyone else to profit from slowly generating mundane utility from general AI tools, as is happening today.
Here’s another aspect you may not have considered. “Only” being able to automate the lower 90-99 percent of human industrial tasks results in a conventional industry explosion. Scaling continues until the 1-10 percent of humans still required become the limiting factor.
A world that has 10 to 100 times today’s entire capacity for everything (that means consumer goods, durable goods like cars, weapons, and structures if factory-prefabbed) is transformed.
And this feeds back into itself, as you realize: the crew of AI researchers trying to automate themselves now has a lot more hardware to work with, etc.
This seems overall consistent with Thane’s statements in the post? They don’t make any claims about current AIs not being a transformative technology. Indeed, they do state that current AIs are a powerful technology.
In the third and last paragraph I try to explain why the OP and prominent experts like Matthew Barnett, Richard Ngo, and others all hold much harder standards for when AI will be transformative.
For a summary: advancing technology is mostly perspiration, not inspiration; automating the perspiration will be transformative.
This means AI-driven autonomy can be transformational even if the labor of the most elite workers can’t be done by AI.
Oh, totally. But I’m not concerned about transformations to human society in general; I’m concerned about AGI killing everyone. And what you’ve described isn’t going to lead to AGI killing everyone.
See my reply here for why I think complete autonomy is crucial.