Thane Ruthenis comments on Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis 16 Dec 2023 9:50 UTC
4 points
0
Nope, because of the “if I get a future LLM to [do the thing]” step. The relevant benchmark is the AI being able to do it on its own. Note also how your setup doesn’t involve the LLM autonomously iterating on its discovery, which I’d pointed out as the important part.
To expand on that:
Consider an algorithm that generates purely random text. If you have a system consisting of trillions of human uploads using it, each hitting “rerun” a million times per second, and then selectively publishing only the randomly-generated outputs that are papers containing important mathematical proofs – well, that’s going to generate novel discoveries sooner or later. But the load-bearing part isn’t the random-text algorithm, it’s the humans selectively amplifying those of its outputs that make sense.
LLM-based discoveries as you’ve proposed, I claim, would be broadly similar. LLMs have a better prior on important texts than a literal uniform distribution, and they could be prompted to be even more likely to generate something useful, which is why it won’t take trillions of uploads and millions of tries. But the load-bearing part isn’t the LLM, it’s the human deciding where to point its cognition and which result to amplify.
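As a rough illustration of the thought experiment above, here is a minimal Python sketch. Everything in it is a toy assumption, not anything from the discussion itself: the "important proof" is a three-letter target string, the "human upload" is an exact-match `selector`, and the two proposers stand in for a uniform random-text generator and a generator with a better prior. The sketch only shows that a better prior cuts the number of tries, while the accept/reject decision still comes entirely from the selector.

```python
import random

# Hypothetical stand-ins: a three-letter "important text" and a full alphabet.
TARGET = "qed"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def uniform_proposer(rng):
    """Purely random text: every letter is equally likely."""
    return "".join(rng.choice(ALPHABET) for _ in range(len(TARGET)))


def better_prior_proposer(rng):
    """A proposer with a better prior: it only emits plausible letters."""
    return "".join(rng.choice("deq") for _ in range(len(TARGET)))


def selector(text):
    """Stand-in for the human who recognizes and amplifies a useful output."""
    return text == TARGET


def tries_until_accepted(propose, accept, rng, max_tries=1_000_000):
    """Count how many proposals are drawn before the selector accepts one."""
    for attempt in range(1, max_tries + 1):
        if accept(propose(rng)):
            return attempt
    return None  # gave up without an accepted output


def average_tries(propose, accept, runs=10, seed=0):
    """Average the number of tries over several independent runs."""
    rng = random.Random(seed)
    counts = [tries_until_accepted(propose, accept, rng) for _ in range(runs)]
    return sum(counts) / len(counts)
```

Calling `average_tries(uniform_proposer, selector)` and `average_tries(better_prior_proposer, selector)` should give a much smaller number for the second proposer. In both cases, though, nothing is published unless `selector` says so.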
- Garrett Baker 16 Dec 2023 10:10 UTC
  2 points
  0
  Parent
  Paragraph intended as a costly signal I am in fact invested in this conversation, no need to actually read: Sorry for the low effort replies, but by its nature the info I want from you is more costly for you to give than for me to ask for. Thanks for the response, and hopefully thanks also for future responses.
  
  I feel like I’d always be getting an LLM to do something. Like, if I get an LLM to do the field selection for me, does this work?
  
  Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?
  - Thane Ruthenis 16 Dec 2023 11:00 UTC
    7 points
    1
    Parent
    Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?
    Oh, nice way to elicit the response you’re looking for!
    The baseline proof-of-concept would go as follows:
    1. You give the AI some goal, such as writing a piece of analytical software intended to solve some task.
    2. The AI, over the course of writing the codebase, runs into some non-trivial, previously unsolved mathematical problem. Some formulas need to be tweaked to work in the new context, or there’s some missing math theory that needs to be derived.
    3. The AI doesn’t hallucinate solutions or swap in the closest (and invalid) analogue. Instead, it correctly identifies that a problem exists, figures out how it can approach solving it, and goes about doing this.
    4. As it’s deriving new theory, it sometimes runs into new sub-problems. Likewise, it doesn’t hallucinate solutions, but spins off some subtasks, and solves sub-problems in them.
    5. Ideally, it even defines experiments or rigorous test procedures for fault-checking its theory empirically.
    6. In the end, it derives a whole bunch of novel abstractions/functions/terminology, with layers of novel abstractions building up on the preceding layers of novel abstractions, and all of that is coherently optimized to fit into the broader software-engineering task it’s been given.
    7. The software works. It doesn’t need to be bug-free, the theory doesn’t need to be perfect, but it needs to be about as good as a human programmer would’ve managed, and actually based on some novel derivations.
    This seems like something an LLM, e. g. in an AutoGPT wrapper, should be able to do, if its base model is generally intelligent.
    I am a bit wary of reality Goodharting on this test, though. E. g., I can totally imagine some specific niche field in which an LLM, for some reason, can do this, but can’t do it anywhere else. Or some fuzziness around what counts as “novel math” being exploited – e. g., if the AI happens to hit upon re-applying extant math theory to a different field? Or, even more specifically, that there’s some specific research-engineering task that some LLM somewhere manages to ace, but in a one-off manner?
    So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
    It’s fine if they can’t do that for literally any field. But it should be a “blacklist” of fields, not a “whitelist” of fields.
    So if we get an AI model that can do this, and it’s based on something relevantly similar to the current paradigm, and it doesn’t violate the LLM-style safety guarantees, I think that would be significant evidence against my model.
    - Garrett Baker 16 Dec 2023 22:12 UTC
      11 points
      9
      Parent
      Maybe a more relevant concern I have with this is it feels like a “Can you write a symphony” type test to me. Like, there are very few people alive right now who could do the process you outline without any outside help, guidance, or prompting.
      - Thane Ruthenis 17 Dec 2023 5:49 UTC
        4 points
        1
        Parent
        Yeah, it’s necessarily a high bar. See justification here.
        I’m not happy about only being able to provide high-bar predictions like this, but it currently seems to me to be a territory-level problem.
        Garrett Baker 17 Dec 2023 23:28 UTC
        6 points
        4
        Parent
        It really seems like there should be a lower bar to update though. Like, you say to consider humans as an existence proof of AGI, so likely your theory says something about humans. There must be some testable part of everyday human cognition which relies on this general algorithm, right?
        
        Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans? You would probably expect there to be lots of similarity compared to, possibly, say Jacob Cannell or Quintin Pope’s predictions. Right?
        
        Even if you don’t think one similarity metric could cover it, you should still be able to come up with some difference of predictions, even if not immediately right now.
        
        Edit: Also I hope you forgive me for not asking for a prediction of this form earlier. It didn’t occur to me.
        Thane Ruthenis 18 Dec 2023 5:55 UTC
        2 points
        0
        Parent
        There must be some testable part of everyday human cognition which relies on this general algorithm, right?
        Well, yes, but they’re of a hard-to-verify “this is how human cognition feels like it works” format. E. g., I sometimes talk about how humans seem to be able to navigate unfamiliar environments without experience, in a way that seems to disagree with baseline shard-theory predictions. But I don’t think that’s been persuading people not already inclined to this view.
        The magical number 7±2 and the associated weirdness is also of the relevant genre.
        Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans?
        Hm, I guess something like this might work? Not sure regarding the precise operationalization, though.
        Garrett Baker 26 Dec 2023 21:40 UTC
        2 points
        0
        Parent
        Hm, I guess something like this might work? Not sure regarding the precise operationalization, though.
        You willing to do a dialogue about predictions here with @jacob_cannell or @Quintin Pope or @Nora Belrose or others (also a question to those pinged)?
        Thane Ruthenis 26 Dec 2023 22:41 UTC
        4 points
        0
        Parent
        If any of the others are particularly enthusiastic about this and expect it to be high-value, sure!
        That said, I personally don’t expect it to be particularly productive.
        These sorts of long-standing disagreements haven’t historically been resolvable via debate (the failure of Hanson vs. Yudkowsky is kind of foundational to the field).
        I think there’s great value in having a public discussion nonetheless, but I think it’s in informing the readers’ models of what different sides believe.
        Thus, inasmuch as we’re having a public discussion, I think it should be optimized for thoroughly laying out one’s points to the audience.
        However, dialogues-as-a-feature seem to be more valuable to the participants, and are actually harder to grok for readers.
        Thus, my preferred method for discussing this sort of stuff is to exchange top-level posts trying to refute each other (the way this post is, to a significant extent, a response to the AI is easy to control article), and then maybe argue a bit in the comments. But not to have a giant tedious top-level argument.
        I’d actually been planning to make a post about the difficulties the “classical alignment views” have with making empirical predictions, and I guess I can prioritize it more?
        But I’m overall pretty burned out on this sort of arguing. (And arguing about “what would count as empirical evidence for you?” generally feels like too-meta fake work, compared to just going out and trying to directly dredge up some evidence.)
        Quintin Pope 26 Dec 2023 21:57 UTC
        2 points
        0
        Parent
        Not entirely sure what @Thane Ruthenis’ position is, but this feels like a maybe relevant piece of information: https://www.science.org/content/article/formerly-blind-children-shed-light-centuries-old-puzzle
        Thane Ruthenis 26 Dec 2023 22:16 UTC
        4 points
        0
        Parent
        Not sure what the relevance is? I don’t believe that “we possess innate (and presumably God-given) concepts that are independent of the senses”, to be clear. “Children won’t be able to instantly understand how to parse a new sense and map its feedback to the sensory modalities they’ve previously been familiar with, but they’ll grok it really fast with just a few examples” was my instant prediction upon reading the titular question.
        jacob_cannell 27 Dec 2023 3:10 UTC
        4 points
        0
        Parent
        I’m also not sure of the relevance and am not following the thread fully, but the summary of that experiment is that it takes some time (measured in nights of sleep, which are the rough equivalent of big batch training updates) for the newly sighted to develop vision, but less time than infants—presumably because the newly sighted already have fully functioning sensor inference world models in another modality that can speed up learning through dense top down priors.
        
        But it’s way, way more than “grok it really fast with just a few examples”—training their new visual systems still takes non-trivial training data & time.
      - Garrett Baker 17 Dec 2023 0:00 UTC
        4 points
        0
        Parent
        Though, admittedly, the prompt was to modify the original situation I presented, which had an output currently very difficult for any human to produce to begin with. So I don’t quite fault you for responding in kind.
      - Bezzi 16 Dec 2023 23:21 UTC
        1 point
        0
        Parent
        Well, for what it’s worth, I can write a symphony (following the traditional tonal rules), as this is actually mandated in order to pass some advanced composition classes. I think that letting the AI write a symphony without supervision and then having a composition professor evaluate it could actually be a very good test, because there’s no way a stochastic parrot could follow all the traditional rules correctly for more than a few seconds (an even better test would be to ask it to write a fugue on a given subject, whose rules are even more precise).
    - Garrett Baker 16 Dec 2023 17:50 UTC
      4 points
      2
      Parent
      
      So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
      
      I think sticking to this would make it difficult for you to update sooner. We should expect small approaches before large approaches here, and private solutions before publicly disclosed solutions.
      
      Relatedly, would DeepMind’s recent LLM mathematical proof paper count, if it were more general? They give LLMs feedback via an evaluator function, exploiting the NP-hard nature of a problem in combinatorics and bin packing (note: I have not read this paper in full).
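To make "feedback via an evaluator function" concrete, here is a toy Python sketch. It is not the paper's method or code. The candidate heuristics are three hand-written priority rules standing in for model-generated proposals, and the instances, capacity, and all names are assumptions made for illustration. The point it illustrates is only that scoring a proposed bin-packing heuristic is cheap and mechanical, even though finding an optimal packing is hard.

```python
def pack(items, capacity, priority):
    """Online bin packing: put each item in the fitting bin with the highest
    priority score, or open a new bin if none fits. Returns bins used."""
    remaining = []  # remaining capacity of each open bin
    for item in items:
        fitting = [i for i, rem in enumerate(remaining) if rem >= item]
        if fitting:
            # max() keeps the earliest bin on ties
            best = max(fitting, key=lambda i: priority(item, remaining[i]))
            remaining[best] -= item
        else:
            remaining.append(capacity - item)
    return len(remaining)


def evaluate(priority, instances, capacity):
    """Evaluator: higher is better, so negate the total number of bins."""
    return -sum(pack(items, capacity, priority) for items in instances)


# Hypothetical candidate proposals (hand-written here, not model-generated).
CANDIDATES = {
    "first_fit": lambda item, rem: 0,             # ties resolve to earliest bin
    "best_fit": lambda item, rem: -(rem - item),  # prefer the tightest fit
    "worst_fit": lambda item, rem: rem - item,    # prefer the loosest fit
}

# Assumed toy instances with bin capacity 10.
CAPACITY = 10
INSTANCES = [[5, 6, 4, 5], [7, 3, 3, 7]]


def best_candidate():
    """Score every candidate with the evaluator and return the best name."""
    return max(
        CANDIDATES,
        key=lambda name: evaluate(CANDIDATES[name], INSTANCES, CAPACITY),
    )
```

On these instances `best_candidate()` selects `"best_fit"`, which packs `[5, 6, 4, 5]` into two bins where the other two rules need three. In a real system the proposals would come from a model and the evaluator's scores would be fed back into the next round of proposals.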