Maybe you don’t think this is autonomous enough for you
Yep. The core thing here is iteration. If an AI can execute a whole research loop on its own – run into a problem it doesn’t know how to solve, figure out what it needs to learn to solve it, construct a research procedure for figuring that out, carry out that procedure, apply the findings, repeat – then research-as-a-whole begins to move at AI speeds. It doesn’t need to wait for a human to understand the findings and figure out where to point it next – it can go off and invent whole new fields at inhuman speeds.
Which means it can take off; we can meaningfully lose control of it. (Especially if it starts doing AI research itself.)
Conversely, if there’s a human in the loop, that’s a major bottleneck. As I’d mentioned in the post, I think LLMs and similar AIs are a powerful technology, and greatly boosting human research speed is one place where they could contribute. But without a fully closed autonomous loop, that’s IMO not an omnicide risk.
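To make the distinction concrete, here’s a minimal sketch of the two regimes – every method name below is a hypothetical stand-in rather than any real system’s API. The point is only that the closed loop never hands control back to a human, while the gated loop waits on one every iteration:

```python
# Minimal sketch of the two regimes; all methods are hypothetical stand-ins.

def autonomous_research_loop(agent, goal):
    """Fully closed loop: no step hands control back to a human."""
    state = agent.initialize(goal)
    while not agent.goal_achieved(state):
        problem = agent.identify_blocking_problem(state)   # run into a problem it can't yet solve
        gaps = agent.diagnose_knowledge_gaps(problem)       # figure out what it needs to learn
        procedure = agent.design_research_procedure(gaps)   # construct a procedure for learning it
        findings = agent.execute(procedure)                 # carry that procedure out
        state = agent.integrate_findings(state, findings)   # apply the findings, then repeat
    return state


def human_gated_research_loop(agent, human, goal):
    """Human-in-the-loop version: every iteration waits on a human to grok the findings."""
    state = agent.initialize(goal)
    while not human.satisfied(state):
        direction = human.decide_where_to_point(state)      # the bottleneck: human understanding
        findings = agent.execute(direction)
        state = human.interpret_and_apply(state, findings)
    return state
```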
To me these tasks seem pretty basic, likely beyond current LLM abilities, but GPT-5 or 6? Not out of the question given no major architecture or training changes.
That’s a point of disagreement: I don’t think GPT-N would be able to do it. I think this post by Nate Soares mostly covers the why. Oh, or this post by janus.
I don’t expect LLMs to be able to keep themselves “on-target” while choosing novel research topics or properly integrating their findings. That’s something you need proper context-aware consequentialist-y cognition for. It may seem trivial – see janus’ post pointing out that “steering” cognition basically amounts to just injecting a couple decision-bits at the right points – but that triviality is deceptive.
Ok, so if I get a future LLM to write the code to use standard genAI tricks to generate novel designs in <area>, write a paper about the results, and the paper is seen as a major revolution in <area>, and this doesn’t seem to violate the assumptions Nora and Quintin make in their arguments about doom – would this update you? What constraints do you want to put on <area>?
Nope, because of the “if I get a future LLM to [do the thing]” step. The relevant benchmark is the AI being able to do it on its own. Note also how your setup doesn’t involve the LLM autonomously iterating on its discovery, which I’d pointed out as the important part.
To expand on that:
Consider an algorithm that generates purely random text. If you have a system consisting of trillions of human uploads using it, each hitting “rerun” a million times per second, and then selectively publishing only the randomly-generated outputs that are papers containing important mathematical proofs – well, that’s going to generate novel discoveries sooner or later. But the load-bearing part isn’t the random-text algorithm, it’s the humans selectively amplifying those of its outputs that make sense.
LLM-based discoveries as you’ve proposed, I claim, would be broadly similar. LLMs have a better prior on important texts than a literal uniform distribution, and they can be prompted to make useful outputs more likely still, which is why it won’t take trillions of uploads and millions of tries. But the load-bearing part isn’t the LLM, it’s the human deciding where to point its cognition and which result to amplify.
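As a toy illustration of where the work gets done in such a setup – everything here is synthetic, with the generator and the judge as stand-ins – a better prior only shrinks the number of tries, while the judge is still what decides which output counts as a discovery:

```python
import random
import string

TARGET = "qed"  # stand-in for "an output the judges recognize as a real discovery"

def uniform_generator():
    # Purely random text: the analogue of the rerun-a-million-times-per-second setup.
    return "".join(random.choice(string.ascii_lowercase) for _ in range(len(TARGET)))

def better_prior_generator():
    # Stand-in for an LLM-ish generator with a better prior over "important texts":
    # it is merely more likely to emit target-shaped strings; nothing else changes.
    return TARGET if random.random() < 0.01 else uniform_generator()

def judge(text):
    # The load-bearing step: an external selector that recognizes which outputs make sense.
    return text == TARGET

def tries_until_accepted(generator):
    attempts = 0
    while True:
        attempts += 1
        if judge(generator()):
            return attempts

print("uniform prior:", tries_until_accepted(uniform_generator), "tries")
print("better prior: ", tries_until_accepted(better_prior_generator), "tries")
```

On a typical run the uniform generator needs on the order of twenty thousand tries and the better-prior one roughly a hundred – but in both cases nothing counts as a “discovery” until the judge says so.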
Paragraph intended as a costly signal that I am in fact invested in this conversation (no need to actually read it): Sorry for the low-effort replies, but by its nature the info I want from you is more costly for you to give than for me to ask for. Thanks for the response, and hopefully thanks also for future responses.
I feel like I’d always be getting an LLM to do something. Like, if I get an LLM to do the field selection for me, does this work?
Maybe more open-endedly: what, concretely, is the closest thing to what I said that would make you update?
Oh, nice way to elicit the response you’re looking for!
The baseline proof-of-concept would go as follows:
You give the AI some goal, such as writing analytical software intended to solve some task.
The AI, over the course of writing the codebase, runs into some non-trivial, previously unsolved mathematical problem. Some formulas need to be tweaked to work in the new context, or there’s some missing math theory that needs to be derived.
The AI doesn’t hallucinate solutions or swap in the closest (and invalid) analogue. Instead, it correctly identifies that a problem exists, figures out how it can approach solving it, and goes about doing so.
As it’s deriving the new theory, it sometimes runs into new sub-problems. Likewise, it doesn’t hallucinate solutions to those either; it spins off subtasks and solves the sub-problems within them.
Ideally, it even defines experiments or rigorous test procedures for fault-checking its theory empirically.
In the end, it derives a whole bunch of novel abstractions/functions/terminology, with layers of novel abstractions building on the preceding layers, and all of it coherently optimized to fit into the broader software-engineering task it’s been given.
The software works. It doesn’t need to be bug-free, the theory doesn’t need to be perfect, but it needs to be about as good as a human programmer would’ve managed, and actually based on some novel derivations.
This seems like something an LLM, e. g. in an AutoGPT wrapper, should be able to do, if its base model is generally intelligent.
I am a bit wary of reality Goodharting on this test, though. E. g., I can totally imagine some specific niche field in which an LLM, for some reason, can do this, but can’t do it anywhere else. Or some fuzziness around what counts as “novel math” being exploited – e. g., the AI happening to hit upon re-applying extant math theory to a different field. Or, even more specifically, some specific research-engineering task that some LLM somewhere manages to ace, but only in a one-off manner.
So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
It’s fine if they can’t do this in literally every field. But the exceptions should form a “blacklist” of fields, not a “whitelist”.
So if we get an AI model that can do this, and it’s based on something relevantly similar to the current paradigm, and it doesn’t violate the LLM-style safety guarantees, I think that would be significant evidence against my model.
Maybe a more relevant concern I have with this is that it feels like a “Can you write a symphony?”-type test to me. Like, there are very few people alive right now who could do the process you outline without any outside help, guidance, or prompting.
Yeah, it’s necessarily a high bar. See justification here.
I’m not happy about only being able to provide high-bar predictions like this, but it currently seems to me to be a territory-level problem.
It really seems like there should be a lower bar to update though. Like, you say to consider humans as an existence proof of AGI, so likely your theory says something about humans. There must be some testable part of everyday human cognition which relies on this general algorithm, right?
Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans? You would probably expect there to be lots of similarity, compared to, say, Jacob Cannell’s or Quintin Pope’s predictions. Right?
Even if you don’t think one similarity metric could cover it, you should still be able to come up with some difference in predictions, even if not immediately right now.
Edit: Also I hope you forgive me for not asking for a prediction of this form earlier. It didn’t occur to me.
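For concreteness, here is a toy sketch of the kind of comparison being gestured at above – synthetic stand-in data and a crude pattern-correlation metric, just to show the shape of the test, not a proposed protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def scan_for_task(task_signal, noise=0.5):
    # Stand-in for a flattened, preprocessed fMRI activation pattern recorded while a
    # subject performs one of the tasks; here it's just synthetic data with added noise.
    return task_signal + noise * rng.normal(size=task_signal.shape)

def pattern_similarity(a, b):
    # Crude similarity metric: Pearson correlation between two activation patterns.
    return float(np.corrcoef(a, b)[0, 1])

n_voxels = 1000
shared = rng.normal(size=n_voxels)                  # what a "same general algorithm" view might predict is shared
task_a = scan_for_task(shared)                       # e.g. identifying a blocking problem
task_b = scan_for_task(shared)                       # e.g. designing a procedure to resolve it
control = scan_for_task(rng.normal(size=n_voxels))   # e.g. a rote, non-research control task

print("research-task vs research-task similarity:", pattern_similarity(task_a, task_b))
print("research-task vs control-task similarity: ", pattern_similarity(task_a, control))
```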
There must be some testable part of everyday human cognition which relies on this general algorithm, right?
Well, yes, but they’re of a hard-to-verify “this is how human cognition feels like it works” format. E. g., I sometimes talk about how humans seem to be able to navigate unfamiliar environments without prior experience, in a way that seems to disagree with baseline shard-theory predictions. But I don’t think that’s been persuasive to people not already inclined toward this view.
The magical number 7±2 and the associated weirdness is also of the relevant genre.
Like, at the very least, what if we looked at fMRIs of human brains while they were engaging in all the tasks you laid out above, and looked at some similarity metric between the scans?
Hm, I guess something like this might work? Not sure about the precise operationalization, though.
Are you willing to do a dialogue about predictions here with @jacob_cannell, @Quintin Pope, @Nora Belrose, or others (also a question to those pinged)?
If any of the others are particularly enthusiastic about this and expect it to be high-value, sure!
That said, I personally don’t expect it to be particularly productive.
These sorts of long-standing disagreements haven’t historically been resolvable via debate (the failure of the Hanson vs. Yudkowsky debate is kind of foundational to the field).
I think there’s great value in having a public discussion nonetheless, but I think that value lies in informing the readers’ models of what the different sides believe.
Thus, inasmuch as we’re having a public discussion, I think it should be optimized for thoroughly laying out one’s points to the audience.
However, dialogues-as-a-feature seem to be more valuable to the participants than to the readers, and are actually harder for readers to grok.
Thus, my preferred method for discussing this sort of stuff is to exchange top-level posts trying to refute each other (the way this post is, to a significant extent, a response to the “AI is easy to control” article), and then maybe argue a bit in the comments. But not to have a giant tedious top-level argument.
I’d actually been planning to make a post about the difficulties the “classical alignment views” have with making empirical predictions, and I guess I can prioritize it more?
But I’m overall pretty burned out on this sort of arguing. (And arguing about “what would count as empirical evidence for you?” generally feels like too-meta fake work, compared to just going out and trying to directly dredge up some evidence.)
Not entirely sure what @Thane Ruthenis’ position is, but this feels like a possibly relevant piece of information: https://www.science.org/content/article/formerly-blind-children-shed-light-centuries-old-puzzle
Not sure what the relevance is? I don’t believe that “we possess innate (and presumably God-given) concepts that are independent of the senses”, to be clear. “Children won’t be able to instantly understand how to parse a new sense and map its feedback to the sensory modalities they’ve previously been familiar with, but they’ll grok it really fast with just a few examples” was my instant prediction upon reading the titular question.
I’m also not sure of the relevance and am not following the thread fully, but the summary of that experiment is that it takes some time (measured in nights of sleep, which are a rough equivalent of big batch training updates) for the newly sighted to develop vision, but less time than infants need – presumably because the newly sighted already have fully functioning sensor-inference world models in another modality, which can speed up learning through dense top-down priors.
But it’s way, way more than “grok it really fast with just a few examples” – training their new visual systems still takes non-trivial training data and time.
Though, admittedly, the prompt was to modify the original situation I presented, which had an output currently very difficult for any human to produce to begin with. So I don’t quite fault you for responding in kind.
Well, for what it’s worth, I can write a symphony (following the traditional tonal rules), as this is actually mandated in order to pass some advanced composition classes. I think that letting the AI write a symphony without supervision and then having a composition professor evaluate it could actually be a very good test, because there’s no way a stochastic parrot could follow all the traditional rules correctly for more than a few seconds (an even better test would be to ask it to write a fugue on a given subject, whose rules are even more precise).
So I would fortify this a bit: individual or isolated instances don’t count. AIs should be broadly known to be able to engage in this sort of stuff. That should be happening frequently, without much optimization and tailoring made on the human end; about as easily as GPT-4 could be tasked to write a graduate-level essay.
I think sticking to this would make it difficult for you to update sooner. We should expect small approaches before large approaches here, and private solutions before publicly disclosed solutions.
Relatedly: would DeepMind’s recent LLM mathematical-proof paper count, if it were more general? They give LLMs feedback via an evaluator function, exploiting the NP-hard nature of problems in combinatorics and bin packing (note: I have not read this paper in full).
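For readers who haven’t seen that setup, the general propose-and-evaluate pattern looks roughly like the sketch below. This is heavily simplified and is not the paper’s actual method: the LLM call is replaced by a hypothetical stand-in that merely picks among a few hand-written priority heuristics, whereas the real setup has the model generate new programs conditioned on the best ones found so far. The load-bearing piece is that the evaluator can score any candidate cheaply and exactly, even though finding optimal solutions is hard:

```python
import random

def evaluate_heuristic(items, bin_capacity, priority):
    # Cheap, exact evaluator: pack the items greedily, choosing among open bins
    # according to the candidate priority heuristic, and report how many bins
    # were used (fewer is better).
    bins = []
    for item in items:
        placed = False
        for b in sorted(range(len(bins)),
                        key=lambda i: priority(item, bin_capacity - bins[i]),
                        reverse=True):
            if bins[b] + item <= bin_capacity:
                bins[b] += item
                placed = True
                break
        if not placed:
            bins.append(item)
    return len(bins)

def llm_propose_heuristic(best_so_far):
    # Hypothetical stand-in for the LLM call: instead of generating a new program
    # conditioned on the best ones so far, it just picks among hand-written heuristics.
    candidates = [
        lambda item, space: 0.0,                 # plain first-fit
        lambda item, space: -space,              # prefer tighter-fitting bins
        lambda item, space: -abs(space - item),  # prefer near-exact fits
    ]
    return random.choice(candidates)

random.seed(0)
items = [random.randint(1, 50) for _ in range(200)]
best_heuristic, best_score = None, float("inf")
for _ in range(20):  # outer loop: propose, evaluate, keep the best
    candidate = llm_propose_heuristic(best_heuristic)
    score = evaluate_heuristic(items, bin_capacity=100, priority=candidate)
    if score < best_score:
        best_heuristic, best_score = candidate, score
print("fewest bins found:", best_score)
```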