10 quick takes about AGI
I have a bunch of loosely related and not fully fleshed out ideas for future posts.
In the spirit of 10 reasons why lists of 10 reasons might be a winning strategy, I’ve written some of them up as a list of facts / claims / predictions / takes. (Some of the explanations aren’t exactly “quick”, but you can just read the bold and move on if you find it uninteresting or unsurprising.)
If there’s interest, I might turn some of them into their own posts or expand on them in the comments here.
1. Computational complexity theory does not say anything practical about the bounds on AI (or human) capabilities. Results from computational complexity theory are mainly facts about the limiting behavior of deterministic, fully general solutions to parameterized problems. For example, if a problem is NP-hard (and P≠NP), that implies[1] that there is no deterministic algorithm anyone (even a superintelligence) can run, which accepts arbitrary instances of the problem and finds a solution in time steps polynomial in the size of the problem. But that doesn’t mean that any particular, non-parameterized instance of the problem cannot be solved some other way, e.g. by exploiting a regularity in the particular instance or by using a heuristic, approximation, or probabilistic method; nor does it mean that a human or AI cannot find a way of sidestepping the need to solve the problem entirely.
Claims like “ideal utility maximisation is computationally intractable” or “If just one step in this plan is incomputable, the whole plan is as well.” are thus somewhat misleading, or at least missing a step in their reasoning about why such claims are relevant as a bound on human or AI capabilities. My own suspicion is that when one attempts to repair these claims by making them more precise, it becomes clear that results from computational complexity theory are mostly irrelevant.
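To make the worst-case vs. particular-instance distinction concrete, here is a minimal Python sketch (the instance family and numbers are chosen purely for illustration): subset-sum is NP-hard in general, but an instance whose elements are distinct powers of two can be solved greedily in linear time, because picking a subset is just reading off the target’s binary representation.

```python
# Subset-sum is NP-hard in the worst case, but this particular family of
# instances has exploitable structure: every element is a distinct power
# of two, so a greedy scan (equivalent to reading the target's binary
# representation) finds the answer without any exponential search.

def subset_sum_powers_of_two(elements, target):
    chosen, remaining = [], target
    for x in sorted(elements, reverse=True):  # largest element first
        if x <= remaining:
            chosen.append(x)
            remaining -= x
    return chosen if remaining == 0 else None  # None -> no exact subset exists

# 64 elements: brute force would face 2**64 candidate subsets,
# but this structured instance is solved instantly.
instance = [2 ** i for i in range(64)]
print(subset_sum_powers_of_two(instance, target=123_456_789))
```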
2. From here on, capabilities research won’t fizzle out (no more AI winters). I predict that the main bottleneck on AI capabilities progress going forward will be researcher time to think up, design, implement, and run experiments. In the recent past, the compute and raw scale of AI systems was simply too little for many potential algorithmic innovations to work at all. Now that we’re past that point, some non-zero fraction of new ideas that smart researchers think up and spend the time to test will “just work” at least somewhat, and these ideas will compound with other improvements in algorithms and scale. It’s not quite recursive self-improvement yet, but we’ve reached some kind of criticality threshold on progress which is likely to make things get weird, faster than expected. My own prediction for what one aspect of this might look like is here.
3. Scaling laws and their implications, e.g. Chinchilla, are facts about particular architectures and training algorithms. As a perhaps non-obvious implication, I predict that future AI capabilities research progress will not be limited much by the availability of compute and / or training data. A few frames from a webcam may or may not be enough for a superintelligence to deduce general relativity, but the entire corpus of the current internet is almost certainly more than enough to train a below-human-level AI up to superhuman levels, even if the AI has to start with algorithms designed entirely by human capabilities researchers. (The fact that much of the training data was generated by humans does not bound the capabilities of systems trained on that data.)
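As a concrete illustration of what a scaling law for one particular setup looks like, here is a minimal sketch using the parametric loss fit reported in the Chinchilla paper (Hoffmann et al. 2022). The coefficients are the published fitted values as I recall them, and C ≈ 6ND is the usual rule-of-thumb approximation for training FLOPs, so treat the output as illustrative rather than authoritative.

```python
# A scaling law is a statement about one particular setup. The Chinchilla paper
# fits training loss as  L(N, D) = E + A / N**alpha + B / D**beta,
# with N = parameters and D = training tokens. Coefficients below are the
# paper's reported fit, quoted from memory.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def predicted_loss(N, D):
    return E + A / N ** alpha + B / D ** beta

# Compare different ways of spending the same compute budget, using the
# common C ~ 6 * N * D approximation for training FLOPs.
C = 5.8e23  # roughly the Gopher / Chinchilla-scale training budget
for N in (280e9, 70e9, 10e9):  # Gopher-sized, Chinchilla-sized, small
    D = C / (6 * N)            # tokens affordable at this budget
    print(f"N = {N / 1e9:.0f}B params, D = {D / 1e9:.0f}B tokens "
          f"-> predicted loss {predicted_loss(N, D):.3f}")
```

Under this fit, the 70B-parameter allocation gets a lower predicted loss than the 280B-parameter one at the same compute, which is the paper’s headline result; a different architecture or training algorithm would have a different fit.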
4. “Human-level” intelligence is actually a pretty wide spectrum. Somewhat contra the classic diagram, I think that intelligence in humans spans a pretty wide range, even in absolute terms. Here, I’m using a meaning of intelligence which is, roughly, the ability to re-arrange matter and energy according to one’s whims. By this metric, the smartest humans can greatly outperform average or below-average humans. A couple of implications of this view:
An AI system that is only slightly superhuman might be capable of re-arranging most of the matter and energy in the visible universe arbitrarily.
Aiming for “human-level” AI systems is a pretty wide target, with wildly different implications depending on where in the human regime you hit. A misaligned super-genius is a lot scarier than a misaligned village idiot.
5. Goal-direction and abstract reasoning (at ordinary human levels) are very useful for next-token prediction. For example, suppose I want to predict the next token of text for the following prompts:
“The following is a transcript of a chess game played between two Stockfish 15 instances: 1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. ”
or
“000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f is a SHA-256 hash of the following preimage: ”
The strategy I could take that would result in predicting these tokens with the lowest loss probably involves things like spinning up a chess engine, or searching for what process might plausibly have generated that string in the first place, as opposed to just thinking for a while and writing down whatever the language processing or memory modules in my brain come up with. SGD on regularly structured transformer networks may not actually hit on such strategies at any scale, but...
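(To make the chess case concrete, here is a minimal sketch of the “spin up a chess engine” strategy using the python-chess library. It assumes a Stockfish binary named “stockfish” is on the PATH, and the depth limit is an arbitrary illustrative choice.)

```python
import chess
import chess.engine

# Rebuild the position from the prompt's move list.
moves = ["Nf3", "Nf6", "c4", "g6", "Nc3", "Bg7", "d4", "O-O", "Bf4", "d5"]
board = chess.Board()
for san in moves:
    board.push_san(san)

# Instead of free-associating the next token, ask an actual engine what
# Stockfish would most plausibly play as move 6.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # assumes binary on PATH
result = engine.play(board, chess.engine.Limit(depth=20))
print("Predicted continuation: 6.", board.san(result.move))
engine.quit()
```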
6. LLMs are evidence that abstract reasoning ability emerges as a side effect of solving any sufficiently hard and general problem with enough effort. Alternatively: abstract reasoning ability is a convergent solution to any sufficiently hard problem. Recent results with LLMs demonstrate that relatively straightforward methods applied at scales feasible with current human tech qualify as “enough effort”. Although LLMs are still probably far below human-level at abstract reasoning ability, the fact that they show signs of doing such reasoning at all implies that hitting on abstract reasoning as a problem-solving strategy is somewhat easier than most people would have predicted just 10 or 15 years ago.
I think this is partially what Eliezer means when he claims that “reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis”. Eliezer predicted at the time that general abstract reasoning was easy to develop and scale, relative to Robin. But even Eliezer thought you would still need some kind of understanding of the actual underlying cognitive algorithms to initially bootstrap from, using GOFAI methods, complicated architectures / training processes, etc. It turns out that just applying SGD to regularly-structured networks at non-planet-consuming scales to the problem of text prediction is sufficient to hit on (weak versions of) such algorithms incidentally!
7. The difficulty of thinking up the algorithms needed to learn and do abstract reasoning is upper bounded by evolution. Evolution managed to discover, design, and pack general-purpose reasoning algorithms (and / or the process for learning them during a single human lifetime) into a 10 W, 1000 cm³ box through iterative mutation. It might or might not take a lot of compute to implement and run such algorithms in silicon, but they can’t be too complicated in an absolute sense.
8. The amount of compute required to run and develop such algorithms can be approximated by comparing current AI systems to components of the brain. I’m usually skeptical of biology-based AI timelines (e.g. for the reasons described here, and my own thoughts here), but I think there’s at least one comparison method that can be useful. Suppose you take all the high-level tasks that current AI systems can do at roughly human levels or above (speech and audio processing, language and text processing, vision, etc.), and determine what fraction of neurons and energy the brain uses to carry out those tasks. For example, a quick search of Wikipedia gives an estimate of ~280 million neurons in the visual cortex. Suppose you add up all the neurons dedicated (mostly) to doing things that AI systems can already do individually, and find that this amounts to, say, 20% of the total neurons in the brain (approximately 80 billion).
Such an estimate, if accurate, would imply that the compute and energy requirements for human-level AGI are roughly approximated by scaling current AI systems by 5x. Of course, the training algorithms, network architectures, and interconnects needed to implement the abstract reasoning carried out in the frontal cortex might be more difficult to discover and implement than the algorithms and training methods required to implement vision or text processing. But by (7), these algorithms can’t be too hard to discover, since evolution did so, and by (2), we’re likely to continue seeing steady, compounding algorithmic advances from here until the end.
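The arithmetic behind that estimate is just a ratio. In this sketch the 20% coverage figure is the hypothetical from the paragraph above, and the visual-cortex and whole-brain neuron counts are the rough figures quoted there, so none of these numbers should be taken as real measurements.

```python
# Back-of-the-envelope version of the neuron-fraction estimate above.
# All inputs are rough placeholders taken from the text, not real measurements.
total_neurons = 80e9           # approximate whole-brain neuron count
visual_cortex_neurons = 280e6  # the Wikipedia figure quoted above

# Hypothetical: summing every region devoted to tasks current AI systems
# already do (vision, speech, language, ...) comes to ~20% of the brain.
covered_fraction = 0.20

# Naive implication: covering the rest of the brain's task share means scaling
# the aggregate of current systems by the reciprocal of the covered fraction.
scale_factor = 1.0 / covered_fraction
print(f"Visual cortex alone: {visual_cortex_neurons / total_neurons:.2%} of neurons")
print(f"Hypothetical covered fraction: {covered_fraction:.0%} "
      f"-> implied scale-up ~{scale_factor:.0f}x")
```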
The advantage of such an estimation approach is that it doesn’t require figuring out how the brain or current AI systems are solving these tasks, or drawing any comparisons between them other than their high-level performance characteristics; nor does it require trying to figure out how many FLOP-equivalent operations the brain is doing, or how many of those are useful. In the method of comparison above, neurons (or energy) are just a way of roughly quantifying the fraction of the brain’s resources allocated to these high-level tasks.
9. One of the purposes of studying some problems in decision theory, embedded agency, and agent foundations is to be able to recognize and avoid having the first AGI systems solve those problems. For example, if interpretability research tells you that your system is doing anything internally that looks like logical decision theory, or if it starts making any decisions whatsoever for which evidential, causal, and logical decision theories do not give answers which are all trivially equivalent, it’s probably time to halt, melt, and catch fire.
IMO, the first AGI systems should be narrowly focused on solving problems in nanotech, biotech, and computer security. Such problems do not obviously require a deep understanding of decision theory or embedded agency, but an AGI may run into solutions to such problems as a side effect of being generally intelligent. Past a certain intelligence level, solving embedded agency explicitly and fully is probably unavoidable, but to the extent that we can, we should try to detect and delay having an AGI develop such an understanding for as long as possible.
10. Even at human level, 99% honesty for AI isn’t good enough. (A fleshed-out version of this take would be my reply to @HoldenKarnofsky’s latest comment in the thread here.) I think instilling both a reliable habit of being honest and a general (perhaps deontological) policy of honesty is not sufficient for safety in human-level AI or literal humans. To see why, consider Honest Herb, a human who cultivates a habit of being honest by default, and of avoiding white lies in his day-to-day life. For higher-stakes situations or more considered decisions where Herb might be tempted to deceive, he also has a deontological rule against lying, which he tries hard to stick to even when it seems like honesty is sub-optimal under consequentialist reasoning, and even when (he thinks) he has considered all knock-on effects.
But this deontological rule is not absolute. If Herb were, for example, a prisoner of hostile aliens, the aliens might observe his behavior or the internal workings of his brain to verify that he actually has such habits, and a deontological policy that he sticks to under all circumstances that they can observe. But it is exactly the 0.1% of cases that the aliens cannot observe that might allow Herb to escape. When the stakes are sufficiently high, and Herb is sufficiently confident in his own consequentialist reasoning, he will break his deontological policy against deception in order to win his freedom. I expect human-level AI systems to be similar to humans in this regard, and for this to hold even if interpretability research catches up to the point where it can actually “read the mind” of AI systems on a very deep level.
[1] At most; exotic possibilities of quantum mechanics might make the implications of current results even weaker.