10 quick takes about AGI
I have a bunch of loosely related and not fully fleshed out ideas for future posts.
In the spirit of 10 reasons why lists of 10 reasons might be a winning strategy, I’ve written some of them up as a list of facts / claims / predictions / takes. (Some of the explanations aren’t exactly “quick”, but you can just read the bold and move on if you find it uninteresting or unsurprising.)
If there’s interest, I might turn some of them into their own posts or expand on them in the comments here.
1. Computational complexity theory does not say anything practical about the bounds on AI (or human) capabilities. Results from computational complexity theory are mainly facts about the limiting behavior of deterministic, fully general solutions to parameterized problems. For example, if a problem is NP-hard (and P ≠ NP), that implies[1] that there is no deterministic algorithm anyone (even a superintelligence) can run which accepts arbitrary instances of the problem and finds a solution in time steps polynomial in the size of the problem. But that doesn’t mean that any particular, non-parameterized instance of the problem cannot be solved some other way, e.g. by exploiting a regularity in the particular instance or by using a heuristic, approximation, or probabilistic solution; nor does it mean that a human or AI cannot find a way of sidestepping the need to solve the problem entirely.
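To make the instance-vs-problem-class distinction concrete, here is a minimal sketch (the numbers and the greedy routine are just an illustration I'm adding): subset-sum is NP-hard in general, but an instance whose weights happen to form a superincreasing sequence is solved exactly by a trivial greedy pass.

```python
# Subset-sum is NP-hard in general, but this particular *instance* has structure
# (the weights are superincreasing: each exceeds the sum of all smaller ones),
# so a simple greedy pass solves it exactly in linear time.

def greedy_subset_sum(weights, target):
    """Solve subset-sum for a superincreasing weight list (sorted ascending)."""
    chosen = []
    remaining = target
    for w in reversed(weights):          # consider the largest weights first
        if w <= remaining:
            chosen.append(w)
            remaining -= w
    return chosen if remaining == 0 else None

weights = [1, 2, 4, 9, 20, 41, 85]       # superincreasing, made-up example
print(greedy_subset_sum(weights, 115))   # -> [85, 20, 9, 1]
```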
Claims like “ideal utility maximisation is computationally intractable” or “If just one step in this plan is incomputable, the whole plan is as well.” are thus somewhat misleading, or at least missing a step in their reasoning about why such claims are relevant as a bound on human or AI capabilities. My own suspicion is that when one attempts to repair these claims by making them more precise, it becomes clear that results from computational complexity theory are mostly irrelevant.

2. From here on, capabilities research won’t fizzle out (no more AI winters). I predict that the main bottleneck on AI capabilities progress going forward will be researcher time to think up, design, implement, and run experiments. In the recent past, the compute and raw scale of AI systems was simply too little for many potential algorithmic innovations to work at all. Now that we’re past that point, some non-zero fraction of new ideas that smart researchers think up and spend the time to test will “just work” at least somewhat, and these ideas will compound with other improvements in algorithms and scale. It’s not quite recursive self-improvement yet, but we’ve reached some kind of criticality threshold on progress which is likely to make things get weird, faster than expected. My own prediction for what one aspect of this might look like is here.
3. Scaling laws and their implications, e.g. Chinchilla, are facts about particular architectures and training algorithms. As a perhaps non-obvious implication, I predict that future AI capabilities research progress will not be limited much by the availability of compute and / or training data. A few frames from a webcam may or may not be enough for a superintelligence to deduce general relativity, but the entire corpus of the current internet is almost certainly more than enough to train a below-human-level AI up to superhuman levels, even if the AI has to start with algorithms designed entirely by human capabilities researchers. (The fact that much of the training data was generated by humans is not relevant as a capabilities bound on systems trained on that data.)
4. “Human-level” intelligence is actually a pretty wide spectrum. Somewhat contra the classic diagram, I think that intelligence in humans spans a pretty wide range, even in absolute terms. Here, I’m using a meaning of intelligence which is, roughly, the ability to re-arrange matter and energy according to one’s whims. By this metric, the smartest humans can greatly outperform average or below-average humans. A couple of implications of this view:
- An AI system that is only slightly superhuman might be capable of re-arranging most of the matter and energy in the visible universe arbitrarily.
- Aiming for “human-level” AI systems is a pretty wide target, with wildly different implications depending on where in the human regime you hit. A misaligned super-genius is a lot scarier than a misaligned village idiot.
5. Goal-direction and abstract reasoning (at ordinary human levels) are very useful for next-token prediction. For example, if I want to predict the next token of text for the following prompts:
“The following is a transcript of a chess game played between two Stockfish 15 instances: 1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. ”
or
“000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f is a SHA-256 hash of the following preimage: ”
The strategy I could take that results in predicting tokens with the lowest loss probably involves things like spinning up a chess engine or searching for what process might plausibly have generated that string in the first place, as opposed to just thinking for a while and writing down whatever the language processing or memory modules in my brain come up with. SGD on regularly structured transformer networks may not actually hit on such strategies at any scale, but...

6. LLMs are evidence that abstract reasoning ability emerges as a side effect of solving any sufficiently hard and general problem with enough effort. Alternatively: abstract reasoning ability is a convergent solution to any sufficiently hard problem. Recent results with LLMs demonstrate that relatively straightforward methods applied at scales feasible with current human tech qualify as “enough effort”. Although LLMs are still probably far below human-level at abstract reasoning ability, the fact that they show signs of doing such reasoning at all implies that hitting on abstract reasoning as a problem-solving strategy is somewhat easier than most people would have predicted just 10 or 15 years ago.
I think this is partially what Eliezer means when he claims that “reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis”. Eliezer predicted at the time that general abstract reasoning was easy to develop and scale, relative to Robin. But even Eliezer thought you would still need some kind of understanding of the actual underlying cognitive algorithms to initially bootstrap from, using GOFAI methods, complicated architectures / training processes, etc. It turns out that just applying SGD to regularly-structured networks at non-planet-consuming scales to the problem of text prediction is sufficient to hit on (weak versions of) such algorithms incidentally!

7. The difficulty required to think up algorithms needed to learn and do abstract reasoning is upper bounded by evolution. Evolution managed to discover, design, and pack general-purpose reasoning algorithms (and / or the process for learning them during a single human lifetime) into a 10 W, 1000 cm³ box through iterative mutation. It might or might not take a lot of compute to implement and run such algorithms in silicon, but they can’t be too complicated in an absolute sense.
8. The amount of compute required to run and develop such algorithms can be approximated by comparing current AI systems to components of the brain. I’m usually skeptical of biology-based AI timelines (e.g. for the reasons described here, and my own thoughts here), but I think there’s at least one comparison method that can be useful. Suppose you take all the high-level tasks that current AI systems can do at roughly human levels or above (speech and audio processing, language and text processing, vision, etc.), and determine what fraction of neurons and energy the brain uses to carry out those tasks. For example, a quick search of Wikipedia gives an estimate of ~280 million neurons in the visual cortex. Suppose you add up all the neurons dedicated (mostly) to doing things that AI systems can already do individually, and find that this amounts to, say, 20% of the total neurons in the brain (approximately 80 billion).
Such an estimate, if accurate, would imply that the compute and energy requirements for human-level AGI are roughly approximated by scaling current AI systems by 5x. Of course, the training algorithms, network architectures, and interconnects needed to implement the abstract reasoning carried out in the frontal cortex might be more difficult to discover and implement than the algorithms and training methods required to implement vision or text processing. But by (7), these algorithms can’t be too hard to discover, since evolution did so, and by (2), we’re likely to continue seeing steady, compounding algorithmic advances from here until the end.
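For what it's worth, the arithmetic behind that kind of estimate is trivially short; here is a sketch using the illustrative (not researched) numbers from the paragraphs above:

```python
# Back-of-envelope version of the estimate above.
# The 20% figure is a made-up example from the post, not a researched number.
total_neurons = 80e9            # rough total neurons in a human brain
fraction_already_matched = 0.20 # fraction devoted to tasks current AI already does at ~human level
neurons_matched = total_neurons * fraction_already_matched

# If current AI systems collectively match ~20% of the brain's task portfolio,
# the naive scale-up factor to cover the rest is just the reciprocal:
scaling_factor = 1 / fraction_already_matched
print(f"{neurons_matched:.1e} neurons' worth of tasks matched; naive scale-up: {scaling_factor:.0f}x")
```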
The advantage of such an estimation approach is it doesn’t require figuring out how the brain or current AI systems are solving these tasks or drawing any comparisons between them other than their high-level performance characteristics, nor does it require trying to figure out how many FLOP-equivalent operations the brain is doing, or how many of those are useful. In the method of comparison above, neurons (or energy) are just a way of roughly quantifying the fraction of the brain’s resources allocated to these high-level tasks.

9. One of the purposes of studying some problems in decision theory, embedded agency, and agent foundations is to be able to recognize and avoid having the first AGI systems solve those problems. For example, if interpretability research tells you that your system is doing anything internally that looks like logical decision theory, or if it starts making any decisions whatsoever for which evidential, causal, and logical decision theories do not give answers which are all trivially equivalent, it’s probably time to halt, melt, and catch fire.
IMO, the first AGI systems should be narrowly focused on solving problems in nanotech, biotech, and computer security. Such problems do not obviously require a deep understanding of decision theory or embedded agency, but an AGI may run into solutions to such problems as a side effect of being generally intelligent. Past a certain intelligence level, solving embedded agency explicitly and fully is probably unavoidable, but to the extent that we can, we should try to detect and delay having an AGI develop such an understanding for as long as possible.

10. Even at human level, 99% honesty for AI isn’t good enough. (A fleshed-out version of this take would be my reply to @HoldenKarnofsky’s latest comment in the thread here.) I think instilling both a reliable habit of being honest and a general (perhaps deontological) policy of being honest are not sufficient for safety in human-level AI or literal humans. To see why, consider Honest Herb, a human who cultivates a habit of being honest by default, and of avoiding white lies in his day-to-day life. For higher-stakes situations or more considered decisions where Herb might be tempted to deceive, he also has a deontological rule against lying, which he tries hard to stick to even when it seems like honesty is sub-optimal under consequentialist reasoning, and even when (he thinks) he has considered all knock-on effects.
But this deontological rule is not absolute: if Herb were, for example, a prisoner of hostile aliens, the aliens might observe his behavior or the internal workings of his brain to verify that he actually has such habits and a deontological policy that he sticks to under all circumstances that they can observe. But it is exactly the 0.1% of cases that the aliens cannot observe that might allow Herb to escape. When the stakes are sufficiently high, and Herb is sufficiently confident in his own consequentialist-based reasoning, he will break his deontological policy against deception in order to win his freedom. I expect human-level AI systems to be similar to humans in this regard, and for this to hold even if interpretability research catches up to the point where it can actually “read the mind” of AI systems on a very deep level.
[1] At most; exotic possibilities of quantum mechanics might make the implications of current results even weaker.
I love the format!
agree completely
agree completely
Capabilities won’t be limited by compute OR data? So capabilities will be purely limited by algorithmic improvements? This seems extreme; it seems like better (if not more) data is still useful, and more compute seems highly relevant today.
I agree that a human supergenius that just happens to be able to duplicate themselves and think faster with more compute could probably rearrange all the matter in its light cone. A village idiot could not. And I agree that humans are way smarter at some tasks than others. I think this makes discussion of how smart machines can get pretty irrelevant, and it makes the discussion of how fast they can get smarter less relevant. They’re almost smart enough to take over, and it will take less than a generation (generously) to get there. We need alignment solutions ASAP. It could already be too late to implement solutions that don’t apply to current directions in AI.
Agree. Goal direction and abstract reasoning are highly useful for almost any cognitive task. Goal direction allows you to split the task into simpler components and work on them separately. Abstract reasoning saves tons of computation and corresponds to the structure of the world.
I think you need to say complex cognition emerges on difficult tasks with enough effort, enough data variety, and good learning algorithms. How much is enough in each category is debatable, and they clearly trade off against each other.
Yes, but I don’t think evolution is a very useful upper bound on difficulty of thinking up algorithms. Evolution “thinks” completely differently than us or any agent. And it’s had an unimaginably vast amount of “compute” in physical systems to work with over time.
Yes, but again the comparison is of dubious use. There are a lot of molecules in the brain, and if we count each receptor as a calculation (which it is), we’d get a very high compute requirement. It’s clearly lower, but how much lower is debatable and debated. I’d rather reason from GPT4 and other existing deep networks directly. GPT4 is AGI without memory, executive function, or sensory systems. Each of those can be estimated from existing systems. My answer would also be about 5x GPT4, except for the likely algorithmic improvements.
Agreed, but every time someone says how we “should” build AGI or what we “should” allow it to do, I tune out if it’s not followed immediately by an idea of how we COULD convince the world to take this path. Having AI try doing everything is tempting from both an economic and curiosity standpoint.
Absolutely. This is the sharp left turn logic, nicely spelled out in A Friendly Face (Another Failure Story). Honesty prior to contextual awareness might be encouraging, but I really don’t want to launch any system without strong corrigibility and interpretability. I’m aware we might have to try aligning such a system, but we really should be shooting for better options that align with the practicalities of AI development.
On (3), I’m more saying, capabilities won’t be bottlenecked on more data or compute. Before, say, 2019 (GPT2 release) AI researchers weren’t actually using enough data or compute for many potential algorithmic innovations to be relevant, regardless of what was available theoretically at the time.
But now that we’re past a minimum threshold of using enough compute and data where lots of things have started working at all, I claim / predict that capabilities researchers will always be able to make meaningful and practical advances just by improving algorithms. More compute and more data could also be helpful, but I consider that to be kind of trivial—you can always get better performance from a Go or Chess engine by letting it run for longer to search deeper in the game tree by brute force.
A few billion years of very wasteful and inefficient trial-and-error by gradual mutation on a single planet doesn’t seem too vast, in the grand scheme of things. Most of the important stuff (in terms of getting to human-level intelligence) probably happened in the last few million years. Maybe it takes planet-scale or even solar-system scale supercomputers running for a few years to reproduce / simulate. I would bet that it doesn’t take anything galaxy-scale.
On (9): yeah, I was mainly just pointing out a potentially non-obvious use and purpose of some research that people sometimes don’t see the relevance of. Kind of straw, but I think that some people look at e.g. logical decision theory, and say “how the heck am I supposed to build this into an ML model? I can’t, therefore this is not relevant.”
And one reply is that you don’t build it in directly: a smart enough AI system will hit on LDT (or something better) all by itself. We thus want to understand LDT (and other problems in agent foundations) so that we can get out in front of that and see it coming.
What do you think about combinatorial explosion as a possible soft limit to the power of intelligence? Sure, we can fight such explosion by different ways of cheating, like high-level planning or using neural nets to predict most promising branches. But the main idea is that intelligence will eventually boil down to searching the best answer by trying—like evolution does.
How much work is “eventually” doing in that sentence, is my question. We already have machine learning systems in some fields (pharma, materials science) that greatly reduce the number of experiments researchers need to conduct to achieve a goal or get an answer. How low does the bound need to get?
I see a lot of discussion and speculation about “there’s no way to get this right on the first try even for a superintelligence” but I don’t think that’s the right constraint unless you’ve already somehow contained the system in a way that only allows it a single shot to attempt something. In which case, you’re most of the way to full containment anyway. Otherwise, the system may require additional trials/data/feedback, and will be able to get them, with many fewer such attempts than a human would need.
No one doubts that an ASI would have an easier time executing its plans than we could imagine, but the popular claim is one-shot.
Yes, but I think it’s important that when someone says, “Well I think one-shotting X is impossible at any level of intelligence,” you can reply, “Maybe, but that doesn’t really help solve the not-dying problem, which is the part that I care about.”
I think the harder the theoretical doom plan is, the easier it is to control, at least until alignment research catches up. It’s important because obsessing over unlikely scenarios that make the problem harder than it is can exclude potential solutions.
I do think it’s plausible that e.g. nanotech requires some amount of trial-and-error or experimentation, even for a superintelligence. But such experimentation could be done quickly or cheaply.
Evolution is a pretty dumb optimization process; ordinary human level intelligence is more than enough to surpass its optimization power with OOM less trial and error.
For example, designing an internal combustion engine or a CPU requires solving some problems which might run into combinatorial explosions, if your strategy is to just try a bunch of different designs until you find one that works. But humans manage to design engines and CPUs and many other things that evolution couldn’t do with billions of years of trial and error.
There might be some practical problems for which combinatorial explosion or computational hardness imposes a hard limit on the capabilities of intelligence. For example, I expect there are cryptographic algorithms that even a superintelligence won’t be able to break.
But I doubt that such impossibilities translate into practical limits—what does it matter if a superintelligence can’t crack the keys to your bitcoin wallet, if it can just directly disassemble you and your computer into their constituent atoms?
Maybe developing disassembling technology itself unavoidably requires solving some fundamentally intractable problem. But I think human success at various design problems is at least weak evidence that this isn’t true. If you didn’t know the answer in advance, and you had to guess whether it was possible to design a modern CPU without intractable amounts of trial and error, you might guess no.
It’s very difficult to argue with most of the other claims if the base assumption is that this sort of technology is (a) possible, (b) achievable in one or a few shots, and (c) achievable with compute that is reasonable for the planet.
I’ve asked the mods to enable inline reacts for comments on this post.
If the feature is enabled (now enabled!), you can react to each of the takes individually by highlighting them in the list below.

1. Computational complexity theory does not say anything practical about the bounds on AI (or human) capabilities.
2. From here on, capabilities research won’t fizzle out (no more AI winters).
3. Scaling laws and their implications, e.g. Chinchilla, are facts about particular architectures and training algorithms.
4. “Human-level” intelligence is actually a pretty wide spectrum.
5. Goal-direction and abstract reasoning (at ordinary human levels) are very useful for next-token prediction.
6. LLMs are evidence that abstract reasoning ability emerges as a side effect of solving any sufficiently hard and general problem with enough effort.
7. The difficulty required to think up algorithms needed to learn and do abstract reasoning is upper bounded by evolution.
8. The amount of compute required to run and develop such algorithms can be approximated by comparing current AI systems to components of the brain.
9. One of the purposes of studying some problems in decision theory, embedded agency, and agent foundations is to be able to recognize and avoid having the first AGI systems solve those problems.
10. Even at human level, 99% honesty for AI isn’t good enough.
This implies it could be impossible to know, given the parameters and a function, what the arg max is. I don’t see how you can get around this. I don’t understand what you are ultimately concluding from that statement being false or unrelated to computation.
This is probably typically the equivalent of using a probabilistic solution or approximation. Exploiting a regularity means you are using the average solution for that regularity. Or that regularity has already been solved/is computable. I think the latter is somewhat unlikely.
This introduces accumulated uncertainty.
The whole world model is a chaotic, incomputable system. Example: how people will react to your plan, given no certain knowledge of which people will observe your plan.
It will be hardware, as it has always been. The current paradigm makes it seem like algorithms progress faster than hardware does and is bottlenecked by hardware. I don’t see any evidence against this.
Unsure about this claim. How are you relating neuron count to current systems? I think the actual bio anchors claim puts it at ~2040. 5x puts it within the next 2 years. We are also rapidly approaching the limits of silicon.
Evolution discovering something doesn’t necessarily mean it is easy. Just that it’s possible. We also have way more energy than is provably required for human level AGI.
This claim seems fantastical. Ignoring the fact that we already have slightly superhuman narrow AI that don’t enable us to do any of that, and that groups of smart humans are slightly superhuman, the AI will have to reverse entropy and probably generate unlimited energy to do that.
A misaligned human level genius isn’t exactly scary either unless it’s directing us somehow and it gains a lot of influence. However history is full of these and yet we still stand.
Don’t see why this has to be the case. If language itself plays a role in development of abstract thought, then this could explain why LLMs in particular seem to have abstract thought.
This claim isn’t particularly convincing. A lying politician lies 70% of the time. A righteous politician lies 30% of the time. I am also pretty confused by the story. What exactly is it portraying?
One thing I’m saying is that arguments of the form: “doing X is computationally intractable, therefore a superintelligence won’t be able to do X” are typically using a loose, informal definition of “computationally intractable” which I suspect makes the arguments not go through. Usually because X is something like “build nanotech”, but the actual thing that is provably NP-hard is something more like “solve (in full generality) some abstracted modelling problem which is claimed to be required to build nanotech”.
Another thing is that even if X itself is computationally intractable, something epsilon different from X might not be. Non-rhetorical question: what does it matter if utility maximization is not “ideal”?
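As a toy example of the "epsilon different" point (the items and numbers below are made up): exact 0/1 knapsack is NP-hard, but a greedy-by-density rule, taking the better of the greedy bundle and the single most valuable item that fits, is a classic 1/2-approximation and usually does far better in practice. A minimal sketch:

```python
# Exact 0/1 knapsack is NP-hard, but "almost optimal" is easy: greedy by value density,
# then take the better of the greedy bundle and the single best item that fits.
# This modified greedy is a classic 1/2-approximation.

def greedy_knapsack(items, capacity):
    """items: list of (value, weight). Returns (total_value, chosen_items)."""
    by_density = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    bundle, used = [], 0
    for value, weight in by_density:
        if used + weight <= capacity:
            bundle.append((value, weight))
            used += weight
    best_single = max((it for it in items if it[1] <= capacity),
                      key=lambda vw: vw[0], default=None)
    bundle_value = sum(v for v, _ in bundle)
    if best_single and best_single[0] > bundle_value:
        return best_single[0], [best_single]
    return bundle_value, bundle

items = [(60, 10), (100, 20), (120, 30)]   # (value, weight), made-up numbers
print(greedy_knapsack(items, capacity=50)) # -> (160, [(60, 10), (100, 20)])
# The true optimum here is 220; greedy gets 160, within the guaranteed factor of 2.
```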
This is a great example. Non-computability has a precise technical meaning. If you want to link this meaning to a claim about a plan in the physical world or a limit on the ability to model human behavior, you have to actually do the work to make the link in a precise and valid way. (I’ve never seen anyone do this convincingly.)
Another example is the halting problem. It’s provably undecidable, meaning there is no fully general algorithm that will tell you whether a program halts or not. And yet, for many important programs that I encounter in practice, I can tell at a glance whether they will halt or not, and prove it so. (In fact, under certain sampling / distribution assumptions, the halting problem is overwhelmingly likely to be solvable for a given randomly sampled program.)
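A toy illustration of that (both functions are examples I'm adding here): the first obviously halts, while the second runs the Collatz iteration, whose termination for all positive inputs is a famous open question.

```python
# Halting is undecidable *in general*, but specific programs are often easy to settle at a glance.

def obviously_halts(n: int) -> int:
    total = 0
    for i in range(n):   # a loop with a fixed, finite bound: trivially halts
        total += i
    return total

def collatz_steps(n: int) -> int:
    steps = 0
    while n != 1:        # whether this halts for every n > 0 is the open Collatz conjecture
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps
```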
Not really; for an extreme example, simulating the entire universe is computationally intractable because it would require more resources than exist in the universe. By intractable I personally just mean it requires much more effort to simulate/guess than to experiment. An obvious real world example is cracking a hash like you said. None of this has to do with NP hardness; it’s more like an estimate of the raw computational power required, and whether that exceeds the computational resources theoretically available. Another factor is also the amount of uncertainty the calculation will necessarily have.
Nothing that reduces to X can be computationally intractable or X isn’t. I don’t see any way around adding uncertainty by computing X-e which differs from X. Again this is just simply stated as approximations. In a model to build nano tech, an easy way this pops up is if there are multiple valid hypotheses given raw data, but you need high powered tools to experimentally determine which one is right based on some unknown unknowns. (Say a constant that hasn’t been measured yet to some requisite precision). This isn’t too contrived as it pops up in the real world a lot.
Easy: lots of independent decision makers means there are an exponential number of states the entire system can be in. I’m sure there are also a lot of patterns in human behavior, but humans are so diverse in their values and reactions that it would be difficult to know with certainty how some humans may react. Therefore you can only assign probabilities of what state an action might cause a system to transition to. A lot of these probabilities will be best guesses as well, so there will be some sort of compounding error.
Concretely in a world domination plan, social engineering isn’t guaranteed to work. All it takes is for one of your actions to get the wrong reaction, i.e. evoke suspicion, and you get shut down. And it may be impossible to know for certain if you can avoid that ending.
I am not trying to relate it to computability theory, just showing that you get compounding uncertainty. Simulation of quantum dynamics of sufficient scale is an obvious example.
If you know whether a program falls in a set of solvable programs, you’ve already solved the halting problem. I assume you still double check your logic. I also don’t see how you can prove anything for sufficiently large modules.
I don’t see why the halting problem is relevant here, nor do I see that paper proving anything about real world programs. It’s talking about arbitrary tapes and programs. I don’t see how it directly relates to real life programming or other problem classes.
And if you want to talk about NP hardness, it seems that decision makers regularly encounter NP hard problems. It’s not obvious that they’re avoidable or why they would be. I don’t see why an ASI would for example be able to avoid the hardness of determining optimal resource allocation. The only side step I can think of is not needing resources or using a suboptimal solution.
This paper seems to explain it better than I can hope to: https://royalsocietypublishing.org/doi/10.1098/rstb.2018.0138
Getting stuck in a local maximum. This seems to undermine the whole atoms line of thought.
I’m not; the main point of the comparison method I propose is that it sidesteps the need to relate neurons in the brain to operations in AI systems.
The relevant question is what fraction of the brain is used to carry out a high-level task like speech recognition or visual processing. How to accurately measure that fraction might involve considering the number of neurons, energy consumption, or synaptic operations in a particular region of the brain, as a percent of total brain capacity. But the comparison between brains and AI systems is only on overall performance characteristics at high-level tasks.
I haven’t actually done the research to estimate what fractions of the brain are used to do which tasks; 5x was just an example. But it wouldn’t surprise me if there is enough existing compute already lying around for AGI, given the right algorithms. (And further, that those algorithms are not too hard for current human researchers to discover, for reasons covered in this post and elsewhere.)
At best it may show we need a constant factor amount less than the brain has (I am highly doubtful about this claim) to reach its intelligence.
And no one disputes that we can get better than human performance at narrower tasks with worse than human compute. However, such narrow AI also augment human capabilities.
How exactly are you side stepping computation requirements? The brain is fairly efficient at what it has to do. I would be surprised if given the brains constraints, you could get more than 1 OOM more efficient. A brain also has much longer to learn.
Do you have any evidence for these claims? I don’t think your evolution argument is strong in proving that they’re easy to find. I am also not convinced that current hardware is enough. The brain is also far more efficient and parallel at approximate calculations than our current hardware. The exponential growth we’ve seen in model performance has always been accompanied by an exponential growth in hardware. The algorithms used are typically really simple which makes them scalable.
Maybe an algorithm can make the computer I’m using super intelligent but I highly doubt that.
Also I think it would be helpful to retract the numbers or at least say it’s just a guess.
I’d probably not claim this, or at least I'd significantly limit the claim here, because of some new results on Transformers/LLMs: apparently they don’t actually do multi-step reasoning, but instead develop shortcuts, and they can’t implement recursive algorithms, which is really important.
Tweets below:
AK on Twitter: “Faith and Fate: Limits of Transformers on Compositionality. Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly… https://t.co/lsEfo9trPR”
Talia Ringer @ FCRC on Twitter: “New preprint just dropped! ‘Can Transformers Learn to Solve Problems Recursively?’ With @dylanszzhang, @CurtTigges, @BlancheMinerva, @mraginsky, and @TaliaRinger. https://t.co/D13mD2Q7aq https://t.co/wqM2FPQEQ4”
This is actually a fairly important limitation of LLMs, and appears to sort of vindicate the LLM skeptics like Yann Lecun and Gary Marcus, in that LLMs don’t actually reason on multi-step problems all that well.
It seems like it’s easy to break this limitation by writing prompts that break a problem into pieces, then calling a new instance of the LLM to solve each piece and then to provide the answer given the step-by-step reasoning from previous prompts. SmartGPT does something like this, and achieves vastly better performance on the logical reasoning benchmarks it’s been tested on.
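A rough sketch of that decomposition pattern (call_llm is a hypothetical stand-in for whatever completion API you're using, and the prompts are illustrative, not the actual SmartGPT ones):

```python
# Hypothetical sketch of the "break the problem into pieces" pattern described above.
# call_llm() is a stand-in for a real completion API; prompts are illustrative only.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice here")

def solve_by_decomposition(problem: str) -> str:
    # 1. Ask one instance to break the problem into steps.
    plan = call_llm(f"Break this problem into numbered sub-steps:\n{problem}")

    # 2. Solve each sub-step with a fresh call, carrying forward earlier results.
    worked_steps = []
    for step in plan.splitlines():
        if not step.strip():
            continue
        context = "\n".join(worked_steps)
        worked_steps.append(call_llm(
            f"Problem: {problem}\nWork so far:\n{context}\nNow do: {step}"))

    # 3. Ask a final instance to combine the step-by-step work into an answer.
    return call_llm(
        f"Problem: {problem}\nStep-by-step work:\n" + "\n".join(worked_steps) +
        "\nGive the final answer.")
```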