That’s alright. Would you be able to articulate what you associate with AGI in general? For example, do you associate AGI with certain intellectual or physical capabilities, or do you associate it more with something like moral agency, personhood or consciousness?
Thank you for the clarification!
Of course, it is much more likely to be predictable a couple of days in advance than a year in advance, but even the former may conceivably be quite challenging depending on situational awareness of near-human-level models in training.
Do I understand correctly that you think that we are likely to only recognize AGI after it has been built? If so, how would we recognize AGI as you define it?
Do you also think that AGI will result in a fast take-off?
What would you expect the world to look like if AGI < 2030? Or put another way, what evidence would convince you that AGI < 2030?
What do you make of feral children like Genie? While there are not many counterfactuals to cultural learning—probably mostly because depriving children of cultural learning is considered highly immoral—feral children do provide strong evidence that humans that are deprived of cultural learning do not come close to being functional adults. Additionally, it seems obvious that people who do not receive certain training, e.g., those who do not learn math or who do not learn carpentry, generally have low capability in that domain.
the genetic changes come first, then the cultural changes come after
You mean to say that the human body was virtually “finished evolving” 200,000 years ago, thereby laying the groundwork for cultural optimization which took over from that point? Henrich’s thesis of gene-culture coevolution contrasts with this view and I find it to be much more likely to be true. For example, the former thesis posits that humans lost a massive amount of muscle strength (relative to, say, chimpanzees) over many generations and only once that process had been virtually “completed”, started to compensate by throwing rocks or making spears when hunting other animals, requiring much less muscle strength than direct engagement. This raises the question, how did our ancestors survive in the time when muscle strength had already significantly decreased, but tool usage did not exist yet? Henrich’s thesis answers this by saying that such a time did not exist; throwing rocks came first, which provided the evolutionary incentive for our ancestors to expend less energy on growing muscles (since throwing rocks suffices for survival and requires less muscle strength). The subsequent invention of spears provided further incentive for muscles to grow even weaker.
There are many more examples like the one above. Perhaps the most important one is that as the amount of culture grows (also including things like rudimentary language and music), a larger brain has an advantage because it can learn more and more quickly (as also evidenced by the LLM scaling laws). Without culture, this evolutionary incentive for larger brains is much weaker. The incentive for larger brains leads to a number of other peculiarities specific to humans, such as premature birth, painful birth and fontanelles.
How do LLMs and the scaling laws make you update in this way? They make me update in the opposite direction. For example, I also believe that the human body is optimized for tool use and scaling, precisely because of the gene-culture coevolution that Henrich describes. Without culture, this optimization would not have occurred. Our bodies are cultural artifacts.
Cultural learning is an integral part of the scaling laws; the scaling laws show that indefinitely scaling the number of parameters in a model doesn’t quite work; the training data also has to scale, with the implication that that data is some kind of cultural artifact, where the quality of that artifact determines the capabilities of the resulting model. LLMs work because of the accumulated culture that goes into them. This is no less true for “thinking” models like o1 and o3, because the way they think is very heavily influenced by the training data. The fact that thinking models do so well is because thinking becomes possible at all, not because thinking is something inherently beyond the training data. These models can think because of the culture they absorbed, which includes a lot of examples of thinking. Moreover, the degree to which Reinforcement Learning determines the capabilities of thinking models is small compared to Supervised Learning, because, firstly, less compute is spent on RL than on SL, and, secondly, RL is much less sample-efficient than SL.
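One way to make the data-dependence concrete is the Chinchilla-style parametric fit (Hoffmann et al., 2022). A rough sketch in my own notation, with the constants to be read as approximate fitted values rather than exact figures:

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is the number of parameters, $D$ the number of training tokens, $E$ an irreducible loss, and $A$, $B$, $\alpha$, $\beta$ fitted constants (both exponents come out at roughly 0.3). Holding $D$ fixed, the $B / D^{\beta}$ term puts a floor under the achievable loss no matter how large $N$ gets, which is the formal version of “the training data also has to scale”.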
Current LLMs can only do sequential reasoning of any kind by adjusting their activations, not their weights, and this is probably not enough to derive and internalize new concepts à la C.
For me this is the key bit which makes me update towards your thesis.
This is indeed an interesting sociological breakdown of the “movement”, for lack of a better word.
I think the injection of the author’s beliefs about whether or not short timelines are correct distracts from the central point. For example, the author states the following.
there is no good argument for when [AGI] might be built.
This is a bad argument against worrying about short timelines, bordering on intellectual dishonesty. Building anti-asteroid defenses is a good idea even if you don’t know that one is going to hit us within the next year.
The argument that it’s better to have AGI appear sooner rather than later because institutions are slowly breaking down is an interesting one. It’s also nakedly accelerationist, which is strangely inconsistent with the argument that AGI is not coming soon, and in my opinion very naïve.
Besides that, I think it’s generally a good take on the state of the movement, i.e., like pretty much any social movement it has a serious problem with coherence and collateral damage and it’s not clear whether there’s any positive effect.
Ah, I see now. Thank you! I remember reading this discussion before and agree with your viewpoint that he is still directionally correct.
he apparently faked some of his evidence
Would be happy to hear more about this. Got any links? A quick Google search doesn’t turn up anything.
You talk about personhood in a moral and technical sense, which is important, but I think it’s important to also take into account the legal and economic senses of personhood. Let me try to explain.
I work for a company where there’s a lot of white-collar busywork going on. I’ve come to realize that the value of this busywork is not so much the work itself (indeed a lot of it is done by fresh graduates and interns with little to no experience), but the fact that the company can bear responsibility for the work due to its somehow good reputation (something something respectability cascades), i.e., “Nobody ever got fired for hiring them”. There is not a lot of incentive to automate any of this work, even though I can personally attest that there is a lot of low-hanging fruit. (A respected senior colleague of mine plainly stated to me, privately, that most of it is bullshit jobs.)
By my estimation, “bearing responsibility” in the legal and economic sense means that an entity can be punished, where being punished means that something happens which disincentivizes it and other entities from doing the same. (For what it’s worth, I think much of our moral and ethical intuitions about personhood can be derived from this definition.) AI cannot function as a person of any legal or economic consequence (and by extension, moral or ethical consequence) if it cannot be punished or learn in that way. I assume it will be able to eventually, but until then most of these bullshit jobs will stay virtually untouchable because someone needs to bear responsibility. How does one punish an API? Currently, we practically only punish the person serving the API or the person using it.
There are two ways I see to overcome this. One way is that AI eventually can act as a drop-in replacement for human agents in the sense that they can bear responsibility and be punished as described above. With the current systems this is clearly not (yet) the case.
The other way is that the combination of cost, speed and quality becomes too good to ignore, i.e., that we get to a point where we can say “Nobody ever got fired for using AI” (on a task-by-task basis). This depends on the trade-offs that we’re willing to make between the different aspects of using AI for a given task, such as cost, speed, quality, reliability and interpretability. This is already driving use of AI for some tasks where the trade-off is good enough, while for others it’s not nearly good enough or still too risky to try.
Reminds me of mimetic desire:
Man is the creature who does not know what to desire, and he turns to others in order to make up his mind. We desire what others desire because we imitate their desires.
However, I only subscribe to this theory insofar as it is supported by Joseph Henrich’s work, e.g., The Secret of Our Success, in which Henrich provides evidence that imitation (including imitation of desires) is the basis of human-level intelligence. (If you’re curious how that works, I highly recommend Scott Alexander’s book review.)
But we already knew that some people think AGI is near and others think it’s farther away!
And what do you conclude based on that?
I would say that as those early benchmarks (“can beat anyone at chess”, etc.) are achieved without producing what “feels like” AGI, people are forced to make their intuitions concrete, or anyway reckon with their old bad operationalizations of AGI.
The relation between the real world and our intuition is an interesting topic. When people’s intuitions are violated (e.g., the Turing test is passed but it doesn’t “feel like” AGI), there’s a temptation to try to make the real world fit the intuition, when it is more productive to accept that the intuition is wrong. That is, maybe achieving AGI doesn’t feel like you expect. But that can be a fine line to walk. In any case, privileging an intuitive map above the actual territory is about as close as you can get to a “cardinal sin” for someone who claims to be rational. (To be clear, I’m not saying you are doing that.)
They spend more time thinking about the concrete details of the trip, not because they know the trip is happening soon, but because some think the trip is happening soon. Disagreement on and attention to concrete details is driven by only some people saying that the current situation looks like, or is starting to look like, the event occurring according to their interpretation. If the disagreement had happened at the beginning, they would soon have started using different words.
In the New York example, it could be that when someone says “Guys, we should really buy those Broadway tickets. The trip to New York is next month already.” they prompt the response “What? I thought we were going the month after!”, hence disagreement. If this detail had been discussed earlier, there might have been the “February trip” and the “March trip” in order to disambiguate the trip(s) to New York.
In the case of AGI, some people’s alarm bells are currently going off, prompting others to say that more capabilities are required. What seems to have happened is that people at one point latched on to the concept of AGI, thinking that their interpretation was virtually the same as those of others because of its lack of definition. Again, if they had disagreed with the definition to begin with, they would have used a different word altogether. Now that some people are claiming that AGI is here or will be here soon, it turns out that the interpretations do in fact differ. The most obnoxious cases are when people disagree with their own past interpretation once that interpretation is about to be satisfied, on the basis of some deeper, undefined intuition (or, in the case of OpenAI and Microsoft, ulterior motives). This of course is also known as “moving the goalposts”.
Once upon a time, not that long ago, AGI was interpreted by many as “it can beat anyone at chess”, “it can beat anyone at go” or “it can pass the Turing test”. We are there now, according to those interpretations.
Whether or not AGI exists depends only marginally on any one person’s interpretation. Words are a communicative tool and therefore depend on others’ interpretations. That is, the meanings of words don’t fall out of the sky; they don’t pass through a membrane from another reality. Instead, we define meaning collectively (and often unconsciously). For example, “What is intelligence?” is a question of how that word is in practice interpreted by other people. “How should it be interpreted (according to me personally)?” is a valid but different question.
The amount of contention says something about whether an event occurred according to the average interpretation. Whether it occurred according to your specific interpretation depends on how close that interpretation is to the average interpretation.
You can’t increase the probability of getting a million dollars by personally choosing to define a contentious event as you getting a million dollars.
I wouldn’t call either hypothesis invalid. People just use the same words to refer to different things. This is true for all words and hypotheses to some degree. When there is little to no contention that we’re not in New York, or that we don’t have AGI, or that the Second Coming hasn’t happened, then those differences are not apparent. But presumably there is some correlation between the different interpretations, such that when the Event does take place, contention rises to a degree that increases as that correlation decreases[1]. (Where by Event I mean some event that is semantically within some distance to the average interpretation[2].)
Formally, I say that $P(\text{contention} \mid \text{Event}) \approx 1$, meaning $P(\neg\text{contention} \mid \text{Event})$ is small, where $P(\text{contention} \mid \text{Event})$ can be considered a measure of how vaguely the term AGI is specified.
The more vaguely an event is specified, the more contention there is when the event takes place. Conversely, the more precisely an event is specified, the less contention there is when the event takes place. It’s kind of obvious when you think about it. Using Bayes’ law we can additionally say the following.

$$P(\text{Event} \mid \text{contention}) = \frac{P(\text{contention} \mid \text{Event}) \, P(\text{Event})}{P(\text{contention})}$$

That is, when there is contention about whether a vaguely defined event such as AGI has occurred, your posterior probability should be high, modulated by your prior for AGI (the posterior monotonically increases with the prior). I think it’s also possible to say that the more contentious an event, the higher the probability that it has occurred, but that may require some additional assumptions about the distribution of interpretations in semantic space.
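As a toy illustration with made-up numbers: suppose the prior is $P(\text{Event}) = 0.3$, the event is vaguely specified so that $P(\text{contention} \mid \text{Event}) = 0.9$, and background disagreement gives $P(\text{contention} \mid \neg\text{Event}) = 0.2$. Then

$$P(\text{Event} \mid \text{contention}) = \frac{0.9 \times 0.3}{0.9 \times 0.3 + 0.2 \times 0.7} \approx 0.66,$$

so observing contention more than doubles the probability relative to the prior, and a higher prior would push the posterior up further.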
An important difference between AGI and the Second Coming (at least among rationalists and AI researchers) is that the latter generally has a much lower prior probability than the former.
You’re kind of proving the point; the Second Coming is so vaguely defined that it might as well have happened. Some churches preach this.
If the Lord Himself did float down from Heaven and gave a speech on Capitol Hill, I bet lots of Christians would deride Him as an impostor.
Disagreement on AGI Suggests It’s Near
Thank you for the reply!
I’ve actually come to a remarkably similar conclusion as described in this post. We’re phrasing things differently (I called it the “myth of general intelligence”), but I think we’re getting at the same thing. The Secret of Our Success has been very influential on my thinking as well.
This is also my biggest point of contention with Yudkowsky’s views. He seems to suggest (for example, in this post) that capabilities are gained from being able to think well and a lot. In my opinion he vastly underestimates the amount of data/experience required to make that possible in the first place, for any particular capability or domain. This speaks to the age-old (classical) rationalism vs empiricism debate, where Yudkowsky seems to sit on the rationalist side, whereas it seems you and I would lean more to the empiricist side.
Entities that reproduce with mutation will evolve under selection. I’m not so sure about the “natural” part. If AI takes over and starts breeding humans for long floppy ears, is that selection natural?
In some sense, all selection is natural, since everything is part of nature, but an AI that breeds humans for some trait can reasonably be called artificial selection (and mesa-optimization). If such a breeding program happened to allow the system to survive, nature selects for it. If not, it tautologically doesn’t. In any case, natural selection still applies.
But there won’t necessarily be more than one AI, at least not in the sense of multiple entities that may be pursuing different goals or reproducing independently. And even if there are, they won’t necessarily reproduce by copying with mutation, or at least not with mutation that’s not totally under their control with all the implications understood in advance. They may very well be able to prevent evolution from taking hold among themselves. Evolution is optional for them. So you can’t be sure that they’ll expand to the limits of the available resources.
In a chaotic and unpredictable universe such as ours, survival is virtually impossible without differential adaptation and not guaranteed even with it. (See my reply to lukedrago below.)
I’m glad you asked. I completely agree that nothing in the current LLM architecture prevents that technically and I expect that it will happen eventually.
The issue in the near future is practicality, because training models is currently—and will in the near future still be—very expensive. Inference is less expensive, but still so expensive that profit is only possible by serving the model statically (i.e., without changing its weights) to many clients, which amortizes the cost of training and inference.
These clients often rely heavily on models being static, because that makes their behavior predictable enough to be suitable for a production environment. For example, if you use a model for a chat bot on your company’s website, you wouldn’t want its personality to change based on what people say to it. We’ve seen that go wrong very quickly with Microsoft’s Twitter bot Tay.
There is also the question of whether you want your model to internalize new concepts (let’s just call it “continual learning”) based on everybody’s data or based on just your data. Using everybody’s data is more practical in the sense that you just update the one model that everybody uses (which is something that’s in a sense already happening when they move the cutoff date of the training data forward for the latest models), but it’s not something that users will necessarily be comfortable with. For example, users won’t want a model to leak their personal information to others. There are also legal barriers here, of course, especially with proprietary data.
People will probably be more comfortable with a model that updates just on their data, but that’s not practical (yet) in the sense that you would need the compute resources to be cheap enough to run an entire, slightly different model for each specific use case. It can already be done to some degree with fine-tuning, but that doesn’t change the weights of the entire model (that would be prohibitively expensive with current technology) and I don’t think this form of fine-tuning is able to implement continual learning effectively (but I’m happy to be proven wrong here).
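To illustrate what I mean by fine-tuning that doesn’t touch most of the model, here is a minimal sketch of parameter-efficient fine-tuning (LoRA) using Hugging Face’s peft library; the model name and hyperparameters are illustrative placeholders, not a recommendation.

```python
# A minimal sketch of parameter-efficient fine-tuning (LoRA) with Hugging Face's peft library.
# Only small low-rank adapter matrices are trained; the base model's weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # placeholder; in practice this would be a much larger model

model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA injects trainable rank-r matrices into selected projection layers.
lora_config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor applied to the adapter updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection; differs per architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)

# Typically well under 1% of the parameters end up trainable; the rest of the model is untouched.
model.print_trainable_parameters()
```

The adapters are tiny, which is what makes running a slightly different model per client affordable, but the frozen base is also why I doubt this form of fine-tuning amounts to real continual learning.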