Random thought on opioid addiction, no offense meant to people actually dealing with addiction, but I wonder if this might be useful: I read that opioid withdrawal makes people feel pain because the brain gets accustomed to extreme levels of pain suppression, and without opioids their pain tolerance is so low that everything itches and hurts. This makes me wonder if this effect is kind of similar to autistic sensory sensitivities, just turned up to 9000. Could it be that withdrawal doesn't create pain, but simply amplifies and turns attention to small pains and discomforts that are already there, but normal people just don't notice or get used to ignoring? If so, opioid addiction may be like a canary in the coal mine, where people get used to being in pain and lack healthy tools to deal with it.

If opioid addiction is largely driven by painful withdrawal rather than just pleasure, could techniques to avoid pain be helpful in dealing with it? Autistic people often need various coping strategies, like ear plugs to avoid noise or special clothing to decrease the everyday friction that normies take for granted, and they can be more sensitive to internal bodily signals like pains that most people just don't think are a big deal. Could the same coping skills, plus additional treatment for mild chronic pain and the like, be used to help treat addiction?

If teaching physical and emotional pain avoidance/management skills to addicts while they are going through withdrawal is impractical, why not also teach them to non-addicts who might be at risk, or just to people in general, before they have a chance to become addicted? Less pain to begin with means fewer reasons to escape pain using drugs, and more chances to learn. Maybe everyone could benefit if we, as a society, took small pains, discomforts, and unhappiness more seriously.

And I don't mean purely mental skills; we probably shouldn't treat addicts or people at risk of becoming addicts the same way we treat normies. When people are really sensitized or in crisis, mental tolerance, mindfulness, and reframing probably aren't very helpful. We also need more physical ways to remove causes of pain: widely available comfortable, itch-free clothing, ergonomic beds and chairs, quality air and quiet areas, treatment and prevention of minor chronic issues like inflammation and joint damage with age, etc. Instead of telling people to tough it out, treat minor pain and unhappiness as early warnings, and normalize healthy comfort-seeking before people are in crisis. Also normalize and invest in treatment and prevention of low-grade health issues that people don't typically go to the doctor for. These may seem like luxuries, but they are cheaper than long-term addiction and prison.
I have known several opioid-addicted people closely, and was myself addicted to nicotine. Physical withdrawal symptoms are only a small part of the problem in both cases. Although I tend to agree with you on this part:
withdrawal doesn’t create pain, but simply amplifies and turns attention to small pains and discomforts that are already there, but normal people just don’t notice or get used to ignoring
You really can toughen up and endure the days to weeks of physical withdrawal, but then you have to deal with the months to years of psychological addiction.
Opioid addiction is like a short circuit in motivation: normally, when some problem bothers you, you are motivated to solve it. Opioids give an illusion of all problems disappearing, and teach people this flawed behavioral pattern: instead of solving the actual problem, just take a dose. And this becomes a vicious cycle: the addicted person spends all their money on drugs, which produces more problems and more urge to solve them by taking more drugs. The planning horizon shrinks to hours. Some prefer to steal money to get a dose even knowing that they will be caught the same day.
Thanks for the input! If addiction is more about psychological pain ("problems that bother you") than direct physical pain, could the same approach work with mental pleasures/distractions from pain instead, like games, toys, or organized social activities? Edit: And coping methods to avoid or decrease mental and social discomfort, which can include but aren't limited to therapy or communication; they could be things like a new job, new friends, or prioritizing things in life differently. I read that some people trying to fight addiction get overwhelmed by having to get everything together at once, or by being expected to just quit and function like normal immediately. If they were supported to have fun/play and feel better first in healthier ways, could that be more helpful?
Of course. And this is what many good rehabilitation programs do.
But mere distraction is, again, only a temporary solution. Patients need to relearn healthy behavioral patterns, otherwise they may eventually fall back.
Games are good in the sense that they provide a quick feedback loop. You had a problem and quickly solved it without a drug.
When you and another person have different concepts of what’s good.
When both of you have the same concepts of what’s good but different models of how to get there.
This happens a lot when people are perfectionist and have aesthetic preferences for work being done in a certain way.
This happens in companies a lot. AI will work in those contexts and will be deceptive if it wants to do useful work. Actually, maybe not: the dynamics will be different, e.g. the AI being neutral in some way, like anybody being able to turn on honesty mode and ask it anything.
Anyway, I think that because of the way companies are structured and how humans work, being slightly deceptive allows you to do useful work (I think it's pretty intuitive for anyone who has worked in a corporation or watched The Office).
I don't get the downvotes. I do think it's extremely simple: look at politics in general, or even workplace politics; just try to Google it, there are even Wikipedia pages roughly about what I want to talk about. I have experienced many times the situation where I need to do my job and my boss makes it harder for me in some way; being not completely honest is an obvious strategy, and it's good for the company you are working at.
I think the downvotes are because the correct statement is something more like "In some situations, you can do more useful work by being deceptive." I think this is actually what you are arguing for, but it's very different from "To do useful work you need to be deceptive."
If "To do useful work you need to be deceptive" were true, it would mean that one can't do useful work without being deceptive. This is clearly wrong.
LW discussion norms are that you're supposed to say what you mean, and not leave people to guess, because this leads to more precise communication. E.g. I guessed that you did not mean what you literally wrote, because that would be dumb, but I don't know exactly what statement you're arguing for.
I know this is not standard communication practice in most places, but it is actually very valuable, you should try it.
I don’t necessarily believe or disbelieve in the final 1% taking the longest in this case – there are too many variables to make a confident prediction. However, it does tend to be a common occurrence.
It could very well be that the 1% before the final 1% takes the longest. Based on the past few years, progress in the AI space has been made fairly steadily, so it could also be that it continues at just this pace until that last 1% is hit, and then exponential takeoff occurs.
You could also have a takeoff event that carries from now till 99%, which is then followed by the final 1% taking a long period.
A typical exponential takeoff is, of course, very possible as well.
While the alignment community is frantically trying to convince themselves of the possibility of benevolent artificial superintelligence, human cognition research remains undeservedly neglected. Modern AI models are predominantly based on neural networks, the so-called connectionist approach in cognitive architecture studies. But in the beginning, the symbolic approach was more popular because of its lesser computational demands. Logic programming was the means to imbue the system with the programmer's intelligence. Although symbolist AI researchers studied the workings of the human brain, their research was driven by attempts to reproduce the brain, to create an artificial personality, rather than to help programmers express their thoughts. The user's ergonomics were largely ignored. Logic programming languages aimed to be the closest representation of the programmer's thoughts, but they failed at being practically convenient. As a result, nobody uses vanilla logic programming for practical purposes. In contrast, my research is driven by ergonomics and attempts to synchronize with the user's thinking. For example, while proving a theorem (creating an algorithm), instead of manually composing plain texts in a sophisticated language, the user sees the current context and chooses the next step from the available options.
Sometimes people deliberately fill their environment with yes-men and drive out critics. Pointing out what they’re doing doesn’t help, because they’re doing it on purpose. However there are ways well intentioned people end up driving out critics unintentionally, and those are worth talking about.
The Rise and Fall of Mars Hill Church (podcast) is about a guy who definitely drove out critics deliberately. Mark Driscoll fired people, led his church to shun them, and rearranged the legal structure of the church to consolidate power. It worked, and his power was unchecked until the entire church collapsed. Yawn.
What’s interesting is who he hired after the purges. As described in a later episode, his later hiring was focused on people who were executives in the secular world. These people were great at executing on tasks, but unopinionated about what their task should be. Whatever Driscoll said was what they did.
This is something a good, feedback-craving leader could have done by accident. Hiring people who are good at the tasks you want them to do is a pretty natural move. But I think the speaker is correct (alas, I didn't write down his name) that this is anti-correlated at the tails: the best executors become so by not caring about what they're executing.
So if you’re a leader and want to receive a healthy amount of pushback, it’s not enough to hire hypercompetent people and listen when they push back. You have to select specifically for ability to push back (including both willingness, and having good opinions).
My biggest outstanding question is “why did church network leaders give resources to a dude who had never/barely been to church to start his own?” There were probably subtler warning signs but surely they shouldn’t have been necessary once you encountered that fact and the fact that he was proud of it. If anyone has insight or sources on this I’d love to chat.
Mark Driscoll was raised Catholic, converted to evangelical Christianity at 19, got an MA in theology, connected with others who were associated with “church planting” efforts, and launched the first Mars Hill church in 1996 when he was 26. The church was initially in his home.
So it seems he may have received coaching, training, and some limited support at that early stage, but probably not enormous financial resources.
It looks like you’re right that he didn’t receive much funding via networks or sending churches. The podcast describes initial support coming from “friends and family”, in ways that sound more like a friends and family round of start-up funding than normal tithes.
I’m still under the impression that he received initial endorsements, blessings, and mentorship from people who should have known better.
(In case this isn’t a joke, Mars Hill church was named after Mars Hill / the Areopagus / Hill of Ares, which in the New Testament is where the apostle Paul gives a speech to a bunch of pagans about Jesus. That hill is named after the Greek god. The church was located on Earth, in particular in Seattle.)
A former plumber posted this during a debate with a quantum field theorist:
Buzzword-dropping protects itself by making quoting pointless. Take this, for example:
“A high-Φ refuse reservoir induces transconduit flux toward a lower Ψ basin via gravimetric asymmetry. Sustained ΔP < 0 is preserved through a sealed continuum, allowing energy-favorable migration under minimal enthalpic dissipation and inertial drag.”
It continues:
“Try quoting that. You can’t, at least not in any meaningful way. I never defined Φ, Ψ, or anything for that matter. It’s a wet mess that sounds absolutely profound, but it’s just shit going downhill, and its Literally draining.”
Last week I got nerdsniped with the question of why established evangelical leaders had a habit of taking charismatic narcissists and giving them support to found their own churches[1]. I expected this to be a whole saga that would teach lessons on how selecting for one set of good things secretly traded off against others. Then I found this checklist on churchplanting.com. It’s basically “tell me you’re a charismatic narcissist who will prioritize growth above virtue without telling me you’re a…“. And not charismatic in the sense of asking reasonable object-level questions that are assessed by a 3rd party and thus vulnerable to halo effects[2].
The first and presumably most important item on the checklist is "Visioning capacity", which includes both the ability to dream that you are very important and the ability to convince others to follow that dream. Commitment to growth has its own section (7), but it's also embedded in section 4 (skill at attracting converts). Section 12 is Resilience, but the only specific setback mentioned is ups and downs in attendance. "Can you create a grand Faith" is the very last item on the 13-point list. "Displaying Godly love and compassion to people" is a subheading under "6. Effectively builds relationships".
There are other checklists that at least ask about character, so this isn’t all church planting. But it looks like the answer to “why do some evangelicals support charismatic narcissists that prioritize growth above all else...” is “because that’s what they want, presumably for the same reason lots of people value charm and growth.”
Interesting find. What about the visioning section conveyed “the dream that you are very important?” Or, alternatively, what do you mean by “dream” in this context?
In practice, newly planted churches[1] are cults of personality (neutral valence) around the planting team, or sometimes just the lead pastor[2]. “developing a theme which highlights the vision and philosophy of ministry” and “establishing a clear church identity related to the theme and vision” is inevitably[3] about selling yourself as a brand.
It's possible to be a non-narcissist and pass this checklist, including the vision part. But it's a lot easier if you have a high opinion of yourself, few doubts, don't care about harming others, and love being the center of attention.
If you don’t believe in your work, consider looking for other options
I spent 15 months working for ARC Theory. I recently wrote up why I don’t believe in their research. If one reads my posts, I think it should become very clear to the reader that either ARC’s research direction is fundamentally unsound, or I’m still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it’s pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining readers asking the very fair question: “If you think the agenda is so doomed, why did you keep working on it?”[1]
In my first post, I write: "Unfortunately, by the time I left ARC, I became very skeptical of the viability of their agenda." This is not quite true. I was very skeptical from the beginning, for largely the same reasons I expressed in my posts. But at first I told myself that I should stay a little longer: either they manage to convince me that the agenda is sound, or I demonstrate that it doesn't work, in which case I free up the labor of the group of smart people working on the agenda. I think this was initially a somewhat reasonable position, though it was already in large part motivated reasoning.
But half a year after joining, I don't think this theory of change was very tenable anymore. It was becoming clear that our arguments were going in circles. I couldn't convince Paul and Mark (the two people thinking the most about the big-picture questions), nor could they convince me. Eight months in, two friends visited me in California, and they noticed that I always derailed the conversation when they asked me about my research. That should have been an important thing to notice: I was ashamed to talk about my research with my friends, because I was afraid they would see how crazy it was. I should have quit then, but I stayed for another seven months.
I think this was largely due to cowardice. I’m very bad at coding and all my previous attempts at upskilling in coding went badly.[2] I thought of my main skill as being a mathematician, and I wanted to keep working on AI safety. The few other places one can work as a mathematician in AI safety looked even less promising to me than ARC. I was afraid that if I quit, I wouldn’t find anything else to do.
In retrospect, this fear was unfounded. I realized there were other skills one can develop, not just coding. In my afternoons, I started reading a lot more papers and serious blog posts [3] from various branches of AI safety. After a few months, I felt I had much more context on many topics. I started to think more about what I can do with my non-mathematical skills. When I finally started applying for jobs, I got an offer from the European AI Office and UKAISI, and it looked more likely than not that I would get an offer from Redwood. [4]
Other options I considered that looked less promising than the three above, but still better than staying at ARC:

- Team up with some Hungarian coder friends and execute some simple but interesting experiments I had vague plans for. [5]
- Assemble a good curriculum for the prosaic AI safety agendas that I like.
- Apply for a grant-maker job.
- Become a Joe Carlsmith-style general investigator.
- Try to become a journalist or an influential blogger.
- Work on crazy acausal trade stuff.
I still think many of these were good opportunities, and probably there are many others. Of course, different options are good for people with different skill profiles, but I really believe that the world is ripe with opportunities to be useful for people who are generally smart and reasonable and have enough context on AI safety. If you are working on AI safety but don’t really believe that your day-to-day job is going anywhere, remember that having context and being ingrained in the AI safety field is a great asset in itself,[6] and consider looking for other projects to work on.
(Important note: ARC was a very good workplace, my coworkers were very nice to me and receptive to my doubts, and I really enjoyed working there except for feeling guilty that my work is not useful. I’m also not accusing the people who continue working at ARC of being cowards in the way I have been. They just have a different assessment of ARC’s chances, or work on lower-level questions than I have, where it can be reasonable to just defer to others on the higher-level questions.)
(As an employee of the European AI Office, it’s important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of the European Commission or other EU institutions.)
No, really, it felt very bad writing the posts. It felt like describing how I had worked for a year on a scheme that was either trying to build perpetual motion machines, or trying to build normal cars while having missed the fact that gasoline exists. Embarrassing either way.
How exactly are you measuring coding ability? What are the ways you’ve tried to upskill, and what are common failure modes? Can you describe your workflow at a high-level, or share a recording? Are you referring to competence at real world engineering tasks, or performance on screening tests?
If one reads my posts, I think it should become very clear to the reader that either ARC’s research direction is fundamentally unsound, or I’m still misunderstanding some of the very basics after more than a year of trying to grasp it.
I disagree. Instead, I think that either ARC’s research direction is fundamentally unsound, or you’re still misunderstanding some of the finer details after more than a year of trying to grasp it. Like, your post is a few layers deep in the argument tree, and the discussions we had about these details (e.g. in January) went even deeper. I don’t really have a position on whether your objections ultimately point at an insurmountable obstacle for ARC’s agenda, but if they do, I think one needs to really dig into the details in order to see that.
That's not how I see it. I think the argument tree doesn't go very deep before I lose the thread. Here are a few, slightly stylized but real, conversations I had with friends who had no context on what ARC was doing, when I tried to explain our research to them:
Me: We want to do Low Probability Estimation.
Them: Does this mean you want to estimate the probability that ChatGPT says a specific word after a 100 words on chain of thought? Isn’t this clearly impossible?
Me: No, you see, we only want to estimate the probabilities as well as the model knows them.
Them: What does this mean?
Me: [I can’t answer this question.]
Me: We want to do Mechanistic Anomaly Detection.
Them: Isn’t this clearly impossible? Won’t this result in a lot of false positives when anything out of distribution happens?
Me: Yes, that's why we have this new clever idea of relying on the fragility of sensor tampering: if you delete a subset of the actions, you will get an inconsistent image.
Them: What if the AI builds another robot to tamper with the cameras?
Me: We actually don't want to delete actions, but rather heuristic arguments for why the cameras will show something, and we want to construct heuristic explanations in a way that carries over through delegated actions.
Them: What does this mean?
Me: [I can't answer this question.]
Me: We want to create Heuristic Arguments to explain everything the model does.
Them: What does it mean that an argument explained a behavior? What is even the type signature of heuristic arguments? And you want to explain everything a model does? Isn’t this clearly impossible?
Me: [I can’t answer this question.]
When I was explaining our research to outsiders (which I usually tried to avoid out of cowardice), we usually got to some of these points within minutes. So I wouldn’t say these are fine details of our agenda.
During my time at ARC, the majority of my time was spent asking Mark and Paul variations of these three questions. They always kindly answered, and the answer was convincing-sounding enough in the moment that I usually couldn't really reply on the spot, and then I went back to my room to think through their answers. But I never actually understood their answers, and I can't reproduce them now. Really, I think that was the majority of the work I did at ARC. When I left, you guys should have bought a rock with "Isn't this clearly impossible?" written on it, and that would profitably replace my presence.
That's why I'm saying that either ARC's agenda is fundamentally unsound or I'm still missing some of the basics. The only thing standing between ARC's agenda and collapse under five minutes of questioning from an outsider is that Paul and Mark (and maybe others on the team) have some convincing-sounding answers to the three questions above. So I would say that these answers are really part of the basics, and I never understood them.
Maybe Mark will show up in the comments now to give answers to the three questions, and I expect the answers to sound kind of convincing, and I won’t have a very convincing counter-argument other than some rambling reply saying essentially that “I think this argument is missing the point and doesn’t actually answer the question, but I can’t really point out why, because I don’t actually understand the argument because I don’t understand how you imagine heuristic arguments”. (This is what happened in the comments on my other post, and thanks to Mark for the reply and I’m sorry for still not understanding it.) I can’t distinguish whether I’m just bad at understanding some sound arguments here, or the arguments are elaborate self-delusions of people who are smarter and better at arguments than me. In any case, I feel epistemic learned helplessness on some of these most basic questions in ARC’s agenda.
What is your opinion on the Low Probability Estimation paper published this year at ICLR?
I don't have a background in the field, but it seems like they were able to get some results indicating that the approach works to some extent. https://arxiv.org/pdf/2410.13211
It’s a nice paper, and I’m glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas under-performs the baseline. Of course, these are just the first experiments, work is ongoing, this is not conclusive negative evidence for anything. But the paper certainly shouldn’t be counted as positive evidence for ARC’s ideas.
I was very skeptical from the beginning, for largely similar reasons I expressed in my posts. But first I told myself that I should stay a little longer.
IME, in the majority of cases, when I strongly felt like quitting but was also inclined to justify “staying just a little bit longer because XYZ”, and listened to my justifications, staying turned out to be the wrong decision.
Little is known about whether people make good choices when facing important decisions. This paper reports on a large-scale randomized field experiment in which research subjects having difficulty making a decision flipped a coin to help determine their choice. For important decisions (e.g. quitting a job or ending a relationship), those who make a change (regardless of the outcome of the coin toss) report being substantially happier two months and six months later. This correlation, however, need not reflect a causal impact. To assess causality, I use the outcome of a coin toss. Individuals who are told by the coin toss to make a change are much more likely to make a change and are happier six months later than those who were told by the coin to maintain the status quo. The results of this paper suggest that people may be excessively cautious when facing life-changing choices.
Pretty much the whole causal estimate comes down to the influence of happiness 6 months after quitting a job or breaking up. Almost everything else is swamped with noise. The only individual question with a consistent causal effect larger than the standard error was “should I break my bad habit?”, and doing so made people unhappier. Even for those factors, there’s a lot of biases in this self-report data, which the authors noted and tried to address. I’m just not sure what we can really learn from this, even though it is a fun study.
Fun fact: AI-2027 estimates that getting to ASI might take the equivalent of a 100-person team of top human AI research talent working for tens of thousands of years.
I’m curious why ASI would take so much work. What exactly is the R&D labor supposed to be doing each day, that adds up to so much effort? I’m curious how people are thinking about that, if they buy into this kind of picture. Thanks :)
(Calculation details: For example, in October 2027 of the AI-2027 modal scenario, they have “330K superhuman AI researcher copies thinking at 57x human speed”, which is 1.6 million person-years of research in that month alone. And that’s mostly going towards inventing ASI, I think. Did I get that right?)
(My own opinion, stated without justification, is that LLMs are not a paradigm that can scale to ASI, but after some future AI paradigm shift, there will be very very little R&D separating “this type of AI can do anything importantly useful at all” and “full-blown superintelligence”. Like maybe dozens or hundreds of person-years, or whatever, as opposed to millions. More on this in a (hopefully) forthcoming post.)
Whew, a critique that our takeoff should be faster for a change, as opposed to slower.
Fun fact: AI-2027 estimates that getting to ASI might take the equivalent of a 100-person team of top human AI research talent working for tens of thousands of years.
(Calculation details: For example, in October 2027 of the AI-2027 modal scenario, they have “330K superhuman AI researcher copies thinking at 57x human speed”, which is 1.6 million person-years of research in that month alone. And that’s mostly going towards inventing ASI, I think. Did I get that right?)
This depends on how large you think the penalty is for parallelized labor as opposed to serial. If 330k parallel researchers is more like equivalent to 100 researchers at 50x speed than 100 researchers at 3,300x speed, then it’s more like a team of 100 researchers working for (50*57)/12=~250 years.
Also of course to the extent you think compute will be an important input, during October they still just have a month’s worth of total compute even though they’re working for 250-25,000 subjective years.
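As a sanity check, here is the back-of-envelope arithmetic behind both framings in one place (the 330K copies and 57x speed are the AI-2027 scenario numbers quoted above; the "100 researchers at 50x" equivalence is the illustrative parallelization penalty from this comment, not a separate estimate):

```python
# Rough sanity check of the figures discussed above (illustrative only).

copies = 330_000          # parallel superhuman AI researcher copies (AI-2027, Oct 2027)
speedup = 57              # thinking speed relative to a human
months = 1                # the single month of October 2027

# No parallelization penalty: every copy counts as a full serial researcher.
person_years_naive = copies * speedup * months / 12
print(f"naive person-years in one month: {person_years_naive:,.0f}")   # ~1.6 million

# Strong parallelization penalty: 330K copies are only worth ~100
# researchers running at an effective 50x serial speed, each of whom
# also thinks at 57x.
effective_team = 100
effective_serial_speedup = 50
team_years = effective_serial_speedup * speedup * months / 12
print(f"equivalent: a team of {effective_team} working ~{team_years:,.0f} years")  # ~250
```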
I’m curious why ASI would take so much work. What exactly is the R&D labor supposed to be doing each day, that adds up to so much effort? I’m curious how people are thinking about that, if they buy into this kind of picture. Thanks :)
I'm imagining that there's a mix of investing tons of effort into optimizing experiment ideas, implementing and interpreting every experiment quickly, as well as tons of effort into more conceptual agendas given the compute shortage, some of which bear fruit but also involve lots of "wasted" effort exploring possible routes, and most of which end up needing significant experimentation as well to get working.
(My own opinion, stated without justification, is that LLMs are not a paradigm that can scale to ASI, but after some future AI paradigm shift, there will be very very little R&D separating “this type of AI can do anything importantly useful at all” and “full-blown superintelligence”. Like maybe dozens or hundreds of person-years, or whatever, as opposed to millions. More on this in a (hopefully) forthcoming post.)
I don’t share this intuition regarding the gap between the first importantly useful AI and ASI. If so, that implies extremely fast takeoff, correct? Like on the order of days from AI that can do important things to full-blown superintelligence?
Currently there are hundreds or perhaps low thousands of years of relevant research effort going into frontier AI each year. The gap between importantly useful AI and ASI seems larger than a year of current AI progress (though I’m not >90% confident in that, especially if timelines are <2 years). Then we also need to take into account diminishing returns, compute bottlenecks, and parallelization penalties, so my guess is that the required person-years should be at minimum in the thousands and likely much more. Overall the scenario you’re describing is maybe (roughly) my 95th percentile speed?
I’m curious about your definition for importantly useful AI actually. Under some interpretations I feel like current AI should cross that bar.
I’m uncertain about the LLMs thing but would lean toward pretty large shifts by the time of ASI; I think it’s more likely LLMs scale to superhuman coders than to ASI.
If we divide the inventing-ASI task into (A) “thinking about and writing algorithms” versus (B) “testing algorithms”, in the world of today there’s a clean division of labor where the humans do (A) and the computers do (B). But in your imagined October 2027 world, there’s fungibility between how much compute is being used on (A) versus (B). I guess I should interpret your “330K superhuman AI researcher copies thinking at 57x human speed” as what would happen if the compute hypothetically all went towards (A), none towards (B)? And really there’s gonna be some division of compute between (A) and (B), such that the amount of (A) is less than I claimed? …Or how are you thinking about that?
I’m curious about your definition for importantly useful AI actually. Under some interpretations I feel like current AI should cross that bar.
Right, but I’m positing a discontinuity between current AI and the next paradigm, and I was talking about the gap between when AI-of-that-next-paradigm is importantly useful versus when it’s ASI. For example, AI-of-that-next-paradigm might arguably already exist today but where it’s missing key pieces such that it barely works on toy models in obscure arxiv papers. Or here’s a more concrete example: Take the “RL agent” line of AI research (AlphaZero, MuZero, stuff like that), which is quite different from LLMs (e.g. “training environment” rather than “training data”, and there’s nothing quite like self-supervised pretraining (see here)). This line of research has led to great results on board games and videogames, but it’s more-or-less economically useless, and certainly useless for alignment research, societal resilience, capabilities research, etc. If it turns out that this line of research is actually much closer to how future ASI will work at a nuts-and-bolts level than LLMs are (for the sake of argument), then we have not yet crossed the “AI-of-that-next-paradigm is importantly useful” threshold in my sense.
If it helps, here’s a draft paragraph from that (hopefully) forthcoming post:
Another possible counter-argument from a prosaic-AGI person would be: “Maybe this future paradigm exists, but LLM agents will find it, not humans, so this is really part of that ‘AIs-doing-AI-R&D’ story like I’ve been saying”. I have two responses. First, I disagree with that prediction. Granted, probably LLMs will be a helpful research tool involved in finding the new paradigm, but there have always been helpful research tools, from PyTorch to arXiv to IDEs, and I don’t expect LLMs to be fundamentally different from those other helpful research tools. Second, even if it’s true that LLMs will discover the new paradigm by themselves (or almost by themselves), I’m just not sure I even care. I see the pre-paradigm-shift AI world as a lesser problem, one that LLM-focused AI alignment researchers (i.e. the vast majority of them) are already focusing on. Good luck to them. And I want to talk about what happens in the strange new world that we enter after that paradigm shift.
Next:
If so, that implies extremely fast takeoff, correct? Like on the order of days from AI that can do important things to full-blown superintelligence?
Well, even if you have an ML training plan that will yield ASI, you still need to run it, which isn’t instantaneous. I dunno, it’s something I’m still puzzling over.
…But yeah, many of my views are pretty retro, like a time capsule from like AI alignment discourse of 2009. ¯\_(ツ)_/¯
I can somewhat see where you’re coming from about a new method being orders of magnitude more data efficient in RL, but I very strongly bet on transformers being core even after such a paradigm shift. I’m curious whether you think the transformer architecture and text input/output need to go, or whether the new training procedure / architecture fits in with transformers because transformers are just the best information mixing architecture.
My guess is that the main issue with current transformers turns out to be that they don't have a long-term state/memory, and I think this is a pretty critical part of how humans are able to learn on the job as effectively as they do.
The trouble, as I've heard it, is that the other approaches which incorporate a long-run state/memory are apparently much harder to train reasonably well than transformers; plus there are first-mover effects.
If we divide the inventing-ASI task into (A) “thinking about and writing algorithms” versus (B) “testing algorithms”, in the world of today there’s a clean division of labor where the humans do (A) and the computers do (B). But in your imagined October 2027 world, there’s fungibility between how much compute is being used on (A) versus (B). I guess I should interpret your “330K superhuman AI researcher copies thinking at 57x human speed” as what would happen if the compute hypothetically all went towards (A), none towards (B)? And really there’s gonna be some division of compute between (A) and (B), such that the amount of (A) is less than I claimed? …Or how are you thinking about that?
I’m not 100% sure what you mean, but my guess is that you mean (B) to represent the compute used for experiments? We do project a split here and the copies/speed numbers are just for (A). You can see our projections for the split in our compute forecast (we are not confident that they are roughly right).
Re: the rest of your comment, makes sense. Perhaps the place I most disagree is that if LLMs will be the thing discovering the new paradigm, they will probably also be useful for things like automating alignment research, epistemics, etc. Also if they are misaligned they could sabotage the research involved in the paradigm shift.
That does raise my eyebrows a bit, but also, note that we currently have hundreds of top-level researchers at AGI labs tirelessly working day in and day out, and that all that activity results in a… fairly leisurely pace of progress, actually.[1]
Recall that what they’re doing there is blind atheoretical empirical tinkering (tons of parallel experiments most of which are dead ends/eke out scant few bits of useful information). If you take that research paradigm and ramp it up to superhuman levels (without changing the fundamental nature of the work), maybe it really would take this many researcher-years.
And if AI R&D automation is actually achieved on the back of sleepwalking LLMs, that scenario does seem plausible. These superhuman AI researchers wouldn’t actually be generally superhuman researchers, just superhuman at all the tasks in the blind-empirical-tinkering research paradigm. Which has steeply declining returns to more intelligence added.
That said, yeah, if LLMs actually scale to a “lucid” AGI, capable of pivoting to paradigms with better capability returns on intelligent work invested, I expect it to take dramatically less time.
It’s fast if you use past AI progress as the reference class, but is decidedly not fast if you try to estimate “absolute” progress. Like, this isn’t happening, we’ve jumped to near human-baseline and slowed to a crawl at this level. If we assume the human level is the ground and we’re trying to reach the Sun, it in fact might take millennia at this pace.
we’ve jumped to near human-baseline and slowed to a crawl at this level
A possible reason for that might be the fallibility of our benchmarks. It might be the case that for complex tasks, it’s hard for humans to see farther than their nose.
Incidentally, is there any meaningful sense in which we can say how many “person-years of thought” LLMs have already done?
We know they can do things in seconds that would take a human minutes. Does that mean those real-time seconds count as “human-minutes” of thought? Etc.
The short version: getting compute-optimal experiments to self-improve yourself; training to do tasks that unavoidably take a really long time to learn or get data on, because real-world experimentation is necessary; combined with a potential hardware bottleneck on robotics that also requires real-life experimentation to overcome.
Another point is that to the extent you buy the scaling hypothesis at all, compute bottlenecks will start to bite, and given that researchers will seek small constant improvements that don't generalize, this can start a cascade of wrong decisions that could take a very long time to get out of.
(My own opinion, stated without justification, is that LLMs are not a paradigm that can scale to ASI, but after some future AI paradigm shift, there will be very very little R&D separating “this type of AI can do anything importantly useful at all” and “full-blown superintelligence”. Like maybe dozens or hundreds of person-years, or whatever, as opposed to millions. More on this in a (hopefully) forthcoming post.)
I’d like to see that post, and I’d like to see your arguments on why it’s so easy for intelligence to be increased so fast, conditional on a new paradigm shift.
(For what it's worth, I personally think LLMs might not be the last paradigm, because of their current lack of continuous learning/neuroplasticity plus no long-term memory/state. But I don't expect future paradigms to have an AlphaZero-like trajectory, where things go from zero to wildly superhuman in days or weeks, though I do think takeoff is faster if we condition on a new paradigm being required for ASI. So I see the AGI transition as plausibly leaving only months until we get superintelligence, and maybe only 1-2 years before superintelligence starts having very, very large physical impacts through robotics, assuming that new paradigms are developed. That puts me closer to hundreds or thousands of person-years than dozens of person-years.)
The world is complicated (see: I, Pencil). You can be superhuman by only being excellent at a few fields, for example politics, persuasion, military, hacking. That still leaves you potentially vulnerable, even if your opponents are unlikely to succeed; or you could hurt yourself by your ignorance in some field. Or you can be superhuman in the sense of being able to make the pencil from scratch, only better at each step. That would probably take more time.
Are you suggesting that e.g. “R&D Person-Years 463205–463283 go towards ensuring that the AI has mastery of metallurgy, and R&D Person-Years 463283–463307 go towards ensuring that the AI has mastery of injection-molding machinery, and …”?
If no, then I don’t understand what “the world is complicated” has to do with “it takes a million person-years of R&D to build ASI”. Can you explain?
…Or if yes, that kind of picture seems to contradict the facts that:
This seems quite disanalogous to how LLMs are designed today (i.e., LLMs can already answer any textbook question about injection-molding machinery, but no human doing LLM R&D has ever worked specifically on LLM knowledge of injection-molding machinery),
This seems quite disanalogous to how the human brain was designed (i.e., humans are human-level at injection-molding machinery knowledge and operation, but Evolution designed human brains for the African Savannah, which lacked any injection-molding machinery).
LLMs quickly acquired the capacity to read what humans wrote and paraphrase it. It is not obvious to me (though that may speak more about my ignorance) that it will be similarly easy to acquire deep understanding of everything.
I think the first question to think about is how to use them to make CDT decisions. You can create a market about a causal effect if you have control over the decision and you can randomise it to break any correlations with the rest of the world, assuming the fact that you’re going to randomise it doesn’t otherwise affect the outcome (or bettors don’t think it will).
Committing to doing that does render the market useless for choosing policy, but you could randomly decide whether to randomise or to make the decision via whatever the process you actually want to use, and have the market be conditional on the former. You probably don’t want to be randomising your policy decisions too often, but if liquidity wasn’t an issue you could set the probability of randomisation arbitrarily low.
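To make the randomisation trick concrete, here is a toy Monte Carlo sketch (the confounder, the effect sizes, and the 5% randomisation probability are all made-up illustrative assumptions): a market scored only on the rounds where the decision was randomised would converge to the causal effect, while one scored on ordinary policy-driven rounds would not.

```python
import random

random.seed(0)

def outcome(decision: int, confounder: int) -> float:
    """True causal effect of the decision is +1.0; the confounder adds +2.0
    and also drives the non-randomised policy, creating a spurious correlation."""
    return 1.0 * decision + 2.0 * confounder + random.gauss(0, 0.5)

naive_by_decision = {0: [], 1: []}
randomised_by_decision = {0: [], 1: []}

for _ in range(200_000):
    confounder = random.randint(0, 1)
    if random.random() < 0.05:                  # rare, pre-committed randomisation
        decision = random.randint(0, 1)
        randomised_by_decision[decision].append(outcome(decision, confounder))
    else:                                       # usual policy: follows the confounder
        decision = confounder
        naive_by_decision[decision].append(outcome(decision, confounder))

mean = lambda xs: sum(xs) / len(xs)
print("naive estimate of effect:  ",
      mean(naive_by_decision[1]) - mean(naive_by_decision[0]))            # ~3.0 (biased)
print("randomised-only estimate:  ",
      mean(randomised_by_decision[1]) - mean(randomised_by_decision[0]))  # ~1.0 (causal)
```

The point of conditioning the market on the randomisation event is exactly this filtering: bettors are only scored in the world where the decision was independent of everything else.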
Any update to the market is (equivalent to) updating on some kind of information. So all you can do is dynamically choose what to update on and what not to.* Unfortunately, whenever you choose not to update on something, you are giving up on the asymptotic learning guarantees of policy market setups. So the strategic gains from updatelessness (like not falling into traps) are in a fundamental sense irreconcilable with the learning gains from updatefulness. That doesn't prevent you from being pretty smart about deciding exactly what to update on... but due to embeddedness problems and the complexity of the world, it seems to be the norm (rather than the exception) that you cannot be sure a priori of what to update on (you just have to make some arbitrary choices).
*For avoidance of doubt, what matters for whether you have updated on X is not “whether you have heard about X”, but rather “whether you let X factor into your decisions”. Or at least, this is the case for a sophisticated enough external observer (assessing whether you’ve updated on X), not necessarily all observers.
I’ve been thinking through the following philosophical argument for the past several months.
1. Most things that currently exist have properties that allow them to continue to exist for a significant amount of time and propagate, since otherwise, they would cease existing very quickly.
2. This implies that most things capable of gaining adaptations, such as humans, animals, species, ideas, and communities, have adaptations for continuing to exist.
3. This also includes decision-making systems and moral philosophies.
4. Therefore, one could model the morality of such things as tending towards the ideal of perfectly maintaining their own existence and propagating as much as possible.
Many of the consequences of this approximation of the morality of things seem quite interesting. For instance, the higher-order considerations of following an “ideal” moral system (that is, utilitarianism using a measure of one’s own continued existence at a point in the future) lead to many of the same moral principles that humans actually have (e.g. cooperation, valuing truth) while also avoiding a lot of the traps of other systems (e.g. hedonism). This chain of thought has led me to believe that existence itself could be a principal component of real-life morality.
While it does have a lot of very interesting conclusions, I’m very concerned that if I were to write about it, I would receive 5 comments directing me to some passage by a respected figure that already discusses the argument, especially given the seemingly incredibly obvious structure it has. However, I’ve searched through LW and tried to research the literature as well as I can (through Google Scholar, Elicit, and Gemini, for instance), but I must not have the right keywords, since I’ve come up fairly empty, other than for philosophers with vaguely similar sounding arguments that don’t actually get at the heart of the matter (e.g. Peter Singer’s work comes up a few times, but he particularly focused on suffering rather than existence itself, and certainly didn’t use any evolutionary-style arguments to reach that conclusion).
If this really hasn’t been written about extensively anywhere, I would update towards believing the hypothesis that there’s actually some fairly obvious flaw that renders it unsound, stopping it from getting past, say, the LW moderation process or the peer review process. As such, I suspect that there is some issue with it, but I’ve not really been able to pinpoint what exactly stops someone from using existence as the fundamental basis of moral reasoning.
Would anyone happen to know of links that do directly explore this topic? (Or, alternatively, does anyone have critiques of this view that would spare me the time of writing more about this if this isn’t true?)
I don't know of any source. My first thought is: if you define morality as "the thought system that propagates itself as much as possible", what makes it different from other thought systems that propagate themselves as much as possible? If memetic survival is the whole story, why do different things exist, as opposed to everything converging on the most virulent form?
As for your first question, there are certainly other thought systems (or I suppose decision theories) that allow a thing to propagate itself, but I highlight a hypothetical decision theory that would be ideal in this respect. Of course, given that things are different from each other (as you mention), this ideal decision theory would necessarily be different for each of them.
Additionally, as the ideal decision theory for self-propagation is computationally intractable to follow, “the most virulent form” isn’t[1] actually useful for anything that currently exists. Instead, we see more computationally tractable propagation-based decision theories based on messy heuristics that happened to correlate with existence in the environment where such heuristics were able to develop.
For your final question, I don’t think that this theory explains initial conditions like having several things in the universe. Other processes analogous to random mutation, allopatric speciation, and spontaneous creation (that is, to not only species, but ideas, communities, etc.) would be better suited for answering such questions. “Propagative decision theory” does have some implications for the decision theories of things that can actually follow a decision theory, as well as giving a very solid indicator on otherwise unsolvable/controversial moral quandaries (e.g. insect suffering), but it otherwise only really helps as much as evolutionary psychology when it comes to explaining properties that already exist.
Other than in the case that some highly intelligent being manages to apply this theory well enough to do things like instrumental convergence that the ideal theory would prioritize, in which case this paragraph suddenly stops applying.
Harder questions (e.g. around 50% average instead of around 90% average) seem better for differentiating students' understanding, for at least two reasons:

- The graph of the percent of students who got a question correct as a function of the difficulty of the question tends to follow a sigmoid-ish curve where the fastest increase is around the middle (see the sketch below).
- Some of a student's incorrect answers on the test are going to come from sources that (a) the student can prepare to mitigate and (b) aren't caused by a lack of whatever students should be being tested for (e.g. questions with ambiguous meanings, questions that require an understanding of a niche framework that is never used outside of the curriculum, questions that have shortcuts that the teacher didn't recognize, etc.). Ideally, we don't want any differences in student test results to be based on these things, but harder tests at least mitigate the issue, since understanding (or whatever stuff we do want students to spend time on) becomes a more important cause of incorrect answers.

(Neither of these are hard-and-fast rules, but the general pattern seems to hold based on my experience as a student.)
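A small numerical illustration of the first point, using a standard logistic (item-response-theory-style) curve; the slope and the ability scale are arbitrary choices for the sketch:

```python
import math

def p_correct(ability: float, difficulty: float, slope: float = 1.7) -> float:
    """Logistic (IRT-style) probability that a student of given ability
    answers an item of given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-slope * (ability - difficulty)))

def discrimination(difficulty: float, a_lo: float = -0.25, a_hi: float = 0.25) -> float:
    """Gap in pass probability between two nearby ability levels:
    this gap is what lets the question tell those students apart."""
    return p_correct(a_hi, difficulty) - p_correct(a_lo, difficulty)

for difficulty in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    avg = p_correct(0.0, difficulty)           # "class average" at ability 0
    print(f"difficulty {difficulty:+.1f}: class average {avg:.0%}, "
          f"discrimination {discrimination(difficulty):.3f}")
```

The discrimination number peaks where the class average is around 50% and shrinks toward the ~90%-average end, since the curve is steepest at its midpoint.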
https://www.understandingai.org/p/i-got-fooled-by-ai-for-science-hype

Here a plasma physicist gets disappointing results trying to implement "physics-inspired neural networks" for solving PDEs. The example in the paper works, but a different equation doesn't converge at all to the known closed-form solution, and he can't find any settings that get it to work.
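For reference, a minimal sketch of what a physics-informed neural network (PINN) setup looks like, which seems to be what the article means by "physics-inspired neural networks". This is a toy ODE du/dx = -u with u(0) = 1; the network size, optimizer, and collocation points are arbitrary choices for the sketch, not the article's setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy problem: du/dx = -u on [0, 2], u(0) = 1, exact solution u(x) = exp(-x).
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Collocation points where the PDE/ODE residual is enforced.
x_colloc = torch.linspace(0.0, 2.0, 100).unsqueeze(1).requires_grad_(True)
x0 = torch.zeros(1, 1)

for step in range(5000):
    opt.zero_grad()
    u = net(x_colloc)
    # Derivative of the network output w.r.t. its input, via autograd.
    du_dx = torch.autograd.grad(u, x_colloc,
                                grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]
    physics_loss = ((du_dx + u) ** 2).mean()      # residual of du/dx = -u
    bc_loss = ((net(x0) - 1.0) ** 2).mean()       # boundary condition u(0) = 1
    loss = physics_loss + bc_loss
    loss.backward()
    opt.step()

with torch.no_grad():
    x_test = torch.tensor([[1.0]])
    print(net(x_test).item(), torch.exp(torch.tensor(-1.0)).item())  # should be close
```

On this toy equation the residual loss converges easily; the article's point is that for harder equations the same recipe can simply fail to converge, with no obvious knob to fix it.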
Beware mistaking a “because” for an “and”. Sometimes you think something is X and Y, but it turns out to be X because Y.
For instance, I was recently at a metal concert, and helped someone off the ground in a mosh pit. Someone thanked me afterwards but to me it seemed like the most obvious thing in the world.
A mosh pit is not fun AND a place where everyone helps each other. It is fun BECAUSE everyone helps each other. Play-acting aggression while being supportive is where the fun is born.
If you want to be twice as profitable as your competitors, you don’t have to be twice as good as them. You just have to be slightly better.
I think AI development is mainly compute constrained (relevant for intelligence explosion dynamics).
There are some arguments against, based on the high spending of firms on researcher and engineer talent. The claim is that this supports one or both of a) large marginal returns to having more (good) researchers or b) steep power laws in researcher talent (implying large production multipliers from the best researchers).
Given that the workforces at labs remain fairly small, I think the spending naively supports (b) better.
But in fact I think there is another, even better explanation:
- Researchers' taste (an AI production multiplier) varies more smoothly
  - (research culture/collective intelligence of a team or firm may be more important)
- Marginal parallel researchers have very diminishing AI production returns (sometimes negative, when the researchers have worse taste)
  - (also, determining a researcher's taste ex ante is hard)
- BUT firms' utility is sharply convex in AI production
  - capturing more accolades and market share is basically the entire game
  - spending as much time as possible with a non-commoditised offering allows profiting off fast-evaporating margin
  - so firms are competing over getting cool stuff out first
    - time-to-delivery of non-commoditised (!) frontier models
  - and over getting loyal/sticky customer bases
    - ease-of-adoption of product wrapping
    - sometimes differentiation of offerings
- this turns small differences in human capital/production multiplier/research taste into big differences in firm utility
- so demand for the small pool of researchers with (legibly) great taste is very hot
This also explains why it's been somewhat 'easy' (but capital-intensive) for a few new competitors to pop into existence each year, and why firms' revealed-preference savings rate into compute capital is enormous (much greater than 100%!).
We see token prices drop incredibly sharply, which supports the non-commoditised margin claim (though this is also consistent with a Wright’s Law effect from (runtime) algorithmic efficiency gains, which should definitely also be expected).
A lot of engineering effort is being put into product wrappers and polish, which supports the customer base claim.
The implications include: headroom above top human expert teams’ AI research taste could be on the small side (I think this is right for many R&D domains, because a major input is experimental throughput). So both quantity and quality of (perhaps automated) researchers should have steeply diminishing returns in AI production rate. But might they nevertheless unlock a practical monopoly (or at least an increasingly expensive barrier to entry) on AI-derived profit, by keeping the (more monetisable) frontier out of reach of competitors?
Who predicted that AI will have a multi-year “everything works” period where the prerequisite pieces come together and suddenly every technique works on every problem? Like before electricity you had to use the right drill bit or saw blade for a given material, but now you can cut anything with anything if you are only slightly patient.
Sometimes people talk about how AIs will be very superhuman at a bunch of (narrow) domains. A key question related to this is how much this generalizes. Here are two different possible extremes for how this could go:
It’s effectively like an attached narrow weak AI: The AI is superhuman at things like writing ultra fast CUDA kernels, but from the AI’s perspective, this is sort of like it has a weak AI tool attached to it (in a well integrated way) which is superhuman at this skill. The part which is writing these CUDA kernels (or otherwise doing the task) is effectively weak and can’t draw in a deep way on the AI’s overall skills or knowledge to generalize (likely it can shallowly draw on these in a way which is similar to the overall AI providing input to the weak tool AI). Further, you could actually break out these capabilities into a separate weak model that humans can use. Humans would use this somewhat less fluently as they can’t use it as quickly and smoothly due to being unable to instantaneously translate their thoughts and not being absurdly practiced at using the tool (like AIs would be), but the difference is ultimately mostly convenience and practice.
Integrated superhumanness: The AI is superhuman at things like writing ultra fast CUDA kernels via a mix of applying relatively general (and actually smart) abilities, having internalized a bunch of clever cognitive strategies which are applicable to CUDA kernels and sometimes to other domains, as well as domain specific knowledge and heuristics. (Similar to how humans learn.) The AI can access and flexibly apply all of the things it learned from being superhuman at CUDA kernels (or whatever skill) and with a tiny amount of training/practice it can basically transfer all these things to some other domain even if the domain is very different. The AI is at least as good at understanding and flexibly applying what it has learned as humans would be if they learned the (superhuman) skill to the same extent (and perhaps the AIs are actually much better at this than humans). You can’t separate these capabilities into a weak model, the weak model RL’d on this (and distilled into) would either be much worse at CUDA or would need to actually be generally quite capable (rather than weak).
My sense is that the current frontier LLMs are much closer to (1) than (2) for most of their skills, particularly the skills which they’ve been heavily trained on (e.g. next token prediction or competitive programming). As AIs in the current paradigm get more capable, they appear to shift some toward (2) and I expect that at the point when AIs are capable of automating virtually all cognitive work that humans can do, we’ll be much closer to (2). That said, it seems likely that powerful AIs built in the current paradigm[1] which otherwise match humans at downstream performance will somewhat lag behind humans in integrating/generalizing skills they learn (at least without spending a bunch of extra compute on skill integration) because this ability currently seems to be lagging behind other capabilities relative to humans and AIs can compensate for worse skill integration with other advantages (being extremely knowledgeable, fast speed, parallel training on vast amounts of relevant data including “train once, deploy many”, better memory, faster and better communication, etc).
I think different views about the extent to which future powerful AIs will deeply integrate their superhuman abilities versus these abilities being shallowly attached partially drive some disagreements about misalignment risk and what takeoff will look like.
I suppose that most tasks an LLM can accomplish could theoretically be performed more efficiently by a dedicated program optimized for that task (and even better by a dedicated physical circuit). Hypothesis (1) amounts to considering that such a program, a dedicated module within the model, is established during training. This module can be seen as a weak AI used as a tool by the stronger AI. A bit like how the human brain has specialized modules that we (the higher conscious module) use unconsciously (e.g., when we read, the decoding of letters is executed unconsciously by a specialized module).
We can envision that at a certain stage the model becomes so competent at programming that it will tend to write code on the fly, as a tool, to solve most tasks we submit to it. In fact, I notice that this is already increasingly the case when I ask a question to a recent model like Claude 3.7 Sonnet. It often generates code, a tool, to try to answer me rather than trying to answer the question ‘itself.’ It clearly realizes that dedicated code will be more effective than its own neural network. This is interesting because in this scenario the dedicated module is not generated during training but on the fly, during normal production operation. In this way, it would be sufficient for an AI to become a superhuman programmer in order to become superhuman in many domains thanks to the use of these tool-programs. The next stage would be the on-the-fly production of dedicated physical circuits (FPGA, ASIC, or alien technology), but that’s another story.
This refers to the philosophical debate about where intelligence resides: in the tool or in the one who created it? In the program or in the programmer? If a human programmer programs a superhuman AI, should we attribute this superhuman intelligence to the programmer? Same question if the programmer is itself an AI. It’s the kind of chicken-and-egg debate where the answer depends on how we divide the continuity of reality into discrete categories. You’re right that integration is an interesting criterion, as it is a kind of formal, non-arbitrary solution to this problem of defining discrete categories within the continuity of reality.
People also disagree greatly about how much humans tend towards integration rather than non-integration, and how much human skill comes from domain transfer. And I think some / a lot of the beliefs about artificial intelligence are downstream of these beliefs about the origins of biological intelligence and human expertise, e.g., in the Yudkowsky / Ngo dialogues. (Object level: Both the LW-central hypothesis and its alternatives seem insufficiently articulated; they operate as a background hypothesis too large to see rather than something explicitly noted, imo.)
People also disagree greatly about how much humans tend towards integration rather than non-integration, and how much human skill comes from domain transfer.
Makes me wonder whether most of what people believe to be “domain transfer” could simply be IQ.
I mean, suppose that you observe a person being great at X, then you make them study Y for a while, and it turns out that they are better at Y than an average person who spent the same time studying Y.
One observer says: “Clearly some of the skills at X have transferred to the skills of Y.”
Another observer says: “You just indirectly chose a smart person (by filtering for high skills at X), duh.”
This seems important to think about, I strong upvoted!
As AIs in the current paradigm get more capable, they appear to shift some toward (2) and I expect that at the point when AIs are capable of automating virtually all cognitive work that humans can do, we’ll be much closer to (2).
I’m not sure that link supports your conclusion.
First, the paper is about AI understanding its own behavior. This paper makes me expect that a CUDA-kernel-writing AI would be able to accurately identify itself as being specialized at writing CUDA kernels, which doesn’t support the idea that it would generalize to non-CUDA tasks.
Maybe if you asked the AI “please list heuristics you use to write CUDA kernels,” it would be able to give you a pretty accurate list. This is plausibly more useful for generalizing, because if the model can name these heuristics explicitly, maybe it can also use the ones that generalize, if they do generalize. This depends on 1) the model is aware of many heuristics that it’s learned, 2) many of these heuristics generalize across domains, and 3) it can use its awareness of these heuristics to successfully generalize. None of these are clearly true to me.
Second, the paper only tested GPT-4o and Llama 3, so the paper doesn’t provide clear evidence that more capable AIs “shift some towards (2).” The authors actually call out in the paper that future work could test this on smaller models to find out if there are scaling laws—has anybody done this? I wouldn’t be too surprised if small models were also able to self-report simple attributes about themselves that were instilled during training.
Fair, but I think the AI being aware of its behavior is pretty continuous with being aware of the heuristics it’s using and ultimately generalizing these (e.g., in some cases the AI learns what code word it is trying to make the user say which is very similar to being aware of any other aspect of the task it is learning). I’m skeptical that very weak/small AIs can do this based on some other papers which show they fail at substantially easier (out-of-context reasoning) tasks.
I think most of the reason why I believe this is improving with capabilities is due to a broader sense of how well AIs generalize capabilities (e.g., how much does o3 get better at tasks it wasn’t trained on), but this paper was the most clearly relevant link I could find.
I’m not sure o3 does get significantly better at tasks it wasn’t trained on. Since we don’t know what was in o3’s training data, it’s hard to say for sure that it wasn’t trained on any given task.
To my knowledge, the most likely example of a task that o3 does well on without explicit training is GeoGuessr. But see this Astral Codex Ten post, quoting Daniel Kang:[1]
We also know that o3 was trained on enormous amounts of RL tasks, some of which have “verified rewards.” The folks at OpenAI are almost certainly cramming every bit of information with every conceivable task into their o-series of models! A heuristic here is that if there’s an easy to verify answer and you can think of it, o3 was probably trained on it.
I think this is a bit overstated, since GeoGuessr is a relatively obscure task, and implementing an idea takes much longer than thinking of it.[2] But it’s possible that o3 was trained on GeoGuessr.
The same ACX post also mentions:
On the other hand, the DeepGuessr benchmark finds that base models like GPT-4o and GPT-4.1 are almost as good as reasoning models at this, and I would expect these to have less post-training, probably not enough to include GeoGuessr
Do you have examples in mind of tasks that you don’t think o3 was trained on, but which it nonetheless performs significantly better at than GPT-4o?
I would guess that OpenAI has trained on GeoGuessr. It should be pretty easy to implement: just take images off the web which have location metadata attached, and train the model to predict the location. Plausibly getting good at GeoGuessr imbues some world knowledge.
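For concreteness, here is a minimal sketch of the kind of pipeline described above: collect photos that carry GPS EXIF metadata and emit (image, latitude, longitude) pairs for training. The directory name is hypothetical and a real pipeline would need much more filtering; treat this as illustrative only.

```python
# Minimal sketch (illustrative) of building (image, lat, lon) training pairs from geotagged photos.
from pathlib import Path
from PIL import Image, ExifTags

# EXIF tag id for the GPS sub-dictionary ("GPSInfo").
GPS_TAG_ID = next(tag for tag, name in ExifTags.TAGS.items() if name == "GPSInfo")

def to_decimal_degrees(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) plus an N/S/E/W ref into signed decimal degrees."""
    degrees = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
    return -degrees if ref in ("S", "W") else degrees

def extract_lat_lon(path):
    exif = Image.open(path)._getexif() or {}
    gps_raw = exif.get(GPS_TAG_ID)
    if not gps_raw:
        return None
    gps = {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_raw.items()}
    try:
        lat = to_decimal_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"])
        lon = to_decimal_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    except KeyError:
        return None
    return lat, lon

# Emit training pairs; a GeoGuessr-style model would then be trained to predict the
# location (or a discretized location cell) from the image pixels.
dataset = []
for img_path in Path("scraped_images").glob("*.jpg"):   # hypothetical directory
    coords = extract_lat_lon(img_path)
    if coords is not None:
        dataset.append((str(img_path), coords))
```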
I think different views about the extent to which future powerful AIs will deeply integrate their superhuman abilities versus these abilities being shallowly attached partially drive some disagreements about misalignment risk and what takeoff will look like.
I think this might be wrong when it comes to our disagreements, because I don’t disagree with this shortform.[1] Maybe a bigger crux is how valuable (1) is relative to (2)? Or the extent to which (2) is more helpful for scientific progress than (1)?
I don’t think this explains our disagreements. My low confidence guess is we have reasonably similar views on this. But, I do think it drives parts of some disagreements between me and people who are much more optimistic than me (e.g. various not-very-concerned AI company employees).
I agree the value of (1) vs (2) might also be a crux in some cases.
Is the crux that the more optimistic folks plausibly agree (2) is cause for concern, but believe that mundane utility can be reaped with (1), and they don’t expect us to slide from (1) into (2) without noticing?
When people are skeptical about the concept of AGI being meaningful or having clear boundaries, it could sometimes be downstream of skepticism about very fast and impactful R&D done by AIs, such as software-only singularity or things like macroscopic biotech where compute buildout happens at a speed impossible for human industry. Such events are needed to serve as landmarks, anchoring a clear concept of AGI, otherwise the definition remains contentious.
So AI company CEOs who complain about AGI being too nebulous to define might already be expecting a scaling slowdown, with their strategy being primarily about the fight for the soul of the 2028-2030 market. When scaling is slow, it’ll become too difficult to gain a significant quality advantage sufficient to defeat the incumbents. So the decisive battle is happening now, with the rhetoric making it more palatable to push through the decisions to build the $140bn training systems of 2028.
This behavior doesn’t need to be at all related to expecting superintelligence, it makes sense as a consequence of not expecting superintelligence in the near future.
I think short timelines just don’t square with the way intelligence agencies are behaving. The NSA took Y2K more seriously than it currently seems to be taking near-term AGI. You can make the argument that intelligence agencies are less competent than they used to be, but I don’t buy that they aren’t at least extremely paranoid and moderately competent: that seems like their job.
Researchers at AGI labs seem to genuinely believe the hype they’re selling, a significant fraction of non-affiliated top-of-the-line DL researchers is inclined to believe them as well, and basically all competent well-informed people agree that the short-timelines position is not unreasonable to hold.
Dismissing short timelines based on NSA’s behavior requires assuming that they’re much more competent in the field of AI than everyone in the above list. After all, that’d require them to be strongly (and correctly) confident that all these superstar researchers above are incorrect.
While that’s not impossible, it seems highly unlikely to me. Much more likely that they’re significantly less competent, and accordingly dismissive.
This is a late reply, but at least from this article, it seems like Ilya Sutskever was running out of confidence that OpenAI would reach AGI by mid 2023. Additionally, if the rumors about GPT-5 are true, it’s mainly going to be a unification of existing models rather than something entirely new. Combined with the GPT-4.5 release, it sure seems like progress at OpenAI is slowing down rather than speeding up.
How do you know that researchers at AGI labs genuinely believe what they’re saying? Couldn’t the companies just put pressure on them to act like they believe Transformative AI is imminent? I just don’t buy that these agencies are dismissive without good reason. They’ve explored remote viewing and other ideas that are almost certainly bullshit. If they are willing to consider those possibilities, I don’t know why they wouldn’t consider the possibility of current deep learning techniques creating a national security threat. That seems like their job, and they’ve explored significantly weirder ideas.
I just don’t buy that these agencies are dismissive without good reason
On what possible publicly-unavailable evidence could they have updated in order to correctly attain such a high degree of dismissiveness?
I could think of three types of evidence:
Strong theoretical reasons.
E. g., some sort of classified, highly advanced, highly empirically supported theory of deep learning/intelligence/agency, such that you can run a bunch of precise experiments, or do a bunch of math derivations, and definitively conclude that DL/LLMs don’t scale to AGI.
Empirical tests.
E. g., perhaps the deep state secretly has 100x the compute of AGI labs, and they already ran the pretraining game up to GPT-6 and were disappointed by the results.
Overriding expert opinions.
E. g., a large number of world-class best-of-the-best AI scientists with an impeccable track record firmly and unanimously saying that LLMs don’t scale to AGI. This requires either a “shadow industry” of AI experts working for the government, or for the AI-expert public speakers to be on the deep state’s payroll and lying in public about their uncertainty.
I mean, I guess it’s possible that what we see of the AI industry is just the tip of the iceberg and the government has classified research projects that are a decade ahead of the public state of knowledge. But I find this rather unlikely.
And unless we do postulate that, I don’t see any possible valid pathway by which they could’ve attained high certainty regarding the current paradigm not working out.
They’ve explored remote viewing and other ideas that are almost certainly bullshit
There are two ways we can update on it:
The fact that they investigated psychic phenomena means they’re willing to explore a wide variety of ambitious ideas, regardless of their weirdness – and therefore we should expect them not to dismiss the AGI Risk out of hand.
The fact that they investigated psychic phenomena means they have a pretty bad grip on reality – and therefore we should not expect them to get the AGI Risk right.
I never looked into it enough to know which interpretation is the correct one. Expecting less competence rather than more is usually a good rule of thumb, though.
it sure seems like progress at OpenAI is slowing down rather than speeding up
To be clear, I personally very much agree with that. But:
at least from this article, it seems like Ilya Sutskever was running out of confidence that OpenAI would reach AGI by mid 2023
I find that I’m not inclined to take Sutskever’s current claims about this at face value. He’s raising money for his thing, so he has a vested interest in pushing the agenda that the LLM paradigm is a dead end and that his way is the only way. The same way it became advantageous for him to talk about the data wall once he was no longer with the unlimited-compute company.
Again, I do believe both in LLMs being a dead end and in the data wall. But I don’t trust Sutskever to be a clean source of information regarding that, so I’m not inclined to update on his claims to that end.
Those are good points. The last thing I’ll say drastically reduces the amount of competence the government would need in order to be dismissive while still being rational: the leading AI labs may already be fairly confident that current deep-learning techniques won’t get to AGI in the near future, and the security agencies know this as well.
That would make sense. But I doubt all AGI companies are that good at informational security and deception. This would require all of {OpenAI, Anthropic, DeepMind, Meta, xAI} to decide on the deceptive narrative, and then not fail to keep up the charade, which would require both sending the right public messages and synchronizing their research publications such that the set of paradigm-damning ones isn’t public.
In addition, how do we explain people who quit AGI companies and remain with short timelines?
I guess I would respond to the first point by saying all of the companies you mentioned have incentive to say they are closing in on AGI even if they aren’t. It doesn’t seem that sophisticated to say “we’re close to AGI” when you’re not. Mark Zuckerberg said that AI would be at the level of a junior SWE this year, and Meta proceeded to release Llama 4. Unless prognosticators at Meta seriously fucked up, the most likely scenario is that Zuckerberg made that comment knowing it was bullshit. And the sharing of research did slow down a lot in 2023, which gave companies cover to not release unflattering results.
And to your last point, it seems reasonable that companies could pressure former employees to act as if they believe AGI is imminent. And some researchers may be emotionally invested in believing that what they worked on is what will lead to superintelligence.
And my question for you is: if DeepMind had solid evidence that AGI would be here in 1 year, and if the security agencies had access to DeepMind’s evidence and reasoning, do you believe they would still do nothing?
As someone who thinks superintelligence could come in the near future, I basically agree with @snewman’s view that AIs have to automate the entire economy, or automate a sector that could then automate everything else very fast. Unfortunately for us, this basically gives us no good fire alarms for AGI, unless @Ege Erdil and @Matthew Barnett et al are right that takeoff is slow enough that most value comes from broad automation, and external use dominates internal use:
On YouTube, @Evbo’s parkour civilization and PVP civilization drama movies (professionally produced, set in Minecraft, and half-parody of YA dystopia) serve as a surprisingly good demonstration of Instrumental Convergence (the protagonist kills or bribes most people they meet to “rank up” in the beginning) and of non-human morality (the characters basically only care about the Minecraft activity of their series, without a hint of irony).
I think using existing non-AI media as an analogy for AI could be helpful, because people tend to picture a terminator-like ASI as robots shooting people, which is one of the reasons why a common suggestion for dealing with unaligned AI is to just turn it off, pour water on the servers, etc.
Random thought on opioid addiction, no offense meant to people actually dealing with addiction, but I wonder if this might be useful: I read that opioid withdrawal makes people feel pain because the brain gets accustomed to extreme levels of pain suppression and without opioids their pain tolerance is so low that everything itches and hurts. This makes me wonder if this effect is kind of similar to autistic sensory sensitivities, just turned up to 9000. Could it be that withdrawal doesn’t create pain, but simply amplifies and turns attention to small pains and discomforts that are already there, but normal people just don’t notice or get used to ignoring? If so, opioid addiction may be like a canary in the coal mine, where people get used to being in pain and lack healthy tools to deal with it. If opioid addiction is largely because of painful withdrawal rather than just pleasure, could techniques to avoid pain be helpful in dealing with opioid addiction? Autistic people often need various coping strategies, like ear plugs to avoid noise or special clothing to decrease everyday friction that normies take for granted, and they can be more sensitive to internal bodily signals like pains that most people just don’t think are a big deal. Could the same coping skills and additional treatment for mild chronic pain etc be used to help treat addiction? If teaching physical and emotional pain avoidance/management skills to addicts when they are going through withdrawal is impractical, why not also teach them to non-addicts who might be at risk or just people in general, before they have a chance to become addicted? Less pain to begin with means fewer reasons to escape pain using drugs, and more chances to learn. Maybe everyone can benefit from taking small pains and discomforts and unhappiness more seriously as a society. And I don’t mean purely mental skills—we probably shouldn’t treat addicts or people at risk of becoming addicts the same way we treat normies. When people are really sensitized or in crisis, mental tolerance, mindfulness and reframing probably isn’t very helpful. We also need more physical ways to remove causes of pain, like widely available comfortable, itch-free clothing, ergonomic beds and chairs, quality air and quiet areas, treatment and prevention of minor chronic issues like inflammation and joint damage with age, etc. Instead of telling people to tough it up, treat minor pain and unhappiness as early warnings, and normalize healthy comfort-seeking before being in crisis. Also normalize and invest in treatment and prevention of low-grade health issues that people don’t typically go to the doctor for. These may seem like luxuries but are cheaper than long-term addiction and prison.
I knew closely several opioid-addicted people and was myself addicted to nicotine. Physical withdrawal symptoms are only a small part of the problem in both cases. Although I tend to agree with you on this part:
You really can toughen up and endure days to weeks of the physical withdrawal, but then you have to deal with the months to years of the psychological addiction.
Opioid addiction is like a short circuit in motivation: Normally, when some problem bothers you, you are motivated to solve it. Opioids give an illusion of all problems disappearing, and teach people this flawed behavioral pattern: Instead of solving the actual problem, just take a dose. And this becomes a vicious cycle: the addicted person spends all their money on drugs, which produces more problems and more urge to solve them by taking more drugs. The planning horizon shrinks to hours. Some prefer to steal money to get a dose even knowing that they will be caught the same day.
Thanks for the input! If addiction is more because of psychological pain (“problems that bother you”) than direct physical pain, could the same approach work but with mental pleasures/distractions from pain instead, like games, toys or organized social activities?
Edit: And coping methods to avoid/decrease mental and social discomfort, which can include but are not limited to therapy or communication: things like a new job or new friends, or prioritizing things in life differently. I read that some people trying to fight addiction get overwhelmed by having to get everything together at once, or by being expected to just quit and function like normal immediately. If they were supported to have fun/play and feel better first in healthier ways, could that be more helpful?
Of course. And this is what many good rehabilitation programs do.
But mere distraction is, again, only a temporary solution. Patients need to relearn healthy behavioral patterns, otherwise they may eventually relapse.
Games are good in the sense that they provide a quick feedback loop. You had a problem and quickly solved it without a drug.
To do useful work you need to be deceptive.
When you and another person have different concepts of what’s good.
When both of you have the same concepts of what’s good but different models of how to get there.
This happens a lot when people are perfectionist and have aesthetic preferences for work being done in a certain way.
This happens in companies a lot. AI will work in those contexts and will be deceptive if it wants to do useful work. Actually, maybe not: the dynamics will be different, e.g. the AI may be neutral in some way, such that anybody can turn on an honesty mode and ask it anything.
Anyway, I think that because of the way companies are structured and how humans work, being slightly deceptive allows you to do useful work (I think it’s pretty intuitive for anyone who has worked in a corporation or watched The Office).
It probably doesn’t apply to AI corporations?
I don’t get the downvotes. I do think it’s extremely simple: look at politics in general or even workplace politics; just try to google it, there are even Wikipedia pages roughly about what I want to talk about. I have experienced, many times, a situation where I need to do my job and my boss makes it harder for me in some way. Being not completely honest is an obvious strategy, and it’s good for the company you are working at.
I think the downvotes are because the correct statement is something more like “In some situations, you can do more useful work by being deceptive.” I think this is actually what you argue for, but it’s very different from “To do useful work you need to be deceptive.”
If “To do useful work you need to be deceptive,” this means that one can’t do useful work without being deceptive. This is clearly wrong.
It seems like both you and I were able to decipher what I meant easily, so why did someone else fail to do that?
LW discussion norms are that you’re supposed to say what you mean, and not leave people to guess, because this leads to more precise communication. E.g. I guessed that you did not mean what you literally wrote, because that would be dumb, but I don’t know exactly what statement you’re arguing for.
I know this is not standard communication practice in most places, but it is actually very valuable, you should try it.
There is a tendency for the last 1% to take the longest time.
I wonder if that long last 1% will be before AGI, or ASI, or both.
I don’t think many people on LW believe that the last 1% will take the longest time; I believe many would say that the takeoff is exponential.
I don’t necessarily believe or disbelieve in the final 1% taking the longest in this case – there are too many variables to make a confident prediction. However, it does tend to be a common occurrence.
It could very well be that the 1% before the final 1% takes the longest. Based on the past few years, progress in the AI space has been made fairly steadily, so it could also be that it continues at just this pace until that last 1% is hit, and then exponential takeoff occurs.
You could also have a takeoff event that carries from now till 99%, which is then followed by the final 1% taking a long period.
A typical exponential takeoff is, of course, very possible as well.
While the alignment community is frantically trying to convince itself of the possibility of benevolent artificial superintelligence, research on human cognition remains undeservedly neglected.
Modern AI models are predominantly based on neural networks, the so-called connectionist approach in cognitive architecture studies. But in the beginning, the symbolic approach was more popular because of its lower computational demands. Logic programming was the means to imbue the system with the programmer’s intelligence.
Although symbolist AI researchers have studied the workings of the human brain, their research was driven by attempts to reproduce the work of the brain, to create an artificial personality, rather than to help programmers express their thoughts. The user’s ergonomics were largely ignored. Logic programming languages aimed to be the closest representation of the programmer’s thoughts, but they failed at being practically convenient. As a result, nobody uses vanilla logic programming for practical purposes.
In contrast to that, my research is driven by ergonomics and attempts to synchronize with the user’s thinking. For example, while proving a theorem (creating an algorithm), instead of manually composing plain text in a sophisticated language, the user sees the current context and chooses the next step from the available options.
Sometimes people deliberately fill their environment with yes-men and drive out critics. Pointing out what they’re doing doesn’t help, because they’re doing it on purpose. However there are ways well intentioned people end up driving out critics unintentionally, and those are worth talking about.
The Rise and Fall of Mars Hill Church (podcast) is about a guy who definitely drove out critics deliberately. Mark Driscoll fired people, led his church to shun them, and rearranged the legal structure of the church to consolidate power. It worked, and his power was unchecked until the entire church collapsed. Yawn.
What’s interesting is who he hired after the purges. As described in a later episode, his later hiring was focused on people who were executives in the secular world. These people were great at executing on tasks, but unopinionated about what their task should be. Whatever Driscoll said was what they did.
This is something a good, feedback-craving leader could have done by accident. Hiring people who are good at the tasks you want them to do is a pretty natural move. But I think the speaker is correct (alas, I didn’t write down his name) that this is anti-correlated at the tails: the best executors become so by not caring about what they’re executing.
So if you’re a leader and want to receive a healthy amount of pushback, it’s not enough to hire hypercompetent people and listen when they push back. You have to select specifically for ability to push back (including both willingness, and having good opinions).
My biggest outstanding question is “why did church network leaders give resources to a dude who had never/barely been to church to start his own?” There were probably subtler warning signs but surely they shouldn’t have been necessary once you encountered that fact and the fact that he was proud of it. If anyone has insight or sources on this I’d love to chat.
Mark Driscoll was raised Catholic, converted to evangelical Christianity at 19, got an MA in theology, connected with others who were associated with “church planting” efforts, and launched the first Mars Hill church in 1996 when he was 26. The church was initially in his home.
So it seems he may have received coaching, training, and some limited support at that early stage, but probably not enormous financial resources.
It looks like you’re right that he didn’t receive much funding via networks or sending churches. The podcast describes initial support coming from “friends and family”, in ways that sound more like a friends and family round of start-up funding than normal tithes.
I’m still under the impression that he received initial endorsements, blessings, and mentorship from people who should have known better.
Launching the first Mars church sounds like a success to me!
(In case this isn’t a joke, Mars Hill church was named after Mars Hill / the Areopagus / Hill of Ares, which in the New Testament is where the apostle Paul gives a speech to a bunch of pagans about Jesus. That hill is named after the Greek god. The church was located on Earth, in particular in Seattle.)
A former plumber posted this during a debate with a quantum field theorist:
It continues:
Curious how others would model this.
Last week I got nerdsniped with the question of why established evangelical leaders had a habit of taking charismatic narcissists and giving them support to found their own churches[1]. I expected this to be a whole saga that would teach lessons on how selecting for one set of good things secretly traded off against others. Then I found this checklist on churchplanting.com. It’s basically “tell me you’re a charismatic narcissist who will prioritize growth above virtue without telling me you’re a…“. And not charismatic in the sense of asking reasonable object-level questions that are assessed by a 3rd party and thus vulnerable to halo effects[2].
The first and presumably most important item on the checklist is “Visioning capacity”, which includes both the ability to dream that you are very important and the ability to convince others to follow that dream. Commitment to growth has its own section (7), but it’s also embedded in section 4 (skill at attracting converts). Section 12 is Resilience, but the only specific setback mentioned is ups and downs in attendance. “Can you create a grand Faith” is the very last item on the 13-point list. “Displaying Godly love and compassion to people” is a subheading under “6. Effectively builds relationships”.
There are other checklists that at least ask about character, so this isn’t all church planting. But it looks like the answer to “why do some evangelicals support charismatic narcissists that prioritize growth above all else...” is “because that’s what they want, presumably for the same reason lots of people value charm and growth.”
This is church planting, where the sending churches may advise or fund the new church, but don’t have any authority over it like they might in mainline denominations.
nor in the Christian sense of Charismatic
Interesting find. What about the visioning section conveyed “the dream that you are very important?” Or, alternatively, what do you mean by “dream” in this context?
In practice, newly planted churches[1] are cults of personality (neutral valence) around the planting team, or sometimes just the lead pastor[2]. “developing a theme which highlights the vision and philosophy of ministry” and “establishing a clear church identity related to the theme and vision” is inevitably[3] about selling yourself as a brand.
It’s possible to be a non-narcissist and pass this checklist, including the vision part. But it’s a lot easier if you have a high opinion of yourself, few doubts, don’t care about harming others, and love being the center of attention.
of this type. Presumably there are other types we hear less about because they don’t seek growth and publicity.
Sources: Rise and Fall of Mars Hill Church, Terminal: The Dying Church Planter
in this type
If you don’t believe in your work, consider looking for other options
I spent 15 months working for ARC Theory. I recently wrote up why I don’t believe in their research. If one reads my posts, I think it should become very clear to the reader that either ARC’s research direction is fundamentally unsound, or I’m still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it’s pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining readers asking the very fair question: “If you think the agenda is so doomed, why did you keep working on it?”[1]
In my first post, I write: “Unfortunately, by the time I left ARC, I became very skeptical of the viability of their agenda.” This is not quite true. I was very skeptical from the beginning, for largely similar reasons I expressed in my posts. But first I told myself that I should stay a little longer. Either they manage to convince me that the agenda is sound, or I demonstrate that it doesn’t work, in which case I free up the labor of the group of smart people working on the agenda. I think this was initially a somewhat reasonable position, though it was already in large part motivated reasoning.
But half a year after joining, I don’t think this theory of change was very tenable anymore. It was becoming clear that our arguments were going in circles. I couldn’t convince Paul and Mark (the two people thinking the most about the big picture questions), nor could they convince me. Eight months in, two friends visited me in California, and they noticed that I always derailed the conversation when they asked me about my research. That should have been an important sign: I was ashamed to talk about my research with my friends, because I was afraid they would see how crazy it was. I should have quit then, but I stayed for another seven months.
I think this was largely due to cowardice. I’m very bad at coding and all my previous attempts at upskilling in coding went badly.[2] I thought of my main skill as being a mathematician, and I wanted to keep working on AI safety. The few other places one can work as a mathematician in AI safety looked even less promising to me than ARC. I was afraid that if I quit, I wouldn’t find anything else to do.
In retrospect, this fear was unfounded. I realized there were other skills one can develop, not just coding. In my afternoons, I started reading a lot more papers and serious blog posts[3] from various branches of AI safety. After a few months, I felt I had much more context on many topics. I started to think more about what I can do with my non-mathematical skills. When I finally started applying for jobs, I got an offer from the European AI Office and UKAISI, and it looked more likely than not that I would get an offer from Redwood.[4]
Other options I considered that looked less promising than the three above, but still better than staying at ARC:
Team up with some Hungarian coder friends and execute some simple but interesting experiments I had vague plans for.[5]
Assemble a good curriculum for the prosaic AI safety agendas that I like.
Apply for a grant-maker job.
Become a Joe Carlsmith-style general investigator.
Try to become a journalist or an influential blogger.
Work on crazy acausal trade stuff.
I still think many of these were good opportunities, and probably there are many others. Of course, different options are good for people with different skill profiles, but I really believe that the world is ripe with opportunities to be useful for people who are generally smart and reasonable and have enough context on AI safety. If you are working on AI safety but don’t really believe that your day-to-day job is going anywhere, remember that having context and being ingrained in the AI safety field is a great asset in itself,[6] and consider looking for other projects to work on.
(Important note: ARC was a very good workplace, my coworkers were very nice to me and receptive to my doubts, and I really enjoyed working there except for feeling guilty that my work is not useful. I’m also not accusing the people who continue working at ARC of being cowards in the way I have been. They just have a different assessment of ARC’s chances, or work on lower-level questions than I have, where it can be reasonable to just defer to others on the higher-level questions.)
(As an employee of the European AI Office, it’s important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of the European Commission or other EU institutions.)
No, really, it felt very bad writing the posts. It felt like describing how I worked for a year on a scheme that was either trying to build perpetual motion machines, or trying to build normal cars while missing the fact that gasoline exists. Embarrassing either way.
I don’t know why. People keep telling me that it should be easy to upskill, but for some reason it is not.
I particularly recommend Redwood’s blog.
We didn’t fully finish the work trial as I decided that the EU job was better.
Think of things in the style of some of Owain Evans’ papers or experiments on faithful chain of thought.
And having more context and knowledge is relatively easy to further improve by reading for a few months. It’s a young field.
How exactly are you measuring coding ability? What are the ways you’ve tried to upskill, and what are common failure modes? Can you describe your workflow at a high-level, or share a recording? Are you referring to competence at real world engineering tasks, or performance on screening tests?
There’s a chrome extension which lets you download leetcode questions as jupyter notebooks: https://github.com/k-erdem/offlineleet. After working on a problem, you can make a markdown cell with notes and convert it into flashcards for regular review: https://github.com/callummcdougall/jupyter-to-anki.
I would suggest scheduling calls with friends for practice sessions so that they can give you personalized feedback about what you need to work on.
I disagree. Instead, I think that either ARC’s research direction is fundamentally unsound, or you’re still misunderstanding some of the finer details after more than a year of trying to grasp it. Like, your post is a few layers deep in the argument tree, and the discussions we had about these details (e.g. in January) went even deeper. I don’t really have a position on whether your objections ultimately point at an insurmountable obstacle for ARC’s agenda, but if they do, I think one needs to really dig into the details in order to see that.
(ETA: I agree with your post overall, though!)
That’s not how I see it. I think the argument tree doesn’t go very deep until I lose the thread. Here are a few, slightly stylized but real, conversations I had with friends who had no context on what ARC was doing, when I tried to explain our research to them:
Me: We want to do Low Probability Estimation.
Them: Does this mean you want to estimate the probability that ChatGPT says a specific word after a 100 words on chain of thought? Isn’t this clearly impossible?
Me: No, you see, we want to estimate the probabilities only as well as the model knows them.
Them: What does this mean?
Me: [I can’t answer this question.]
Me: We want to do Mechanistic Anomaly Detection.
Them: Isn’t this clearly impossible? Won’t this result in a lot of false positives when anything out of distribution happens?
Me: Yes, that’s why we have this new clever idea of relying on the fragility of sensor tampering: if you delete a subset of the actions, you will get an inconsistent image.
Them: What if the AI builds another robot to tamper with the cameras?
Me: We actually don’t want to delete actions, but rather heuristic arguments for why the cameras will show something, and we want to construct heuristic explanations in a way that they carry over through delegated actions.
Them: What does this mean?
Me: [I can’t answer this question.]
Me: We want to create Heuristic Arguments to explain everything the model does.
Them: What does it mean that an argument explained a behavior? What is even the type signature of heuristic arguments? And you want to explain everything a model does? Isn’t this clearly impossible?
Me: [I can’t answer this question.]
When I was explaining our research to outsiders (which I usually tried to avoid out of cowardice), we usually got to some of these points within minutes. So I wouldn’t say these are fine details of our agenda.
During my time at ARC, the majority of my time was spent asking Mark and Paul variations of these three questions. They always kindly answered, and the answer was convincing-sounding enough in the moment that I usually couldn’t really reply on the spot, and then I went back to my room to think through their answers. But I never actually understood their answers, and I can’t reproduce them now. Really, I think that was the majority of the work I did at ARC. When I left, you guys should have bought a rock with “Isn’t this clearly impossible?” written on it, and that would profitably replace my presence.
That’s why I’m saying that either ARC’s agenda is fundamentally unsound or I’m still missing some of the basics. What stands between ARC’s agenda and it collapsing under five minutes of questioning from an outsider is that Paul and Mark (and maybe others on the team) have some convincing-sounding answers to the three questions above. So I would say that these answers are really part of the basics, and I never understood them.
Maybe Mark will show up in the comments now to give answers to the three questions, and I expect the answers to sound kind of convincing, and I won’t have a very convincing counter-argument other than some rambling reply saying essentially that “I think this argument is missing the point and doesn’t actually answer the question, but I can’t really point out why, because I don’t actually understand the argument because I don’t understand how you imagine heuristic arguments”. (This is what happened in the comments on my other post, and thanks to Mark for the reply and I’m sorry for still not understanding it.) I can’t distinguish whether I’m just bad at understanding some sound arguments here, or the arguments are elaborate self-delusions of people who are smarter and better at arguments than me. In any case, I feel epistemic learned helplessness on some of these most basic questions in ARC’s agenda.
What is your opinion on the Low Probability Estimation paper published this year at ICLR?
I don’t have a background in the field, but it seems like they were able to get some results indicating the approach can extract something. https://arxiv.org/pdf/2410.13211
It’s a nice paper, and I’m glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas under-performs the baseline. Of course, these are just the first experiments, work is ongoing, this is not conclusive negative evidence for anything. But the paper certainly shouldn’t be counted as positive evidence for ARC’s ideas.
Thanks for the clarification! Not in the field and wasn’t sure I understood the meaning of the results correctly.
Do you think that it would be worth it to try to partially sort this out in a LW dialogue?
IME, in the majority of cases, when I strongly felt like quitting but was also inclined to justify “staying just a little bit longer because XYZ”, and listened to my justifications, staying turned out to be the wrong decision.
Relevant classic paper from Steven Levitt. Abstract [emphasis mine]:
Pretty much the whole causal estimate comes down to the influence of happiness 6 months after quitting a job or breaking up. Almost everything else is swamped with noise. The only individual question with a consistent causal effect larger than the standard error was “should I break my bad habit?”, and doing so made people unhappier. Even for those factors, there’s a lot of biases in this self-report data, which the authors noted and tried to address. I’m just not sure what we can really learn from this, even though it is a fun study.
If you want to upskill in coding, I’m open to tutoring you for money.
I keep seeing the first clause as “I don’t believe in your work”.
Fun fact: AI-2027 estimates that getting to ASI might take the equivalent of a 100-person team of top human AI research talent working for tens of thousands of years.
I’m curious why ASI would take so much work. What exactly is the R&D labor supposed to be doing each day, that adds up to so much effort? I’m curious how people are thinking about that, if they buy into this kind of picture. Thanks :)
(Calculation details: For example, in October 2027 of the AI-2027 modal scenario, they have “330K superhuman AI researcher copies thinking at 57x human speed”, which is 1.6 million person-years of research in that month alone. And that’s mostly going towards inventing ASI, I think. Did I get that right?)
(My own opinion, stated without justification, is that LLMs are not a paradigm that can scale to ASI, but after some future AI paradigm shift, there will be very very little R&D separating “this type of AI can do anything importantly useful at all” and “full-blown superintelligence”. Like maybe dozens or hundreds of person-years, or whatever, as opposed to millions. More on this in a (hopefully) forthcoming post.)
Whew, a critique that our takeoff should be faster for a change, as opposed to slower.
This depends on how large you think the penalty is for parallelized labor as opposed to serial. If 330k parallel researchers is more like equivalent to 100 researchers at 50x speed than 100 researchers at 3,300x speed, then it’s more like a team of 100 researchers working for (50*57)/12=~250 years.
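A quick back-of-envelope check of both figures, using only the numbers above (a sketch, with the 50x discount treated as a hypothetical):

```python
# Back-of-envelope check of the figures discussed above (same inputs as the scenario).
copies, think_speed = 330_000, 57

# No parallelization penalty: serial-equivalent labor from one month of work.
person_years_naive = copies * think_speed / 12
print(f"no penalty: ~{person_years_naive / 1e6:.1f}M person-years per month")   # ~1.6M

# Heavy penalty: treat the 330k copies as 100 researchers at 50x (rather than 3,300x),
# each still thinking at 57x human speed.
penalized_speed = 50
team_years = penalized_speed * think_speed / 12
print(f"heavy penalty: a team of 100 working for ~{team_years:.0f} years")      # ~240, i.e. the '~250' above
```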
Also of course to the extent you think compute will be an important input, during October they still just have a month’s worth of total compute even though they’re working for 250-25,000 subjective years.
I’m imagining that there’s a mix of investing tons of effort into optimizing experiment ideas and implementing and interpreting every experiment quickly, as well as tons of effort into more conceptual agendas given the compute shortage, some of which bear fruit but also involve lots of “wasted” effort exploring possible routes, and most of which end up needing significant experimentation as well to get working.
I don’t share this intuition regarding the gap between the first importantly useful AI and ASI. If so, that implies extremely fast takeoff, correct? Like on the order of days from AI that can do important things to full-blown superintelligence?
Currently there are hundreds or perhaps low thousands of years of relevant research effort going into frontier AI each year. The gap between importantly useful AI and ASI seems larger than a year of current AI progress (though I’m not >90% confident in that, especially if timelines are <2 years). Then we also need to take into account diminishing returns, compute bottlenecks, and parallelization penalties, so my guess is that the required person-years should be at minimum in the thousands and likely much more. Overall the scenario you’re describing is maybe (roughly) my 95th percentile speed?
I’m curious about your definition for importantly useful AI actually. Under some interpretations I feel like current AI should cross that bar.
I’m uncertain about the LLMs thing but would lean toward pretty large shifts by the time of ASI; I think it’s more likely LLMs scale to superhuman coders than to ASI.
Thanks, that’s very helpful!
If we divide the inventing-ASI task into (A) “thinking about and writing algorithms” versus (B) “testing algorithms”, in the world of today there’s a clean division of labor where the humans do (A) and the computers do (B). But in your imagined October 2027 world, there’s fungibility between how much compute is being used on (A) versus (B). I guess I should interpret your “330K superhuman AI researcher copies thinking at 57x human speed” as what would happen if the compute hypothetically all went towards (A), none towards (B)? And really there’s gonna be some division of compute between (A) and (B), such that the amount of (A) is less than I claimed? …Or how are you thinking about that?
Right, but I’m positing a discontinuity between current AI and the next paradigm, and I was talking about the gap between when AI-of-that-next-paradigm is importantly useful versus when it’s ASI. For example, AI-of-that-next-paradigm might arguably already exist today but where it’s missing key pieces such that it barely works on toy models in obscure arxiv papers. Or here’s a more concrete example: Take the “RL agent” line of AI research (AlphaZero, MuZero, stuff like that), which is quite different from LLMs (e.g. “training environment” rather than “training data”, and there’s nothing quite like self-supervised pretraining (see here)). This line of research has led to great results on board games and videogames, but it’s more-or-less economically useless, and certainly useless for alignment research, societal resilience, capabilities research, etc. If it turns out that this line of research is actually much closer to how future ASI will work at a nuts-and-bolts level than LLMs are (for the sake of argument), then we have not yet crossed the “AI-of-that-next-paradigm is importantly useful” threshold in my sense.
If it helps, here’s a draft paragraph from that (hopefully) forthcoming post:
Next:
Well, even if you have an ML training plan that will yield ASI, you still need to run it, which isn’t instantaneous. I dunno, it’s something I’m still puzzling over.
…But yeah, many of my views are pretty retro, like a time capsule from like AI alignment discourse of 2009. ¯\_(ツ)_/¯
I can somewhat see where you’re coming from about a new method being orders of magnitude more data efficient in RL, but I very strongly bet on transformers being core even after such a paradigm shift. I’m curious whether you think the transformer architecture and text input/output need to go, or whether the new training procedure / architecture fits in with transformers because transformers are just the best information mixing architecture.
My guess is that the main issue with current transformers turns out to be the fact that they don’t have a long-term state/memory, and I think this is a pretty critical part of how humans are able to learn on the job as effectively as they do.
The trouble, as I’ve heard it, is that the other approaches which do incorporate a long-run state/memory are apparently much harder to train reasonably well than transformers, plus there are first-mover effects.
Sorry for the late reply.
I’m not 100% sure what you mean, but my guess is that you mean (B) to represent the compute used for experiments? We do project a split here and the copies/speed numbers are just for (A). You can see our projections for the split in our compute forecast (we are not confident that they are roughly right).
Re: the rest of your comment, makes sense. Perhaps the place I most disagree is that if LLMs will be the thing discovering the new paradigm, they will probably also be useful for things like automating alignment research, epistemics, etc. Also if they are misaligned they could sabotage the research involved in the paradigm shift.
That does raise my eyebrows a bit, but also, note that we currently have hundreds of top-level researchers at AGI labs tirelessly working day in and day out, and that all that activity results in a… fairly leisurely pace of progress, actually.[1]
Recall that what they’re doing there is blind atheoretical empirical tinkering (tons of parallel experiments most of which are dead ends/eke out scant few bits of useful information). If you take that research paradigm and ramp it up to superhuman levels (without changing the fundamental nature of the work), maybe it really would take this many researcher-years.
And if AI R&D automation is actually achieved on the back of sleepwalking LLMs, that scenario does seem plausible. These superhuman AI researchers wouldn’t actually be generally superhuman researchers, just superhuman at all the tasks in the blind-empirical-tinkering research paradigm. Which has steeply declining returns to more intelligence added.
That said, yeah, if LLMs actually scale to a “lucid” AGI, capable of pivoting to paradigms with better capability returns on intelligent work invested, I expect it to take dramatically less time.
It’s fast if you use past AI progress as the reference class, but is decidedly not fast if you try to estimate “absolute” progress. Like, this isn’t happening, we’ve jumped to near human-baseline and slowed to a crawl at this level. If we assume the human level is the ground and we’re trying to reach the Sun, it in fact might take millennia at this pace.
A possible reason for that might be the fallibility of our benchmarks. It might be the case that for complex tasks, it’s hard for humans to see farther than their nose.
Incidentally, is there any meaningful sense in which we can say how many “person-years of thought” LLMs have already done?
We know they can do things in seconds that would take a human minutes. Does that mean those real-time seconds count as “human-minutes” of thought? Etc.
The short version is: getting compute-optimal experiments for self-improvement, training to do tasks that unavoidably take a really long time to learn or get data on because real-world experimentation is necessary, combined with a potential hardware bottleneck on robotics that also requires real-world experimentation to overcome.
Another point is that to the extent you buy the scaling hypothesis at all, compute bottlenecks will start to bite, and given that researchers will seek small constant improvements that don’t generalize, this can start a cascade of wrong decisions that could take a very long time to get out of.
I’d like to see that post, and I’d like to see your arguments on why it’s so easy for intelligence to be increased so fast, conditional on a new paradigm shift.
(For what it’s worth, I personally think LLMs might not be the last paradigm, because of their current lack of continuous learning/neuroplasticity plus no long-term memory/state. But I don’t expect future paradigms to have an AlphaZero-like trajectory, where things go from zero to wildly superhuman in days/weeks. I do think takeoff is faster if we condition on a new paradigm being required for ASI, so I see the AGI transition as plausibly leaving only months until we get superintelligence, and maybe only 1-2 years before superintelligence starts having very, very large physical impacts through robotics, assuming that new paradigms are developed. So I’m closer to hundreds or thousands of person-years than dozens of person-years.)
The world is complicated (see: I, Pencil). You can be superhuman by only being excellent at a few fields, for example politics, persuasion, military, hacking. That still leaves you potentially vulnerable, even if your opponents are unlikely to succeed; or you could hurt yourself by your ignorance in some field. Or you can be superhuman in the sense of being able to make the pencil from scratch, only better at each step. That would probably take more time.
Are you suggesting that e.g. “R&D Person-Years 463205–463283 go towards ensuring that the AI has mastery of metallurgy, and R&D Person-Years 463283–463307 go towards ensuring that the AI has mastery of injection-molding machinery, and …”?
If no, then I don’t understand what “the world is complicated” has to do with “it takes a million person-years of R&D to build ASI”. Can you explain?
…Or if yes, that kind of picture seems to contradict the facts that:
This seems quite disanalogous to how LLMs are designed today (i.e., LLMs can already answer any textbook question about injection-molding machinery, but no human doing LLM R&D has ever worked specifically on LLM knowledge of injection-molding machinery),
This seems quite disanalogous to how the human brain was designed (i.e., humans are human-level at injection-molding machinery knowledge and operation, but Evolution designed human brains for the African Savannah, which lacked any injection-molding machinery).
Yes, I meant it that way.
LLMs quickly acquired the capacity to read what humans wrote and paraphrase it. It is not obvious to me (though that may speak more about my ignorance) that it will be similarly easy to acquire deep understanding of everything.
But maybe it will. I don’t know.
Is there a way to use policy markets to make FDT decisions instead of EDT decisions?
I think the first question to think about is how to use them to make CDT decisions. You can create a market about a causal effect if you have control over the decision and you can randomise it to break any correlations with the rest of the world, assuming the fact that you’re going to randomise it doesn’t otherwise affect the outcome (or bettors don’t think it will).
Committing to doing that does render the market useless for choosing policy, but you could randomly decide whether to randomise or to make the decision via whatever process you actually want to use, and have the market be conditional on the former. You probably don’t want to be randomising your policy decisions too often, but if liquidity weren’t an issue you could set the probability of randomisation arbitrarily low.
Then FDT… I dunno, seems hard.
Yep!
“If I randomize the pick, and pick A, will I be happy about the result?” “If I randomize the pick, and pick B, will I be happy about the result?”
Randomizing 1% of the time and adding a large liquidity subsidy works to produce CDT.
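As a concrete sketch of the randomisation scheme described above (the settlement rule, the 1% probability, and the function names here are my own illustrative choices, not a spec anyone has proposed): traders only get paid on the rare rounds where the decision was actually randomised, so their prices estimate the causal effect of each option; on every other round all the conditional markets are voided and you decide however you like.

```python
import random

# Toy sketch of a randomisation-based conditional decision market (CDT-style).
# The settlement rule and the 1% probability are illustrative assumptions, not a real market spec.

P_RANDOMISE = 0.01  # commit to randomising the decision 1% of the time

def run_round(options, market_prices, decide, outcome):
    """market_prices: dict option -> market-estimated P(good outcome | option chosen at random).
    decide: the process you actually want to use (it may look at market_prices).
    outcome: function mapping the chosen option to True/False (did it go well?)."""
    if random.random() < P_RANDOMISE:
        # Randomised round: pick uniformly and settle only the market for the chosen option.
        choice = random.choice(options)
        settlement = {choice: outcome(choice)}  # other conditional markets are voided (bets refunded)
    else:
        # Normal round: decide however you like; all conditional markets are voided.
        choice = decide(market_prices)
        settlement = {}
    return choice, settlement
```

The liquidity subsidy is what makes it worth trading at all despite the 1% settlement probability.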
I agree with all of this! A related shortform here.
An interesting development in the time since your shortform was written is that we can now try these ideas out without too much effort via Manifold.
Anyone know of any examples?
Worked on this with Demski. Video, report.
Any update to the market is (equivalent to) updating on some kind of information. So all you can do is dynamically choose what to update or not update on.* Unfortunately, whenever you choose not to update on something, you are giving up on the asymptotic learning guarantees of policy market setups. So the strategic gains from updatelessness (like not falling into traps) are in a fundamental sense irreconcilable with the learning gains from updatefulness. That doesn’t mean you can’t be pretty smart about deciding what exactly to update on… but due to embeddedness problems and the complexity of the world, it seems to be the norm (rather than the exception) that you cannot be sure a priori of what to update on (you just have to make some arbitrary choices).
*For avoidance of doubt, what matters for whether you have updated on X is not “whether you have heard about X”, but rather “whether you let X factor into your decisions”. Or at least, this is the case for a sophisticated enough external observer (assessing whether you’ve updated on X), not necessarily all observers.
Reading Resources for independent Technical AI Safety researchers upskilling to apply for roles:
GabeM—Leveling up in AI Safety Research
EA—Technical AI Safety
Michael Aird: Write down Theory of Change
Marius Hobbhahn—Advice for Independent Research
Rohin Shah—Advice for AI Alignment Researchers
gw—Working in Technical AI Safety
Richard Ngo—AGI Safety Career Advice
rmoehn—Be careful of failure modes
Bilal Chughtai—Working at a frontier lab
Upgradeable—Career Planning
Neel Nanda—Improving Research Process
Neel Nanda—Writing a Good Paper
Ethan Perez—Tips for Empirical Alignment Research
Ethan Perez—Empirical Research Workflows
Gabe M—ML Research Advice
Lewis Hammond—AI Safety PhD advice
Adam Gleave—AI Safety PhD advice
Application and Upskilling resources:
Job Board
Events and Training
I’ve been thinking through the following philosophical argument for the past several months.
1. Most things that currently exist have properties that allow them to continue to exist for a significant amount of time and propagate, since otherwise, they would cease existing very quickly.
2. This implies that most things capable of gaining adaptations, such as humans, animals, species, ideas, and communities, have adaptations for continuing to exist.
3. This also includes decision-making systems and moral philosophies.
4. Therefore, one could model the morality of such things as tending towards the ideal of perfectly maintaining their own existence and propagating as much as possible.
Many of the consequences of this approximation of the morality of things seem quite interesting. For instance, the higher-order considerations of following an “ideal” moral system (that is, utilitarianism using a measure of one’s own continued existence at a point in the future) lead to many of the same moral principles that humans actually have (e.g. cooperation, valuing truth) while also avoiding a lot of the traps of other systems (e.g. hedonism). This chain of thought has led me to believe that existence itself could be a principal component of real-life morality.
While it does have a lot of very interesting conclusions, I’m very concerned that if I were to write about it, I would receive 5 comments directing me to some passage by a respected figure that already discusses the argument, especially given the seemingly incredibly obvious structure it has. However, I’ve searched through LW and tried to research the literature as well as I can (through Google Scholar, Elicit, and Gemini, for instance), but I must not have the right keywords, since I’ve come up fairly empty, other than for philosophers with vaguely similar sounding arguments that don’t actually get at the heart of the matter (e.g. Peter Singer’s work comes up a few times, but he particularly focused on suffering rather than existence itself, and certainly didn’t use any evolutionary-style arguments to reach that conclusion).
If this really hasn’t been written about extensively anywhere, I would update towards believing the hypothesis that there’s actually some fairly obvious flaw that renders it unsound, stopping it from getting past, say, the LW moderation process or the peer review process. As such, I suspect that there is some issue with it, but I’ve not really been able to pinpoint what exactly stops someone from using existence as the fundamental basis of moral reasoning.
Would anyone happen to know of links that do directly explore this topic? (Or, alternatively, does anyone have critiques of this view that would spare me the time of writing more about this if this isn’t true?)
I don’t know any source. My first thought is: if you define morality as “the thought system that propagates itself as much as possible”, what makes it different from other thought systems that propagate themselves as much as possible? If memetic survival is the whole story, why do different things exist, as opposed to everything converging on the most virulent form?
As for your first question, there are certainly other thought systems (or I suppose decision theories) that allow a thing to propagate itself, but I highlight a hypothetical decision theory that would be ideal in this respect. Of course, given that things are different from each other (as you mention), this ideal decision theory would necessarily be different for each of them.
Additionally, as the ideal decision theory for self-propagation is computationally intractable to follow, “the most virulent form” isn’t[1] actually useful for anything that currently exists. Instead, we see more computationally tractable propagation-based decision theories based on messy heuristics that happened to correlate with existence in the environment where such heuristics were able to develop.
For your final question, I don’t think that this theory explains initial conditions like having several things in the universe. Other processes analogous to random mutation, allopatric speciation, and spontaneous creation (that is, to not only species, but ideas, communities, etc.) would be better suited for answering such questions. “Propagative decision theory” does have some implications for the decision theories of things that can actually follow a decision theory, as well as giving a very solid indicator on otherwise unsolvable/controversial moral quandaries (e.g. insect suffering), but it otherwise only really helps as much as evolutionary psychology when it comes to explaining properties that already exist.
Other than in the case that some highly intelligent being manages to apply this theory well enough to do things like instrumental convergence that the ideal theory would prioritize, in which case this paragraph suddenly stops applying.
Harder questions (e.g. around 50% average instead of around 90% average) seem better for differentiating students’ understanding for at least two reasons:
-The graph of the percent of students who got a question correct, as a function of the question’s difficulty, tends to follow a sigmoid-ish curve whose fastest change is around the middle.
-Some of a student’s incorrect answers on the test are going to come from sources that (a) the student can prepare to mitigate and (b) aren’t caused by a lack of whatever students should actually be tested for (e.g. questions with ambiguous meanings, questions that require an understanding of a niche framework that is never used outside of the curriculum, questions that have shortcuts that the teacher didn’t recognize, etc.). Ideally, we don’t want any differences in student test results to be based on these things, but harder tests at least mitigate the issue, since understanding (or whatever else we do want students to spend time on) becomes a more important cause of incorrect answers.
(Neither of these is a hard-and-fast rule, but the general pattern seems to hold based on my experience as a student.)
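A minimal way to see the first point, using a standard logistic item-response curve (my choice of model, not something from the comment above): the slope of P(correct) with respect to ability is steepest when P(correct) is 50%, so questions whose difficulty matches the students’ ability separate nearby ability levels the most.

```python
import math

# Logistic item-response curve: P(correct) = 1 / (1 + exp(-(ability - difficulty))).
# Its slope with respect to ability is p * (1 - p), which is largest at p = 0.5,
# i.e. when a question's difficulty matches the student's ability.

def p_correct(ability, difficulty):
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def discrimination(ability, difficulty):
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)  # how much a small ability gap changes P(correct)

for difficulty in (-2.0, 0.0, 2.0):  # easy, matched, hard (relative to ability 0)
    print(difficulty, round(p_correct(0.0, difficulty), 2), round(discrimination(0.0, difficulty), 2))
```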
links 5/20/2025: https://roamresearch.com/#/app/srcpublic/page/05-20-2025
psalm 51:
https://en.wikipedia.org/wiki/Psalm_51 English, Hebrew, Greek
https://kpshaw.blogspot.com/2015/03/miserere-mei-deus.html Latin
https://www.youtube.com/watch?v=36Y_ztEW1NE Allegri Miserere
https://www.understandingai.org/p/i-got-fooled-by-ai-for-science-hype here’s a plasma physicist getting disappointing results when trying to implement “physics-inspired neural networks” for solving PDEs. The example in the paper works, but a different equation doesn’t converge at all to the known closed-form solution, and he can’t find any settings that get it to work.
Beware mistaking a “because” for an “and”. Sometimes you think something is X and Y, but it turns out to be X because Y.
For instance, I was recently at a metal concert, and helped someone off the ground in a mosh pit. Someone thanked me afterwards but to me it seemed like the most obvious thing in the world.
A mosh pit is not fun AND a place where everyone helps each other. It is fun BECAUSE everyone helps each other. Play-acting aggression while being supportive is where the fun is born.
X because Y implies X and Y, though not the other way round.
I think AI development is mainly compute constrained (relevant for intelligence explosion dynamics).
There are some arguments against, based on the high spending of firms on researcher and engineer talent. The claim is that this supports one or both of a) large marginal returns to having more (good) researchers or b) steep power laws in researcher talent (implying large production multipliers from the best researchers).
Given that the workforces at labs remain not large, I think the spending naively supports (b) better.
But in fact I think there is another, even better explanation:
- Researchers’ taste (an AI production multiplier) varies more smoothly
  - (research culture/collective intelligence of a team or firm may be more important)
- Marginal parallel researchers have very diminishing AI production returns (sometimes negative, when the researchers have worse taste)
  - (also, determining a researcher’s taste ex ante is hard)
- BUT firms’ utility is sharply convex in AI production
  - capturing more accolades and market share is basically the entire game
  - spending as much time as possible with a non-commoditised offering allows profiting off fast-evaporating margin
  - so firms are competing over getting cool stuff out first
    - time-to-delivery of non-commoditised (!) frontier models
  - and getting loyal/sticky customer bases
    - ease-of-adoption of product wrapping
    - sometimes differentiation of offerings
- this turns small differences in human capital/production multiplier/research taste into big differences in firm utility (see the toy sketch at the end of this comment)
- so demand for the small pool of researchers with (legibly) great taste is very hot
This also explains why it’s been somewhat ‘easy’ (but capital-intensive) for a few new competitors to pop into existence each year, and why firms’ revealed-preference savings rate into compute capital is enormous (much greater than 100%!).
We see token prices drop incredibly sharply, which supports the non-commoditised margin claim (though this is also consistent with a Wright’s Law effect from (runtime) algorithmic efficiency gains, which should definitely also be expected).
A lot of engineering effort is being put into product wrappers and polish, which supports the customer base claim.
The implications include: headroom above top human expert teams’ AI research taste could be on the small side (I think this is right for many R&D domains, because a major input is experimental throughput). So both quantity and quality of (perhaps automated) researchers should have steeply diminishing returns in AI production rate. But might they nevertheless unlock a practical monopoly (or at least an increasingly expensive barrier to entry) on AI-derived profit, by keeping the (more monetisable) frontier out of reach of competitors?
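Here is the toy sketch referenced above, illustrating the convexity claim; the winner-take-most functional form and all the numbers are my own assumptions, chosen only to show the shape of the effect.

```python
import math

# Toy model: firm revenue share as a steep, winner-take-most function of relative
# AI production/quality. Functional form and all numbers are illustrative assumptions.

def revenue_shares(qualities, steepness=20.0):
    """Softmax-style shares: with high steepness, small quality gaps become large share gaps."""
    weights = [math.exp(steepness * q) for q in qualities]
    total = sum(weights)
    return [w / total for w in weights]

# Two firms whose "AI production" differs by only 5%:
print(revenue_shares([1.00, 1.05]))  # ~[0.27, 0.73]: a 5% edge yields roughly 3x the revenue
```

With a high steepness parameter, a 5% production edge becomes roughly a 73/27 revenue split, which is the sense in which small differences in taste can turn into big differences in firm utility.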
Who predicted that AI will have a multi-year “everything works” period where the prerequisite pieces come together and suddenly every technique works on every problem? Like before electricity you had to use the right drill bit or saw blade for a given material, but now you can cut anything with anything if you are only slightly patient.
Sometimes people talk about how AIs will be very superhuman at a bunch of (narrow) domains. A key question related to this is how much this generalizes. Here are two different possible extremes for how this could go:
It’s effectively like an attached narrow weak AI: The AI is superhuman at things like writing ultra fast CUDA kernels, but from the AI’s perspective, this is sort of like it has a weak AI tool attached to it (in a well integrated way) which is superhuman at this skill. The part which is writing these CUDA kernels (or otherwise doing the task) is effectively weak and can’t draw in a deep way on the AI’s overall skills or knowledge to generalize (likely it can shallowly draw on these in a way which is similar to the overall AI providing input to the weak tool AI). Further, you could actually break out these capabilities into a separate weak model that humans can use. Humans would use this somewhat less fluently as they can’t use it as quickly and smoothly due to being unable to instantaneously translate their thoughts and not being absurdly practiced at using the tool (like AIs would be), but the difference is ultimately mostly convenience and practice.
Integrated superhumanness: The AI is superhuman at things like writing ultra fast CUDA kernels via a mix of applying relatively general (and actually smart) abilities, having internalized a bunch of clever cognitive strategies which are applicable to CUDA kernels and sometimes to other domains, as well as domain specific knowledge and heuristics. (Similar to how humans learn.) The AI can access and flexibly apply all of the things it learned from being superhuman at CUDA kernels (or whatever skill) and with a tiny amount of training/practice it can basically transfer all these things to some other domain even if the domain is very different. The AI is at least as good at understanding and flexibly applying what it has learned as humans would be if they learned the (superhuman) skill to the same extent (and perhaps the AIs are actually much better at this than humans). You can’t separate these capabilities into a weak model, the weak model RL’d on this (and distilled into) would either be much worse at CUDA or would need to actually be generally quite capable (rather than weak).
My sense is that the current frontier LLMs are much closer to (1) than (2) for most of their skills, particularly the skills which they’ve been heavily trained on (e.g. next token prediction or competitive programming). As AIs in the current paradigm get more capable, they appear to shift some toward (2) and I expect that at the point when AIs are capable of automating virtually all cognitive work that humans can do, we’ll be much closer to (2). That said, it seems likely that powerful AIs built in the current paradigm[1] which otherwise match humans at downstream performance will somewhat lag behind humans in integrating/generalizing skills they learn (at least without spending a bunch of extra compute on skill integration) because this ability currently seems to be lagging behind other capabilities relative to humans and AIs can compensate for worse skill integration with other advantages (being extremely knowledgeable, fast speed, parallel training on vast amounts of relevant data including “train once, deploy many”, better memory, faster and better communication, etc).
I think different views about the extent to which future powerful AIs will deeply integrate their superhuman abilities versus these abilities being shallowly attached partially drive some disagreements about misalignment risk and what takeoff will look like.
If the paradigm radically shifts by the time we have powerful AIs, then the relative level of integration is much less clear.
I suppose that most tasks that an LLM can accomplish could theoretically be performed more efficiently by a dedicated program optimized for that task (and even better by a dedicated physical circuit). Hypothesis 1) amounts to considering that such a program, a dedicated module within the model, is established during training. This module can be seen as a weak AI used as a tool by the stronger AI. A bit like how the human brain has specialized modules that we (the higher conscious module) use unconsciously (e.g., when we read, the decoding of letters is executed unconsciously by a specialized module).
We can envision that at a certain stage the model becomes so competent at programming that it will tend to write code on the fly, as a tool, to solve most tasks we might submit to it. In fact, I notice that this is already increasingly the case when I ask a question to a recent model like Claude 3.7 Sonnet. It often generates code, a tool, to try to answer me rather than trying to answer the question ‘itself.’ It clearly realizes that dedicated code will be more effective than its own neural network. This is interesting because in this scenario, the dedicated module is not generated during training but on the fly during normal production operation. In this way, it would be sufficient for an AI to become a superhuman programmer to become superhuman in many domains thanks to the use of these tool-programs. The next stage would be the on-the-fly production of dedicated physical circuits (FPGA, ASIC, or alien technology), but that’s another story.
This relates to the philosophical debate about where intelligence resides: in the tool or in the one who created it? In the program or in the programmer? If a human programmer programs a superhuman AI, should we attribute this superhuman intelligence to the programmer? Same question if the programmer is itself an AI. It’s the kind of chicken-and-egg debate where the answer depends on how we divide the continuity of reality into discrete categories. You’re right that integration is an interesting criterion, as it is a kind of formal, non-arbitrary solution to this problem of defining discrete categories within the continuity of reality.
Good articulation.
People also disagree greatly about how much humans tend towards integration rather than non-integration, and how much human skill comes from domain transfer. And I think some / a lot of the beliefs about artificial intelligence are downstream of these beliefs about the origins of biological intelligence and human expertise, i.e., in Yudkowsky / Ngo dialogues. (Object level: Both the LW-central and alternatives to the LW-central hypotheses seem insufficiently articulated; they operate as a background hypothesis too large to see rather than something explicitly noted, imo.)
Makes me wonder whether most of what people believe to be “domain transfer” could simply be IQ.
I mean, suppose that you observe a person being great at X, then you make them study Y for a while, and it turns out that they are better at Y than an average person who spent the same time studying Y.
One observer says: “Clearly some of the skills at X have transferred to the skills of Y.”
Another observer says: “You just indirectly chose a smart person (by filtering for high skills at X), duh.”
This seems important to think about, I strong upvoted!
I’m not sure that link supports your conclusion.
First, the paper is about AI understanding its own behavior. This paper makes me expect that a CUDA-kernel-writing AI would be able to accurately identify itself as being specialized at writing CUDA kernels, which doesn’t support the idea that it would generalize to non-CUDA tasks.
Maybe if you asked the AI “please list heuristics you use to write CUDA kernels,” it would be able to give you a pretty accurate list. This is plausibly more useful for generalizing, because if the model can name these heuristics explicitly, maybe it can also use the ones that generalize, if they do generalize. This depends on: 1) the model being aware of many heuristics that it’s learned, 2) many of these heuristics generalizing across domains, and 3) the model being able to use its awareness of these heuristics to successfully generalize. None of these are clearly true to me.
Second, the paper only tested GPT-4o and Llama 3, so the paper doesn’t provide clear evidence that more capable AIs “shift some towards (2).” The authors actually call out in the paper that future work could test this on smaller models to find out if there are scaling laws—has anybody done this? I wouldn’t be too surprised if small models were also able to self-report simple attributes about themselves that were instilled during training.
Fair, but I think the AI being aware of its behavior is pretty continuous with being aware of the heuristics it’s using and ultimately generalizing these (e.g., in some cases the AI learns what code word it is trying to make the user say which is very similar to being aware of any other aspect of the task it is learning). I’m skeptical that very weak/small AIs can do this based on some other papers which show they fail at substantially easier (out-of-context reasoning) tasks.
I think most of the reason why I believe this is improving with capabilities is due to a broader sense of how well AIs generalize capabilities (e.g., how much does o3 get better at tasks it wasn’t trained on), but this paper was the most clearly relevant link I could find.
I’m not sure o3 does get significantly better at tasks it wasn’t trained on. Since we don’t know what was in o3’s training data, it’s hard to say for sure that it wasn’t trained on any given task.
To my knowledge, the most likely example of a task that o3 does well on without explicit training is GeoGuessr. But see this Astral Codex Ten post, quoting Daniel Kang:[1]
I think this is a bit overstated, since GeoGuessr is a relatively obscure task, and implementing an idea takes much longer than thinking of it.[2] But it’s possible that o3 was trained on GeoGuessr.
The same ACX post also mentions:
Do you have examples in mind of tasks that you don’t think o3 was trained on, but which it nonetheless performs significantly better at than GPT-4o?
Disclaimer: Daniel happens to be my employer
Maybe not for cracked OpenAI engineers, idk
I would guess that OpenAI has trained on GeoGuessr. It should be pretty easy to implement—just take images off the web which have location metadata attached, and train to predict the location. Plausibly getting good at Geoguessr imbues some world knowledge.
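A minimal sketch of what “take images off the web which have location metadata attached” could look like, reading EXIF GPS tags with Pillow; the directory name and dataset layout are hypothetical, and a real pipeline would obviously need scraping, deduplication, and licensing care on top of this.

```python
from pathlib import Path
from PIL import Image
from PIL.ExifTags import GPSTAGS, TAGS

# Sketch: build (image, latitude, longitude) training pairs from photos that carry
# EXIF GPS metadata, e.g. for a GeoGuessr-style location-prediction objective.

def _to_degrees(value):
    d, m, s = value  # degrees, minutes, seconds as EXIF rationals
    return float(d) + float(m) / 60.0 + float(s) / 3600.0

def extract_lat_lon(path):
    exif = Image.open(path)._getexif() or {}
    gps_raw = next((v for k, v in exif.items() if TAGS.get(k) == "GPSInfo"), None)
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(k, k): v for k, v in gps_raw.items()}
    try:
        lat = _to_degrees(gps["GPSLatitude"])
        lon = _to_degrees(gps["GPSLongitude"])
    except KeyError:
        return None
    if gps.get("GPSLatitudeRef") == "S":
        lat = -lat
    if gps.get("GPSLongitudeRef") == "W":
        lon = -lon
    return lat, lon

# Hypothetical scrape directory; keep only images that actually carry GPS metadata.
dataset = [
    (path, *coords)
    for path in Path("scraped_images").glob("*.jpg")
    if (coords := extract_lat_lon(path)) is not None
]
```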
I think this might be wrong when it comes to our disagreements, because I don’t disagree with this shortform.[1] Maybe a bigger crux is how valuable (1) is relative to (2)? Or the extent to which (2) is more helpful for scientific progress than (1)?
As long as “downstream performance” doesn’t include downstream performance on tasks that themselves involve a bunch of integrating/generalising.
I don’t think this explains our disagreements. My low-confidence guess is that we have reasonably similar views on this. But I do think it drives parts of some disagreements between me and people who are much more optimistic than me (e.g. various not-very-concerned AI company employees).
I agree the value of (1) vs (2) might also be a crux in some cases.
Is the crux that the more optimistic folks plausibly agree (2) is cause for concern, but believe that mundane utility can be reaped with (1), and they don’t expect us to slide from (1) into (2) without noticing?
When people are skeptical about the concept of AGI being meaningful or having clear boundaries, it could sometimes be downstream of skepticism about very fast and impactful R&D done by AIs, such as software-only singularity or things like macroscopic biotech where compute buildout happens at a speed impossible for human industry. Such events are needed to serve as landmarks, anchoring a clear concept of AGI, otherwise the definition remains contentious.
So AI company CEOs who complain about AGI being too nebulous to define might already be expecting a scaling slowdown, with their strategy being primarily about the fight for the soul of the 2028-2030 market. When scaling is slow, it’ll become too difficult to gain a significant quality advantage sufficient to defeat the incumbents. So the decisive battle is happening now, with the rhetoric making it more palatable to push through the decisions to build the $140bn training systems of 2028.
This behavior doesn’t need to be at all related to expecting superintelligence, it makes sense as a consequence of not expecting superintelligence in the near future.
I think short timelines just don’t square with the way intelligence agencies are behaving. The NSA took Y2K more seriously than it currently seems to be taking near-term AGI. You can make the argument that intelligence agencies are less competent than they used to be, but I don’t buy that they aren’t at least extremely paranoid and moderately competent: that seems like their job.
Researchers at AGI labs seem to genuinely believe the hype they’re selling, a significant fraction of non-affiliated top-of-the-line DL researchers is inclined to believe them as well, and basically all competent well-informed people agree that the short-timelines position is not unreasonable to hold.
Dismissing short timelines based on NSA’s behavior requires assuming that they’re much more competent in the field of AI than everyone in the above list. After all, that’d require them to be strongly (and correctly) confident that all these superstar researchers above are incorrect.
While that’s not impossible, it seems highly unlikely to me. Much more likely that they’re significantly less competent, and accordingly dismissive.
This is a late reply, but at least from this article, it seems like Ilya Sutskever was already losing confidence by mid-2023 that OpenAI would reach AGI. Additionally, if the rumors about GPT-5 are true, it’s mainly going to be a unification of existing models rather than something entirely new. Combined with the GPT-4.5 release, it sure seems like progress at OpenAI is slowing down rather than speeding up.
How do you know that researchers at AGI labs genuinely believe what they’re saying? Couldn’t the companies just put pressure on them to act like they believe Transformative AI is imminent? I just don’t buy that these agencies are dismissive without good reason. They’ve explored remote viewing and other ideas that are almost certainly bullshit. If they are willing to consider those possibilities, I don’t know why they wouldn’t consider the possibility of current deep learning techniques creating a national security threat. That seems like their job, and they’ve explored significantly weirder ideas.
On what possible publicly-unavailable evidence could they have updated in order to correctly attain such a high degree of dismissiveness?
I could think of three types of evidence:
Strong theoretical reasons.
E. g., some sort of classified, highly advanced, highly empirically supported theory of deep learning/intelligence/agency, such that you can run a bunch of precise experiments, or do a bunch of math derivations, and definitively conclude that DL/LLMs don’t scale to AGI.
Empirical tests.
E. g., perhaps the deep state secretly has 100x the compute of AGI labs, and they already ran the pretraining game to GPT-6 and been disappointed by the results.
Overriding expert opinions.
E. g., a large number of world-class best-of-the-best AI scientists with an impeccable track record firmly and unanimously saying that LLMs don’t scale to AGI. This requires either a “shadow industry” of AI experts working for the government, or for the AI-expert public speakers to be on the deep state’s payroll and lying in public about their uncertainty.
I mean, I guess it’s possible that what we see of the AI industry is just the tip of the iceberg and the government has classified research projects that are a decade ahead of the public state of knowledge. But I find this rather unlikely.
And unless we do postulate that, I don’t see any possible valid pathway by which they could’ve attained high certainty regarding the current paradigm not working out.
There are two ways we can update on it:
The fact that they investigated psychic phenomena means they’re willing to explore a wide variety of ambitious ideas, regardless of their weirdness – and therefore we should expect them not to dismiss the AGI Risk out of hand.
The fact that they investigated psychic phenomena means they have a pretty bad grip on reality – and therefore we should not expect them to get the AGI Risk right.
I never looked into it enough to know which interpretation is the correct one. Expecting less competence rather than more is usually a good rule of thumb, though.
To be clear, I personally very much agree with that. But:
I find that I’m not inclined to take Sutskever’s current claims about this at face value. He’s raising money for his thing, so he has a vested interest in pushing the agenda that the LLM paradigm is a dead end and that his way is the only way. The same way it became advantageous for him to talk about the data wall once he was no longer with the unlimited-compute company.
Again, I do believe both in LLMs being a dead end and in the data wall. But I don’t trust Sutskever to be a clean source of information regarding that, so I’m not inclined to update on his claims to that end.
Those are good points. The last thing I’ll say drastically reduces the amount of competence the government would need in order to be dismissive while still being rational: the leading AI labs may already be fairly confident that current deep-learning techniques won’t get to AGI in the near future, so the security agencies know this as well.
That would make sense. But I doubt all AGI companies are that good at informational security and deception. This would require all of {OpenAI, Anthropic, DeepMind, Meta, xAI} to decide on the deceptive narrative, and then not fail to keep up the charade, which would require both sending the right public messages and synchronizing their research publications such that the set of paradigm-damning ones isn’t public.
In addition, how do we explain people who quit AGI companies and remain with short timelines?
I guess I would respond to the first point by saying all of the companies you mentioned have incentive to say they are closing in on AGI even if they aren’t. It doesn’t seem that sophisticated to say “we’re close to AGI” when you’re not. Mark Zuckerberg said that AI would be at the level of a junior SWE this year, and Meta proceeded to release Llama 4. Unless prognosticators at Meta seriously fucked up, the most likely scenario is that Zuckerberg made that comment knowing it was bullshit. And the sharing of research did slow down a lot in 2023, which gave companies cover to not release unflattering results.
And to your last point, it seems reasonable that companies could pressure former employees to act as if they believe AGI is imminent. And some researchers may be emotionally invested in believing that what they worked on is what will lead to superintelligence.
And my question for you is: if DeepMind had solid evidence that AGI would be here in 1 year, and if the security agencies had access to DeepMind’s evidence and reasoning, do you believe they would still do nothing?
As someone who thinks superintelligence could come in the near future, I basically agree with @snewman’s view that AIs have to automate the entire economy, or automate a sector that could then automate everything else very fast, but unfortunately for us this basically gives us no good fire alarms for AGI unless @Ege Erdil and @Matthew Barnett et al are right that takeoff is slow enough that most value comes from broad automation, and external use dominates internal use:
https://amistrongeryet.substack.com/p/defining-agi
On YouTube, @Evbo’s Parkour Civilization and PVP Civilization drama movies, professionally produced, set in Minecraft, and half a parody of YA dystopia, serve as a surprisingly good demonstration of Instrumental Convergence (the protagonist kills or bribes most people they meet to “rank up” in the beginning) and of non-human morality (the characters basically only care about the Minecraft activity of their series, without a hint of irony).
I think using existing non-AI media as an analogy for AI could be helpful, because people tend to imagine a Terminator-like ASI as robots shooting people, which is one reason a common suggestion for dealing with unaligned AI is to just turn it off, pour water on the servers, etc.
Link: https://www.youtube.com/@Evbo