I’ll say that one of my key cruxes on whether AI progress becomes non-bullshit/actually leads into an explosion is whether in-context learning/meta-learning can act as an effective enough substitute for the neuroplasticity of human neural weights within realistic 2030 compute budgets. The key reason AIs have a lot of weird deficits/are much worse than humans at simple tasks is that after an AI is trained, there is no neuroplasticity in the weights anymore, so it can learn nothing new after its training date unless it uses in-context learning/meta-learning.
lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn’t matter:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
I like your explanation of why normal reliability engineering is not enough, but I’ll flag that security against actors is probably easier than LW in general portrays, and I think computer security as a culture is prone to way overestimating the difficulty of security because of incentive issues, not remembering the times something didn’t happen, and side-channels arguably being much more limited than people think (precisely because they rely on very specific physical setups, rather than attacking the algorithm).
A non-trivial portion of my optimism on surviving AGI comes from the fact that security, while difficult, is not unreasonably difficult, and that partial successes matter from a security standpoint.
Link below:
I have 2 cruxes here:
I buy Henrich’s theory far less than I used to, because Henrich made easily checkable false claims that all point in the direction of culture being more necessary for human success.
In particular, I do not buy that humans and chimpanzees are nearly as similar as Henrich describes, and a big reason for this is that the study purporting to show this pitted heavily optimized, selected-to-be-the-best chimpanzees against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.
I don’t think they’re wildly different, and I’d usually put chimps’ effective FLOPs at 1-2 OOMs lower, but I wouldn’t go nearly as far as Henrich on the similarities.
I do think culture actually matters, but nowhere near as much as Henrich wants it to matter.
I basically disagree that most of the valuable learning takes place before age 2; if I had to pick the most valuable period for learning, it would probably be ages 0-25, or more specifically ages 2-7 and then 13-25.
I agree evolution has probably optimized human learning, but I don’t think it’s so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs. The reason is that I do not believe humans are in equilibrium, which means there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible (with high uncertainty).
Comment below:
https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE
I’ll flag that while I personally never believed the idea that orcas are on average >6 SDs smarter than humans, and never considered it that plausible, I also don’t think orcas could actually benefit that much from +6 SDs even if it applied universally. The reason is that they live in water, which severely limits their available technology options and makes it really, really hard to form the societies needed to generate the explosion that happened after the Industrial Revolution, or even the Agricultural Revolution.
And there is a deep local-optimum issue in which their body plan is about as unsuited to using tools as possible, and changing this requires technology they almost certainly can’t invent, because the things you would need to make that tech are impossible to get at the pressure and saltiness of the water they live in, so it is pretty much impossible for orcas to get that much better with large increases in intelligence.
Thus, orca societies have a pretty hard limit on what they can achieve, once you rule out technologies they cannot invent.
My take is that the big algorithmic difference that explains a lot of weird LLM deficits, and plausibly explains the post’s findings, is that current neural networks do not learn at run-time: their weights are frozen after training. This is a central reason why humans outperform LLMs at longer tasks, because humans, like a lot of other animals, have the ability to learn at run-time.
Unfortunately, this ability is generally lost gradually starting in your 20s, but the existence of non-trivial learning at run-time is still a huge part of why humans are currently more successful at longer tasks than AIs.
And thus, if OpenAI or Anthropic found the secret to lifelong learning, this would explain the hype (though I personally place very low probability on their having succeeded at this for anything that isn’t math or coding/software).
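To make the frozen-weights point concrete, here is a minimal sketch (my own toy illustration in PyTorch, using a stand-in linear model; it is not a description of how any lab actually implements run-time learning) contrasting deployment-style frozen inference with a single test-time weight update:

```python
import torch

def frozen_inference(model, x):
    # Deployed-LLM style: weights never change after training, so nothing
    # encountered at run-time is retained across calls.
    model.eval()
    with torch.no_grad():
        return model(x)

def test_time_update(model, x, y, lr=1e-4):
    # The missing ingredient the comment points at: letting run-time
    # experience modify the weights themselves (here, one gradient step).
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Tiny stand-in "model" and data, purely for illustration.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
print(frozen_inference(model, x).shape)   # weights untouched
print(test_time_update(model, x, y))      # weights nudged by new data
```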
Gwern explains below:
Re other theories, I don’t think all other theories in existence have infinitely many adjustable parameters, and if he’s referring to the fact that lots of theories have adjustable parameters that can range over the real numbers, which are infinitely complicated in general, then that’s a different issue, and string theory may have it as well.
Re string theory’s alleged vacuousness, I think the core thing string theory predicts that other quantum gravity models don’t is that at large scales you recover general relativity and the Standard Model, whereas no other theory has yet figured out how to properly include both the empirical effects of gravity and of quantum mechanics in the parameter regimes where they are known to work. So string theory predicts more simply by predicting the things quantum mechanics predicts while being able to include gravity without ruining the other predictions, whereas other models of quantum gravity tend to ruin empirical predictions, like general relativity approximately holding, pretty fast.
That said, for the purposes of alignment, it’s still good news that cats (by and large) do not scheme against their owner’s wishes, and the fact that cats can be as domesticated as they are while they aren’t cooperative or social is a huge boon for alignment purposes (within the analogy, which is arguably questionable).
I basically don’t buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.
To the extent that something like this is true, I expect it to be a weird industrial to information age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
To clarify a point here:
“Oh but physical devices can’t run an arbitrarily long tape”
This is not the actual issue.
The actual issue is that even with unbounded resources, you still couldn’t simulate an unbounded tape because you can’t get enough space for positional encodings.
Humans are not Turing-complete in some narrow sense;
Note that for the purpose of Turing-completeness, we only need to show that if we gave it unbounded resources, it could solve any computable problem without having to change the code, and we haven’t actually proven that humans aren’t Turing complete (indeed my best guess is that humans are Turing complete).
Some technical comments on this post:
There’s a theorem that a large enough neural network can approximate any algorithm, but when a neural network learns, we have no control over which algorithms it will end up implementing, and don’t know how to read the algorithms off the numbers.
This theorem buys us a lot less than most people think it does, because of the requirement that the function’s domain be bounded.
More here:
https://lifeiscomputation.com/the-truth-about-the-not-so-universal-approximation-theorem/
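For reference, the classical statement of that theorem (Cybenko 1989; Hornik 1991) makes the bounded-domain requirement explicit; the guarantee below only holds on the compact set K, and outside K the network’s behavior is completely unconstrained:

```latex
% Universal approximation theorem (one hidden layer, classical form):
% For every continuous f : K \to \mathbb{R} with K \subset \mathbb{R}^n compact,
% and every \epsilon > 0, there exist N, a_i, b_i \in \mathbb{R}, w_i \in \mathbb{R}^n
% (with \sigma a fixed sigmoidal, or more generally non-polynomial, activation) such that
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} a_i \,\sigma(w_i \cdot x + b_i) \right| < \epsilon .
```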
On LLM Turing-completeness:
LLMs are Turing-complete and can implement any algorithms (researchers even came up with compilers of code into LLM parameters; though we don’t really know how to “decompile” an existing LLM to understand what algorithms the numbers represent).
I think LLMs can’t be Turing-complete, assuming they are based on the transformer architecture (which basically all LLMs are), at least if @Lucius Bushnaq is correct about the basic inability to simulate an unbounded tape using an unbounded context window. RNNs are Turing complete, but there are a lot of contextual issues there.
Quote from Lucius Bushnaq:
Well, transformers are not actually Turing complete in real life where parameters aren’t real numbers, because if you want an unbounded context window to simulate unbounded tape, you eventually run out of space for positional encodings. But the amount of bits they can hold in memory does grow exponentially with the residual stream width, which seems good enough to me. Real computers don’t have infinite memory either.
And on the issues with claiming LLMs are Turing-complete in general:
https://lifeiscomputation.com/transformers-are-not-turing-complete/
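A back-of-the-envelope illustration of both halves of Bushnaq’s point (the numbers are my own and purely illustrative): with finite-precision activations the residual stream can only take finitely many distinct values, so positions on an unbounded tape must eventually collide, yet that finite number is astronomically large:

```python
# Illustrative, assumed numbers only (not taken from any particular model).
bits_per_dim = 16            # e.g. fp16 activations
width = 4096                 # hypothetical residual stream width
memory_bits = bits_per_dim * width

# At most 2^memory_bits distinct residual-stream states exist, so an unbounded
# tape cannot be faithfully addressed -> no true Turing completeness at finite
# precision...
print(f"distinct residual-stream states <= 2^{memory_bits}")

# ...but 2^65536 states is vastly more "memory" than any physical computer has,
# which is why the limitation rarely matters in practice.
```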
IMO, the discontinuity that is sufficient here is that I expect societal responses to be discontinuous, rather than continuous, and in particular, I expect societal responses will come when people start losing jobs en masse, and at that point, either the AI is aligned well enough that existential risk is avoided, or the takeover has inevitably happened and we have very little influence over the outcome.
On this point:
Meaningful representative example in what class: I think it’s representative in ‘weird stuff may happen’, not in we will get more teenage-intern-trapped-in-a-machine characters.
Yeah, I expect society to basically not respond at all if weird stuff just happens, unless we assume more here, and in particular I think societal response is very discontinuous, even if AI progress is continuous, for both good and bad reasons.
This is cruxy, because I don’t think that noise/non-error-freeness of your observations alone leads to bribing surveyors, unless we add in additional assumptions about what that noise/non-error-freeness is.
(in particular, simple IID noise/quantum noise likely doesn’t lead to extremal Goodhart/bribing surveyors.)
More generally, the reason I maintain a distinction between these 2 failure modes of Goodharting, regressional and extremal Goodhart, is that they respond differently to decreasing the error.
I suspect that in the limit of 0 error, regressional Goodhart (like noisy sensors leading to slight overspending on reducing mosquitos) vanishes, whereas extremal Goodhart (like bribing surveyors) doesn’t vanish. More importantly, under regressional Goodhart the error of your sensors only introduces a bounded error in how well you can regulate X, so the error can’t dominate, while extremal Goodhart like bribing surveyors can make the error dominate.
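To make that limit claim concrete, here is a toy simulation (entirely my own construction; the “bribing surveyors” mechanism is crudely modelled as a proxy that decouples from the true goal past a threshold). As the measurement noise goes to zero, the regressional gap vanishes while the extremal gap does not:

```python
import numpy as np

rng = np.random.default_rng(0)

def regressional_gap(noise_sd, n=100_000):
    # Pick the option with the best *measured* value V = U + noise and
    # report how much true value we lose relative to the true optimum.
    true_u = rng.normal(size=n)
    measured = true_u + rng.normal(scale=noise_sd, size=n)
    chosen = np.argmax(measured)
    return true_u.max() - true_u[chosen]

def extremal_gap(noise_sd, n=100_000):
    # Past a threshold the proxy decouples from the goal ("bribe the
    # surveyor"): measured value keeps rising while true value collapses.
    effort = rng.uniform(0, 10, size=n)
    true_u = np.where(effort < 8, effort, 0.0)               # cheating yields nothing real
    measured = effort + rng.normal(scale=noise_sd, size=n)   # but looks even better
    chosen = np.argmax(measured)
    return true_u.max() - true_u[chosen]

for sd in [1.0, 0.1, 0.0]:
    print(f"noise={sd}: regressional gap ~ {regressional_gap(sd):.3f}, "
          f"extremal gap ~ {extremal_gap(sd):.3f}")
```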
So I basically disagree with this statement:
Goodharting is robust. That is, the mechanism of Goodharting seems impossible to overcome. Goodharting is just a fact of any control system.
(Late comment here).
My own take is that I do endorse a version of the “pausing now is too late” objection; more specifically, I think that for most purposes we should assume pauses will come too late to be effective when thinking about technical alignment. A big portion of the reason is that I don’t think we will be able to convince many people that AI is powerful enough to need governance without them first seeing massive job losses firsthand, and at that point we are well past the point of no return for when we could control AI as a species.
In particular, I think Eliezer is probably vindicated/made a correct prediction about how people would react to AI in “There’s No Fire Alarm for AGI” (more accurately, the fire alarm will go off way too late to serve as a fire alarm).
More here:
Re democratic countries being overtaken by dictatorial countries: I think this will only last until AI that can automate at least all white-collar labor is achieved, and maybe even most blue-collar physical labor, well enough that human wages for those jobs decline below what a human needs to subsist on. By then, dictatorial/plutocratic countries will unfortunately come back as a viable governing option, maybe even overtaking democratic countries.
So to come back to the analogy, I think VNM-rational dictatorship is unfortunately common and convergent over long timescales, and it’s democracies/coalition politics that are fragile over the sweep of history, because they only became dominant-ish starting in the 18th century and will end sometime in the 21st century.
My guess is that the answer is also likely no, because the self-model is still retained to a huge degree, so p-zombies can’t really exist without hugely damaging the brain/being dead.
I explain a lot more about what is (IMO) the best current model of how consciousness works in my review of a post on this topic:
I was implicitly assuming a closed system here, to be clear.
The trick that makes the game locally positive sum is that the earth isn’t a closed system relative to the sun, and when I said globally I was referring to the entire accessible universe.
Thinking about it more, though, I now think this is way less relevant except on extremely long timescales; but the future may be dominated by people with very long time horizons, so it does matter again.
I think I understand the question now.
I actually agree that if we assume that there’s a finite maximum of atoms, we could in principle reformulate the universal computer as a finite state automaton, and if we were willing to accept the non-scalability of a finite state automaton, this could actually work.
The fundamental problem is that now we would have software that only works up to a specified memory limit, because we have essentially burned the software into the hardware of the finite automaton. If we are ever uncertain how much memory or time a problem requires, or more worryingly, how many resources we could actually use, then our “software” for the finite automaton is no longer usable and we’d have to throw it away and build a new computer for every input length.
Turing Machine models automatically handle arbitrarily large inputs without having to throw away expensive work on developing the software.
So in essence, if you want to handle the most general case, or believe unbounded atoms are possible, like me, then you really want the universal computer architecture of modern computers.
The key property of real computers that makes them Turing complete in theory is that they can scale to arbitrarily more memory and time without changing the system description/code.
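As a minimal sketch of that difference (my own illustration; “doubling a unary input” stands in for an arbitrary computation):

```python
def unary_double_tm(n):
    # Turing-machine flavour: the tape is a list that grows on demand, so the
    # *same code* works for any n, limited only by physical memory.
    tape = [1] * n          # input: n ones
    tape.extend([1] * n)    # write n more ones; the tape just grows
    return len(tape)

def build_doubling_fsa(max_n):
    # Finite-automaton flavour: max_n is fixed up front and every answer is
    # hard-wired, so exceeding the limit means building a new machine.
    table = {k: 2 * k for k in range(max_n + 1)}
    def run(n):
        if n not in table:
            raise ValueError("input exceeds the machine's built-in limit; "
                             "a new machine must be constructed")
        return table[n]
    return run

print(unary_double_tm(10**6))   # fine: the tape grows to fit
fsa = build_doubling_fsa(1000)
print(fsa(500))                  # fine
# fsa(10**6) would raise: the burned-in design can't scale without a rebuild.
```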
More below:
(If we assume that we can only ever get access to a finite number of atoms. If you dispute this I won’t argue with that, neither of us has a Theory of Everything to say for certain.)
Some thoughts on this post:
I’ll flag that for AI safety specifically, the world hasn’t yet weighed in that much and can be treated as mostly fixed for the purposes of analysis (with caveats). But yes, AI safety in general does need to prepare for the real possibility that the world will weigh in a lot more on AI safety, and there is a non-trivial number of worlds where AI safety becomes a lot more mainstream.
I don’t think we should plan on this happening, but I definitely agree that the world may weigh in way more on AI safety than before, especially just before an AI explosion.
On environmentalism’s fuckups:
I definitely don’t think environmentalists caused climate change, and that’s despite thinking the nuclear restrictions were very dumb, mostly because oil companies were already causing climate change (albeit far more restrained at the time) when they pumped oil, and the same is true of gas and coal companies.
I do think there’s a problem with environmentalism not accepting solutions that don’t fit a nature aesthetic, but that’s another problem, mostly separate from causing climate change.
Also, most of the solution here would have been to be more consequentialist and more willing to accept expected utility maximization.
There’s arguments to be said that expected utility maximization is overrated on LW, but it’s severely underrated by basically everyone else, and basically everyone would be helped by adopting more of a utility maximization mindset.
My general view on how AI safety could go wrong is that it goes wrong through either becoming somewhat like climate change/environmentalist partisans, who systematically overestimate the severity of plausible harms (even when the harms do exist) while the other side then tries to dismiss the harms entirely, causing a polarization cascade, or through not realizing that the danger from AI misalignment has passed and desperately trying to keep relevance.
I have a number of takes on the bounty questions, but I’ll wait until you actually post them.
I basically agree with this, though I would perhaps avoid virtue ethics. One of the main things I’d generally like to see is more LWers treating stuff like saving the world with the attitude you’d have in a job, perhaps at a startup or a government body like the Senate or House of Representatives in, say, America, rather than viewing it as your heroic responsibility.
In this respect, I think Eliezer was dangerously wrong to promote a norm of heroism/heroic responsibility.
My controversial take here is that most of the responsibility can be divvied up to the voters first and foremost, and secondly to the broad inability to actually govern using normal legislative methods once in power.
I think this is plausible, but not very likely to happen, and I do think it’s still plausible we will be in a moment where AI safety doesn’t become mainstream by default.
This is especially likely to occur if software singularities/FOOM/a software intelligence explosion is at all plausible, and in those cases we cannot rely on our institutions automatically keeping up.
Link below:
https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion
I do think it’s worthwhile for people to focus on worlds where AI safety does become a mainstream political topic, but that we shouldn’t bank on AI safety going mainstream in our technical plans to make AI safe.
My takes on what we should do, in reply to you:
Suffice it to say that I’m broadly unconvinced by your criticism of Bayesianism from a philosophical perspective, for roughly the reasons @johnswentworth identified below:
https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian#AGxg2r4HQoupkdCWR
On mechanism design:
One particular wrinkle to add here is that institutions/countries of the future are going to have to be value-aligned to their citizenry in a way that is genuinely unprecedented for basically any institution, because if they are not value-aligned, then we just have the alignment problem again, where the people in power have very large incentives to just get rid of the rest, given arbitrary selfish values (and I don’t buy the hypothesis that consumption/human wants are fundamentally limited).
The biggest story of the 21st century is how AI is making alignment way, way more necessary than in the past.
Some final points:
On the one hand, I partially agree that in general a willingness to make plans that depend on others cooperating was definitely lacking, and I definitely agree that some ability to cooperate is necessary.
On the other hand, I broadly do not buy the idea that we are on a team with the rest of humanity, and more importantly I do think we need to prepare for worlds in which uncooperative/fighty actions like restraining open-source/potentially centralizing AI development is necessary to ensure human survival, which means that EA should be prepared to win power struggles over AI if necessary to do so.
The one big regret I have in retrospect on AI governance is that the field tried to ride the wave too early, before AI was salient to the general public, which meant polarization partially happened.
Veaulans is right here:
https://x.com/veaulans/status/1890245459861729432