Yeah, this sort of thing, if it actually scales and can be adapted to other paradigms (like being grafted onto RNNs or transformers), would be the final breakthrough sufficient for AGI. As I’ve said, one of the things that keeps LLM agents from being better is their inability to hold memory/state, which cripples meta-learning (without expensive compute investment), and this new paper is possibly a first step towards the return of recurrence/RNN architectures.
Re the recurrence/memory aspect, you might like this new paper, which actually figured out how to use recurrent architectures to make a one-minute Tom and Jerry cartoon that was reasonably consistent, and which (in the tweet below) argues that they managed to fix the training problems that plague vanilla RNNs (a minimal sketch of the core idea follows the links):
https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
https://arxiv.org/abs/2407.04620
https://x.com/karansdalal/status/1810377853105828092 (the tweet I pointed to for the claim that they solved the issue of training vanilla RNNs)
https://x.com/karansdalal/status/1909312851795411093 (Previous work that is relevant)
https://x.com/karansdalal/status/1909312851795411093 (Tweet of the current paper)
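Since the training claim is the part I find most interesting, here’s a minimal sketch (plain numpy, my own illustrative construction rather than the papers’ actual architecture) of the test-time-training idea as I understand it: the recurrent hidden state is itself the weight matrix of a tiny inner model, and each “state update” is a gradient step on a self-supervised reconstruction loss.

```python
import numpy as np

def ttt_layer(tokens, dim, lr=0.1):
    """Toy recurrent layer whose hidden state W is the weights of an inner linear model."""
    W = np.zeros((dim, dim))                      # hidden state = inner model's weights
    outputs = []
    for x in tokens:                              # x: (dim,) token embedding
        x_corrupted = 0.5 * x                     # stand-in for a learned corruption
        pred = W @ x_corrupted
        grad = np.outer(pred - x, x_corrupted)    # grad of 0.5 * ||W x_c - x||^2 w.r.t. W
        W = W - lr * grad                         # state update = one SGD step at test time
        outputs.append(W @ x)                     # output uses the freshly updated inner model
    return np.stack(outputs)

seq = np.random.randn(16, 8)                      # a toy sequence of 16 random 8-dim "tokens"
print(ttt_layer(seq, dim=8).shape)                # (16, 8)
```

The actual papers use learned corruption/projection functions and make the inner updates parallelizable across the sequence, which is the part that supposedly fixes vanilla RNN training; none of that is captured in this toy version.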
One note: I actually expect AI progress to slow down for at least a year, and potentially up to 4-5 years, due to the tariffs inducing a recession, but this doesn’t matter for the debate over whether LLMs can get to AGI.
I agree with the view that recurrence/hidden states would be a game-changer if they worked, because they would let the LLM have a memory, and memoryless humans are way, way less employable than people with memory, because it’s much easier to meta-learn strategies when you can remember.
That said, I’m uncertain whether recurrence is necessary for LLMs to learn better/have a memory/state that lasts beyond the context window, and I also think meta-learning over long periods/having a memory is probably the only hard bottleneck left that might not be solved (though it’s likely to be solved, if these new papers are anything to go by).
I basically agree with @gwern’s explanation of what LLMs are missing that makes them not AGIs (at least not without a further couple of OOMs of compute, and in the worst case they need exponential compute to get linear gains):
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF
I think at most one new intervention is necessary, and one could argue that zero new insights are needed.
The other part is that I basically disagree with the assumption below; more generally, I have a strong prior that a lot of problems get solved by muddling through/using semi-dumb strategies that work way better than they have any right to:
I also depart from certain other details later, for instance I think we’ll have better theory by the time we need to align human-level AI, and “muddling through” by blind experimentation probably won’t work or be the actual path taken by surviving worlds.
I think most worlds that survive the transition from AGI to ASI (over at least 2 years, if not longer) will almost certainly include a lot of dropped balls and fairly blind experimentation (helped out by the AI control agenda), as well as the world’s offense-defense balance shifting to a more defensive equilibrium.
I do think most of my probability mass for AI that can automate all AI research is in the 2030s, but this is broadly due to the tariffs and scaling up new innovations taking some time, rather than the difficulty of AGI being high.
Edit: @Vladimir_Nesov has convinced me that the tariffs delay stuff only slightly; my remaining worry is the tariffs causing an economic recession, which would make AI investment fall quite a bit for a while.
Another reason to think LLM AGI will have memory/state, conditional on AGI being built, is that memory is probably the only blocker to something like drop-in remote workers, and from there to AGI and ASI, because it would allow potentially unbounded meta-learning given unbounded resources, and would make meta-learning in general far more effective over longer time periods.
Gwern explains why the lack of meta-learning accounts for basically all of the baffling LLM weaknesses. The short version: right now, LLM weights are frozen after training, so they have zero neuroplasticity after training (modulo in-context learning, which is way too weak to matter). That means LLMs can learn zero new tricks after release, and for all but the simplest tasks it turns out that learning has to happen continuously, which was the key limitation of GPT-N style AIs that we didn’t really recognize.
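A toy illustration of the frozen-weights point (my own example, not gwern’s): two “agents” face the same task over and over; the one whose parameters are frozen after training keeps making the same mistake, while the one allowed post-deployment updates converges. In-context learning would sit somewhere in between, but resets every episode.

```python
target = 7.0                                        # the "trick" the task requires picking up
frozen_guess, plastic_guess, lr = 0.0, 0.0, 0.3

for episode in range(20):
    frozen_error = abs(target - frozen_guess)       # frozen weights: error never shrinks
    plastic_error = abs(target - plastic_guess)
    plastic_guess += lr * (target - plastic_guess)  # post-deployment gradient-like update

print(f"frozen error after 20 episodes:  {frozen_error:.2f}")   # 7.00
print(f"plastic error after 20 episodes: {plastic_error:.2f}")  # ~0.01
```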
More in the comment below:
I have said something on this, and the short version is that I don’t really believe Christiano’s argument that the Solomonoff prior is malign, because I think there’s an invalid step in the argument.
The invalid step is the assumption that we can gain information about other potential civilizations’ values solely from the fact that we are in a simulation. The key issue is that since the simulation/mathematical multiverse hypotheses predict everything, we can gain no new information from them in a Bayesian sense.
(This is in fact the general problem with the simulation/mathematical multiverse hypotheses: because they predict everything, they predict nothing specific, and thus you need specialized theories to explain any specific thing.)
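To make the “no new information” step concrete, here’s a toy Bayes calculation (my own illustration, not from Christiano’s post). If every variant of the simulation hypothesis assigns the same likelihood to whatever we actually observe, the posterior over those variants equals the prior, i.e. we learn nothing about the simulators’ values.

```python
priors = {"simulators_value_A": 0.5, "simulators_value_B": 0.5}

# The crux of my objection: because each variant "predicts everything",
# the likelihood of our actual observations is the same under both.
likelihoods = {"simulators_value_A": 1e-9, "simulators_value_B": 1e-9}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posterior = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posterior)   # {'simulators_value_A': 0.5, 'simulators_value_B': 0.5} -- no update
```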
The other problem is that the argument assumes computation is costly, but there is no cost to computation in the Solomonoff prior:
Link below on how the argument for Solomonoff induction can be made simpler, which was the inspiration for my counterargument:
My view is that the answer is still basically yes: instrumental convergence is a thing for virtue-driven agents, if we condition on them being as capable as humans, because instrumental convergence is the reason general intelligence works at all:
(That said, the instrumental convergence pressure could be less strong for virtues than for consequentialism, depending on details)
That said, I do think virtue ethics and deontology are relevant in AI safety, because they attempt to decouple the action from the utility/reward of doing it. Both have the property that you evaluate plans using your current rewards/values/utilities, rather than the values you would hold after tampering with the value/utility/reward function, and such designs are generally safer than pure consequentialism.
These papers talk more generally about decoupled RL/causal decoupling, which is perhaps useful for understanding how deontology/virtue ethics actually work (a toy sketch of the decoupling idea follows the links):
https://arxiv.org/abs/1908.04734
https://arxiv.org/abs/1705.08417
https://arxiv.org/abs/2011.08827
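Here’s the toy sketch I mentioned (my framing, not taken from the linked papers) of evaluating plans with your current utility function rather than the post-tampering one. A naive evaluator scores each plan’s outcome with whatever utility function exists after the plan runs, so tampering looks great; a decoupled evaluator always scores with the utility function it has now, so tampering looks worthless.

```python
current_utility = lambda outcome: outcome["paperclips"]    # what the agent values today
wireheaded_utility = lambda outcome: 10**6                 # post-tampering: everything scores high

plans = {
    "make_paperclips":     {"outcome": {"paperclips": 100}, "new_utility": None},
    "tamper_with_utility": {"outcome": {"paperclips": 0},   "new_utility": wireheaded_utility},
}

def naive_score(plan):
    u = plan["new_utility"] or current_utility    # evaluates with post-plan values
    return u(plan["outcome"])

def decoupled_score(plan):
    return current_utility(plan["outcome"])       # always evaluates with current values

print({name: naive_score(p) for name, p in plans.items()})      # tampering wins
print({name: decoupled_score(p) for name, p in plans.items()})  # paperclips win
```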
I’d buy that virtue driven agents are safer, and perhaps exhibit less instrumental convergence, but instrumental convergence is still a thing for virtue-driven agents.
The real question is whether capital investment into AI declines because the economy tips over into a mild recession; I’d like to see whether the tariffs make it likely that future AI investment decreases over time, which would lengthen the timeline to superintelligent AI.
I wanted to ask this question: what do you think the impact of the new tariffs will be on your timelines?
In particular, there’s a strange tariff situation for Taiwan where semiconductors are exempt but the actual GPUs, for some reason, are not, and the specific tariff for Taiwan is 32%.
I ask because I could plausibly see post-2030 timelines if AI companies can’t buy many new chips because they are way too expensive due to the new tariffs all across the world.
My own take is that I’m fairly sympathetic to the “LLMs are already able to get to AGI” view, with the caveat that most of the areas where human learning is superior to LLM learning come down to being able to meta-learn over long horizons, and we haven’t yet been shown that LLMs can do this purely by scaling compute.
Indeed, I think this is the entire crux of the scaling hypothesis debate: whether scale enables meta-learning over longer and longer time periods:
Gradient Updates has a post on this by Anson Ho and Jean-Stanislas Denain about why benchmarks haven’t reflected usefulness; a lot of the reason is that benchmark creators underestimated AI progress and didn’t really have an incentive to make benchmarks reflect realistic use cases:
One particular reason I don’t support/endorse PauseAI, which I haven’t seen addressed very much (beyond the usual objections), is that there probably aren’t going to be many warning shots that can actually affect policy, at least conditional on misalignment being a serious problem (which doesn’t translate to >50% probability of doom). The most likely takeover plan (at least assuming no foom/software intelligence explosion) fundamentally relies not on killing people, but on launching internal rogue deployments to sabotage alignment work and figuring out a way to control the AI company’s compute, since causing catastrophe/existential risk is much harder than launching an internal rogue deployment (absent defenses).
So PauseAI’s theory of change fundamentally requires that we live in worlds where both alignment is hard and effective warning shots exist, and these conditions are quite unlikely to both be true, especially given that pausing is likely not the most effective action you could be taking from a comparative advantage perspective.
I’m not going to say that PauseAI is net-negative; it does have positive expected value, but IMO far less than a lot of pause advocates claim:
Important part of the comment:
I think most of the effective strategies for AIs seeking power don’t involve escalating to something which is much more likely to trigger a strong response than “the AI company caught the AI trying to escape”. I think the best strategies are things like:
Launch a rogue internal deployment.
Sabotage a bunch of work done at the AI company. Or possibly some work done externally. This includes stuff like sabotaging alignment work, backdooring robot armies, backdooring future training runs, etc.
Escape and then directly try to take over once your chances are sufficiently good that this is better than biding your time.
Generally try to manipulate and persuade such that AI takeover is easier and more likely.
Of these, I think only escape could trigger a much stronger response if we catch it after it escalates some rather than before. I don’t see how “we caught the AI trying to launch an unmonitored version of itself” is going to play that differently from “we caught that the AI did launch an unmonitored version of itself”. Most of these don’t escalate in some way which would trigger a response such that catching it after the fact is similar to catching an attempt. (In some cases where reversion is possible like work sabotage, there might be no meaningful distinction.) Further, without some effort on control, we might be much less likely to catch either! And, in some cases, control measures I’m interested in focus on after-the-fact detection.
For future work on the software intelligence explosion, I’d like to see two particular points focused on, @Tom Davidson:
The first is estimating the complementarity issue, and more generally pinning down the value of rho for software, because whether complementarity or substitution effects dominate during the lead-up to automating all AI R&D is a huge factor in whether an intelligence explosion is self-sustaining.
More from Tamay Besiroglu and Natalia Coelho here:
https://x.com/tamaybes/status/1905435995107197082
https://x.com/natalia__coelho/status/1906150456302432647
Tamay Besiroglu: One interesting fact is that there’s a strong coincidence in the rate of software progress and hardware scaling both pre- and post the deep learning era, and also in other domains of software. That observation seems like evidence of complementarities.
Natalia Coelho: (Just to clarify for those reading this thread) “0.45-0.87” is this paper’s estimate for sigma (the elasticity of substitution) not rho (the substitution parameter). So this indicates complementarity between labor and capital. The equivalent range for rho is −1.22 to −0.15
Thus, we can use −1.22 to −0.15 as a base rate for rho for software, and then adjust that number up or down based on good evidence/arguments.
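For readers who want to check the conversion themselves: under the standard CES parameterization, sigma = 1/(1 − rho), so rho = 1 − 1/sigma, which reproduces the range Natalia quotes.

```python
# Convert the quoted elasticity-of-substitution range into the substitution parameter rho.
for sigma in (0.45, 0.87):
    rho = 1 - 1 / sigma
    print(f"sigma = {sigma:.2f}  ->  rho = {rho:.2f}")
# sigma = 0.45  ->  rho = -1.22
# sigma = 0.87  ->  rho = -0.15
```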
My second ask is for more information on the value of r, the returns to software, because right now the estimates you have are way too uncertain to be much use (especially for predicting how fast an AI can improve itself), and I’d like to see more work estimating r from many sources of data.
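To show why pinning down r matters so much, here’s a toy simulation (my own simplification with made-up numbers, not Tom’s actual model). I’m assuming r is the number of software-efficiency doublings per doubling of cumulative research input, that the software level multiplies effective research effort, and that hardware is held fixed; under those assumptions r > 1 gives accelerating (self-sustaining) progress and r < 1 gives decelerating progress.

```python
def software_level(r, steps=20):
    software, cumulative_effort = 1.0, 1.0
    for _ in range(steps):
        new_effort = software                    # fixed labor, multiplied by software level
        ratio = (cumulative_effort + new_effort) / cumulative_effort
        cumulative_effort += new_effort
        software *= ratio ** r                   # r doublings per doubling of cumulative effort
    return software

for r in (0.7, 1.0, 1.5):
    print(f"r = {r}: software level after 20 steps = {software_level(r):.3g}")
# r < 1: growth rate keeps slowing; r = 1: steady exponential; r > 1: growth keeps accelerating
```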
A question: how hard is it to actually figure out whether compute scaling is driving most of the progress, compared to algorithms?
You mentioned it’s very hard to decompose the variables, but is it the sort of thing that would need, say, a 6-month project, or would we just have to wait several years for things to play out? If it could be predicted before it happens, it would be very, very valuable evidence.
Tweet below:
https://x.com/TomDavidsonX/status/1905905058065109206
Tom Davidson: Yeah would be great to tease this apart. But hard: - Hard to disentangle compute for experiments from training+inference. - V hard to attribute progress to compute vs researchers when both rise together
I agree that some inference compute can be shifted from capabilities to safety, and that it would work just as well even during a software intelligence explosion.
My worry was more that a lot of the control agenda, and threat models like rogue internal deployments to get more compute, would be fundamentally threatened if the assumption that more power requires more hardware compute turned out to be wrong, and a software intelligence explosion could instead run on (in principle) fixed computing power, making catastrophic actions to disempower humanity/defeat control defenses much easier for the model.
I’m not saying control is automatically doomed even under FOOM/software intelligence explosion, but I wanted to make sure that the assumption of FOOM being true didn’t break a lot of control techniques/defenses/hopes.
Some thoughts on this post:
You need adaptability because on the timeframe that you might build a company or start a startup or start a charity, you can expect the rest of the world to remain fixed. But on the timeframe that you want to have a major political movement, on the timeframe that you want to reorient the U.S. government’s approach to AI, a lot of stuff is coming at you. The whole world is, in some sense, weighing in on a lot of the interests that have historically been EA’s interests.
I’ll flag that for AI safety specifically, the world hasn’t yet weighed in that much and can be treated as mostly fixed for the purposes of analysis (with caveats). But yes, AI safety does need to prepare for the real possibility that the world will weigh in a lot more, and there is a non-trivial share of worlds where AI safety becomes a lot more mainstream.
I don’t think we should plan on this happening, but I definitely agree that the world may weigh in way more on AI safety than before, especially just before an AI explosion.
On environmentalism’s fuckups:
So what does it look like to fuck up the third wave? The next couple of slides are deliberately a little provocative. You should take them 80% of how strongly I say them, and in general, maybe you should take a lot of the stuff I say 80% of how seriously I say it, because I’m very good at projecting confidence.
But I claim that one of the examples where operating at scale is just totally gone to shit is the environmentalist movement. I would somewhat controversially claim that by blocking nuclear power, environmentalism caused climate change. Via the Environmental Protection Act, environmentalism caused the biggest obstacle to clean energy deployment across America. Via opposition to geoengineering, it’s one of the biggest obstacles to actually fixing climate change. The lack of growth of new housing in Western countries is one of the biggest problems that’s holding back Western GDP growth and the types of innovation that you really want in order to protect the environment.
I can just keep going down here. I think the overpopulation movement really had dramatically bad consequences on a lot of the developing world. The blocking of golden rice itself was just an absolute catastrophe.
The point here is not to rag on environmentalism. The point is: here’s a thing that sounds vaguely good and kind of fuzzy and everyone thinks it’s pretty reasonable. There are all these intuitions that seem nice. And when you operate at scale and you’re not being careful, you don’t have the types of virtues or skills that I laid out in the last slide, you just really fuck a lot of stuff up. (I put recycling on there because I hate recycling. Honestly, it’s more a symbol than anything else.)
I want to emphasize that there is a bunch of good stuff. I think environmentalism channeled a lot of money towards the development of solar. That was great. But if you look at the scale of how badly you can screw these things up when you’re taking a mindset that is not adapted to operating at the scale of a global economy or global geopolitics, it’s just staggering, really. I think a lot of these things here are just absolute moral catastrophes that we haven’t really reckoned with.
Feel free to dispute this in office hours, for example, but take it seriously. Maybe I want to walk back these claims 20% or something, but I do want to point at the phenomenon.
I definitely don’t think environmentalists caused climate change, despite thinking the nuclear restrictions were very dumb, mostly because oil companies were already causing climate change (albeit on a far smaller scale at the time) by pumping oil, and the same is true of gas and coal companies.
I do think there’s a problem with environmentalism not accepting solutions that don’t fit a nature aesthetic, but that’s a separate problem from causing climate change.
Also, most of the solution here would have been to be more consequentialist and more willing to accept expected utility maximization.
There are arguments that expected utility maximization is overrated on LW, but it’s severely underrated by basically everyone else, and basically everyone would be helped by adopting more of a utility-maximization mindset.
And then I want to say: okay, what if that happens for us? I can kind of see that future. I can kind of picture a future where AI safety causes a bunch of harms that are analogous to this type of thing:
My general view on how AI safety could go wrong is that it goes wrong through one of two paths. One is becoming somewhat like climate change/environmentalist partisans, systematically overestimating the severity of plausible harms (even when the harms do exist) while the other side tries to dismiss the harms entirely, causing a polarization cascade. The other worry is that AI safety advocates might not realize the danger from AI misalignment has passed, and so desperately try to stay relevant.
I have a number of takes on the bounty questions, but I’ll wait until you actually post them.
So do I have answers? Kind of. These are just tentative answers and I’ll do a bit more object-level thinking later. This is the more meta element of the talk. There’s one response: oh, boy, we better feel really guilty whenever we fuck up and apply a lot of pressure to make sure everyone’s optimizing really hard for exactly the good things and exclude anyone who plausibly is gonna skew things in the wrong direction.
And I think this isn’t quite right. Maybe that works at a startup. Maybe it works at a charity. I don’t think it works in Wave 3, because of the arguments I gave before. Wave 3 needs virtue ethics. You just don’t want people who are guilt-ridden and feeling a strong sense of duty and heroic responsibility to be in charge of very sensitive stuff. There’s just a lot of ways that that can go badly. So I want to avoid that trap.
I basically agree with this, though I would perhaps avoid virtue ethics. One of the main things I’d generally like to see is more LWers treating stuff like saving the world with the attitude you’d have in a job, perhaps at a startup or in a government body like the US Senate or House of Representatives, rather than viewing it as your heroic responsibility.
In this respect, I think Eliezer was dangerously wrong to promote a norm of heroism/heroic responsibility.
Libertarians have always talked about “there’s too much regulation”, but I think it’s underrated how this is not a fact about history, this is a thing we are living through—that we are living through the explosion of bureaucracy eating the world. The world does not have defense mechanisms against these kinds of bureaucratic creep. Bureaucracy is optimized for minimizing how much you can blame any individual person. So there’s never a point at which the bureaucracy is able to take a stand or do the sensible policy or push back even against stuff that’s blatantly illegal, like a bunch of the DEI stuff at American universities. It’s just really hard to draw a line and be like “hey, we shouldn’t do this illegal thing”. Within a bureaucracy, nobody does it. And you have multitudes of examples.
My controversial take here is that most of the responsibility lies first and foremost with the voters, and secondly with the broad inability to actually govern using normal legislative methods once in power.
Okay. That was all super high level. I think it’s a good framework in general. But why is it directly relevant to people in this room? My story here is: the more powerful AI gets, the more everyone just becomes an AI safety person. We’ve kind of seen this already: AI has just been advancing and over time, you see people falling into the AI safety camp with metronomic predictability. It starts with the most prescient and farsighted people in the field. You have Hinton, you have Bengio and then Ilya and so on. And it’s just ticking its way through the whole field of ML and then the whole U.S. government, and so on. By the time AIs have the intelligence of a median human and the agency of a median human, it’s just really hard to not be an AI safety person. So then I think the problem that we’re going to face is maybe half of the AI safety people are fucking it up and we don’t know which half.
I think this is plausible but not very likely; it’s still plausible that we end up in a world where AI safety doesn’t become mainstream by default.
This is especially likely to occur if software singularities/FOOM/a software intelligence explosion are at all plausible, and in those cases we cannot rely on our institutions automatically keeping up.
Link below:
I do think it’s worthwhile for people to focus on worlds where AI safety does become a mainstream political topic, but we shouldn’t bank on AI safety going mainstream in our technical plans to make AI safe.
My takes on what we should do, in reply to you:
I also have a recent post on why history and philosophy of science is a really useful framework for thinking about these big picture questions and what would it look like to make progress on a lot of these very difficult issues, compared with being bayesian. I’m not a fan of bayesianism—to a weird extent it feels like a lot of the mistakes that the community has made have fallen out of bad epistemology. I’m biased towards thinking this because I’m a philosopher but it does seem like if you had a better decision theory and you weren’t maximizing expected utility then you might not screw FTX up quite as badly, for instance.
Suffice it to say that I’m broadly unconvinced by your criticism of Bayesianism from a philosophical perspective, for roughly the reasons @johnswentworth identified below:
https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian#AGxg2r4HQoupkdCWR
On mechanism design:
You can think of the US Constitution as trying to bridge that gap. You want to prevent anarchy, but you also want to prevent concentration of power. And so you have this series of checks and balances. One of the ways we should be thinking about AI governance is as: how do we put in regulations that also have very strong checks and balances? And the bigger of a deal you think AI is going to be, the more like a constitution—the more robust—these types of checks and balances need to be. It can’t just be that there’s an agency that gets to veto or not veto AI deployments. It needs to be much more principled and much closer to something that both sides can trust.
On the principled mechanism design thing, the way I want to frame this is: right now we think about governance of AI in terms of taking governance tools and applying them to AI. I think this is not principled enough. Instead the thing we want to be thinking about is governance with AI— what would it look like if you had a set of rigorous principles that governed the way in which AI was deployed throughout governments, that were able to provide checks and balances. Able to have safeguards but also able to make governments way more efficient and way better at leveraging the power of AI to, for example, provide a neutral independent opinion like when there’s a political conflict.
One particular wrinkle to add here is that the institutions/countries of the future are going to have to be value-aligned to their citizenry in a way that is genuinely unprecedented for basically any institution. If they are not value-aligned, then we just have the alignment problem again, where the people in power have very large incentives to simply get rid of everyone else, given arbitrary selfish values (and I don’t buy the hypothesis that consumption/human wants are fundamentally limited).
The biggest story of the 21st century is how AI is making alignment way, way more necessary than in the past.
Some final points:
There’s something in the EA community around this that I’m a little worried about. I’ve got this post on how to have more cooperative AI safety strategies, which kind of gets into this. But I think a lot of it comes down to just having a rich conception of what it means to do good in this world. Can we not just “do good” in the sense of finding a target and running as hard as we can toward it, but instead think about ourselves as being on a team in some sense with the rest of humanity—who will be increasingly grappling with a lot of the issues I’ve laid out here?
What is the role of our community in helping the rest of humanity to grapple with this? I almost think of us as first responders. First responders are really important — but also, if they try to do the whole job themselves, they’re gonna totally mess it up. And I do feel the moral weight of a lot of the examples I laid out earlier—of what it looks like to really mess this up. There’s a lot of potential here. The ideas in this community—the ability to mobilize talent, the ability to get to the heart of things—it’s incredible. I love it. And I have this sense—not of obligation, exactly—but just…yeah, this is serious stuff. I think we can do it. I want us to take that seriously. I want us to make the future go much better. So I’m really excited about that. Thank you.
On the one hand, I partially agree that a willingness to make plans that depend on others cooperating has definitely been lacking, and I agree that some ability to cooperate is necessary.
On the other hand, I broadly do not buy the idea that we are on a team with the rest of humanity, and more importantly I think we need to prepare for worlds in which uncooperative/fighty actions, like restraining open source or potentially centralizing AI development, are necessary to ensure human survival, which means EA should be prepared to win power struggles over AI if it comes to that.
The one big regret I have in retrospect on AI governance is that advocates tried to ride the wave too early, before AI was salient to the general public, which meant polarization partially happened.
Veaulans is right here:
https://x.com/veaulans/status/1890245459861729432
In hindsight, the pause letter should have been released in spring 2026. Pausing might be necessary, but it won’t happen without an overabundance of novelty/weirdness in the life of the guy in line with you at the DMV. When *that guy* is scared is when you have your chance
lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn’t matter:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
I like your explanation of why normal reliability engineering is not enough, but I’ll flag that security against adversarial actors is probably easier than LW in general portrays. I think computer security culture is prone to way overestimating the difficulty of security because of incentive issues, not remembering the times when something didn’t happen, and side-channels arguably being much more limited than people think (precisely because they rely on very specific physical setups, rather than attacks on the algorithm itself).
A non-trivial portion of my optimism about surviving AGI comes from security being difficult but not unreasonably difficult, and from partial successes still mattering from a security standpoint.
Link below:
I have 2 cruxes here:
I buy Henrich’s theory far less than I used to, because Henrich made easily checkable false claims that all point in the direction of culture being more necessary for human success.
In particular, I do not buy that humans and chimpanzees are nearly as similar as Henrich describes, and a big reason for this is that the study showing this had heavily selected the best-performing chimpanzees and compared them against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.
I don’t think they’re wildly different, and I’d usually put chimps’ effective FLOPs at 1-2 OOMs lower than humans’, but I wouldn’t go nearly as far as Henrich on the similarities.
I do think culture actually matters, but nowhere near as much as Henrich wants it to matter.
I basically disagree that most of the valuable learning takes place before age 2; if I had to pick the most valuable period for learning, it would probably be ages 0-25, or more specifically ages 2-7 and then 13-25.
I agree evolution has probably optimized human learning, but I don’t think it’s so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs. The reason is that I do not believe humans are in equilibrium, which means there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible (with high uncertainty).
Comment below:
https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE
I’ll flag that while I personally never believed, or considered very plausible, the idea that orcas are on average >6 SDs smarter than humans, I also don’t think orcas could benefit much from +6 SDs even if applied universally. The reason is that they live in water, which severely limits the available technology options and makes it really, really hard to form the societies needed to generate the explosion that happened after the Industrial Revolution, or even the Agricultural Revolution.
And there is a deep local-optimum issue: their body plan is about as unsuited to using tools as possible, and changing this requires technology they almost certainly can’t invent, because the things you would need to make that tech are impossible to get at the pressure and saltiness of the ocean. So it is pretty much impossible for orcas to get that much better even with large increases in intelligence.
Thus, orca societies have a pretty hard limit on what they can achieve, at least without technologies they cannot invent.
To be fair, while Assumption 5 is convenient, I do think some form of it is at least reasonably likely to hold, and I think something like the assumption that no software singularity is possible is a reasonable position; a nuanced articulation of that assumption is in this article:
https://epoch.ai/gradient-updates/most-ai-value-will-come-from-broad-automation-not-from-r-d
I don’t think the assumption is so likely to hold that one can assume it as part of a safety case for AI, but I don’t think the assumption is unreasonably convenient.