Thane Ruthenis

Karma: 6,631

Thane Ruthenis Apr 22, 2025, 1:27 AM
3 points
0
in reply to: ryan_greenblatt’s comment on: Vladimir_Nesov’s Shortform
I’d guess this paper doesn’t have the actual optimal methods.
Intuitively, this shouldn’t matter much. They use some RL-on-CoTs method that works, and I expect its effects are not fundamentally different from optimal methods’. Thus, optimal methods might yield better quantitative results, but similar qualitative results: maybe they’d let you elicit pass@800 capabilities instead of “just” pass@400, but it’d still be just pass@k elicitation for not-astronomical k.
Not strongly convinced of that, though.
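For readers unfamiliar with the metric, here is a rough sketch of the arithmetic behind pass@k. It assumes each sample independently solves the task with the same probability p, which is an idealization and not necessarily how the paper under discussion estimates it; the function name and the example value of p are made up for illustration.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k samples succeeds.

    Assumption (not from the paper): samples are independent and each
    succeeds with the same probability p.
    """
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-sample success rate for a "rare" capability.
    p = 0.005
    for k in (1, 400, 800):
        print(f"pass@{k}: {pass_at_k(p, k):.3f}")
```

Under this toy model, going from k=400 to k=800 changes the number but not the kind of thing being measured: the capability still has to be present in the base distribution with non-negligible probability.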

Thane Ruthenis Apr 22, 2025, 1:21 AM
5 points
0
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform
Huh. This is roughly what I’d expected, but even I didn’t expect it to be so underwhelming.^[1]
I weakly predict that the situation isn’t quite as bad for capabilities as this makes it look. But I do think something-like-this is likely the case.
1. ^
  Of course, moving a pass@400 capability to pass@1 isn’t nothing, but it’s clearly astronomically short of a Singularity-enabling technique that RL-on-CoTs is touted as.

Thane Ruthenis Apr 19, 2025, 8:04 PM
30 points
26
in reply to: Mis-Understandings’s comment on: Why Should I Assume CCP AGI is Worse Than USG AGI?
Since the US government is expected to treat other stakeholders in its previous block better than China treats members of it’s block
At the risk of getting too into politics...
IMO, this was maybe-true for the previous administrations, but is completely false for the current one. All people making the argument based on something like this reasoning need to update.
Previous administrations were more or less dead inertial bureaucracies. Those actually might have carried on acting in democracy-ish ways even when facing outside-context events/situations, such as suddenly having access to overwhelming ASI power. Not necessarily because they were particularly “nice”, as such, but because they weren’t agenty enough to do something too out-of-character compared to their previous democracy-LARP behavior.
I still wouldn’t have bet on them acting in pro-humanity ways (I would’ve expected some more agenty/power-hungry governmental subsystem to grab the power, circumventing e. g. the inertial low-agency Presidential administration). But there was at least a reasonable story there.
The current administration seems much more agenty: much more willing to push the boundaries of what’s allowed and deliberately erode the constraints on what it can do. I think it doesn’t generalize to boring democracy-ish behavior out-of-distribution, I think it eagerly grabs and exploits the overwhelming power. It’s already chomping at the bit to do so.

Thane Ruthenis Apr 19, 2025, 4:18 AM
3 points
0
in reply to: MichaelDickens’s comment on: Training AGI in Secret would be Unsafe and Unethical
Mm, yeah, maybe. The key part here is, as usual, “who is implementing this plan”? Specifically, even if someone solves the preference-agglomeration problem (which may be possible to do for a small group of researchers), why would we expect it to end up implemented at scale? There are tons of great-on-paper governance ideas which governments around the world are busy ignoring.
For things like superbabies (or brain-computer interfaces, or uploads), there’s at least a more plausible pathway for wide adoption, driven by similar motives of maximizing profit/geopolitical power as with AGI.

Thane Ruthenis Apr 18, 2025, 10:33 PM
46 points
23
on: Training AGI in Secret would be Unsafe and Unethical
I also think there is a genuine alternative in which power never concentrates to such an extreme degree.
I don’t see it.
The distribution of power post-ASI depends on the constraint/goal structures instilled into the (presumed-aligned) ASI. That means the entity in whose hands all power is concentrated is the set of people deciding on what goals/constraints to instill into the ASI, in the time prior to the ASI’s existence. What people could those be?
1. By default, it’s the ASI’s developers, e. g., the leadership of the AGI labs. “They will be nice and put in goals/constraints that make the ASI loyal to humanity, not to them personally” is more or less isomorphic to “they will make the ASI loyal to them personally, but they’re nice and loyal to humanity”; in both cases, they have all the power.^[1]
2. If the ASI’s developers go inform the US’s President about it in a faithful way^[2], the overwhelming power will end up concentrated in the hands of the President/the extant powers that be. Either by way of ham-fisted nationalization (with something isomorphic to putting guns to the developers’ (families’) heads), or by subtler manipulation where e. g. everyone is forced to LARP believing in the US’ extant democratic processes (which the President would be actively subverting, especially if that’s still Trump), with this LARP being carried far enough to end up in the ASI’s goal structure.
  - The stories in which the resultant power struggles shake out in a way that leads to the humanity-as-a-whole being given true meaningful input in the process (e. g., the slowdown ending in AI-2027) seem incredibly fantastical to me. (Again, especially given the current US administration.)
  - Yes, acting in ham-fisted ways would be precarious and have various costs. But I expect the USG to be able to play it well enough to avoid actual armed insurrection (especially given that the AGI concerns are currently not very legible to the public), and inasmuch as they actually “feel the AGI”, they’d know that nothing less than that would ultimately matter.
3. If the ASI’s developers somehow go public with the whole thing, and attempt to unilaterally set up some actually-democratic process for negotiating on the ASI goal/constraint structures, then either (1) the US government notices it, realizes what’s happening, takes control, and subverts the process, or (2) they set up some very broken process – as broken as the US electoral procedures which end up with Biden and Trump as the top two choices for president – and those processes output some basically random, potentially actively harmful results (again, something as bad as Biden vs. Trump).
Fundamentally, the problem is that there’s currently no faithful mechanism of human preference agglomeration that works at scale. That means both that (1) it’s currently impossible to let humanity-as-a-whole actually weigh in on the process, and that (2) there are no extant outputs of that mechanism around: all people and systems that currently hold power aren’t aligned to humanity in a way that generalizes to out-of-distribution events (such as being given godlike power).
Thus, I could only see three options:
- Power is concentrated in some small group’s hands, with everyone then banking on that group acting in a prosocial way, perhaps by asking the ASI to develop a faithful scalable preference-agglomeration process. (I. e., we use a faithful but small-scale human-preference-agglomeration process.)
- Power is handed off to some random, unstable process. (Either a preference agglomeration system as unfaithful as the US’ voting systems, or “open-source the AGI and let everyone in the world fight it out”, or “sample a random goal system and let it probably tile the universe with paperclips”.)
- ASI development is stopped and some different avenue of intelligence enhancement (e. g., superbabies) is pursued; one that’s more gradual and is inherently more decentralized.
1. ^
  A group of humans that compromises on making the ASI loyal to humanity is likely more realistic than a group of humans which is actually loyal to humanity. E. g., because the group has some psychopaths and some idealists, and all psychopaths have to individually LARP being prosocial in order to not end up with the idealists ganging up against them, with this LARP then being carried far enough to end up in the ASI’s goals. But this still involves that small group having ultimate power; still involves the future being determined by how the dynamics within that small group shake out.
2. ^
  Rather than keeping him in the dark or playing him, which reduces to Scenario 1.

Thane Ruthenis Apr 17, 2025, 6:33 PM
4 points
0
in reply to: Brendan Long’s comment on: AI #112: Release the Everything
“you can create conversation categories and memory will only apply to other conversations in that category”
Yeah, that’s what I’d meant.

Thane Ruthenis Apr 17, 2025, 5:38 PM
3 points
0
in reply to: Brendan Long’s comment on: AI #112: Release the Everything
I disagree. People don’t find it confusing to sort files into folders.

Thane Ruthenis Apr 17, 2025, 4:15 PM
6 points
0
on: AI #112: Release the Everything
ChatGPT memory now extends to the full contents of all your conversations. You can opt out of this. You can also do incognito windows that won’t interact with your other chats. You can also delete select conversations.
The way they should actually set this up is to let users create custom “memory categories”, e. g. “professional context”, “personal context”, “legal-advice context”, “hobby#1 context”, and let people choose in which category (if any!) a given conversation goes.
It seems obvious and trivial to implement. I’m confused why they haven’t done that yet. (Clashes with their “universal AI assistant” ideas?)
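A minimal sketch of what I mean, to make the “trivial to implement” claim concrete. Everything here is hypothetical: the class and method names are invented for illustration and have nothing to do with how ChatGPT’s memory actually works internally. The only rule it encodes is that a conversation can see memory from other conversations in its own category, and uncategorized conversations see nothing.

```python
from collections import defaultdict
from typing import Optional


class CategorizedMemory:
    """Toy model of per-category memory (hypothetical, not OpenAI's design)."""

    def __init__(self) -> None:
        # category name -> list of (conversation id, remembered text)
        self._by_category: dict[str, list[tuple[str, str]]] = defaultdict(list)
        # conversation id -> category name, or None if uncategorized
        self._category_of: dict[str, Optional[str]] = {}

    def add_conversation(
        self, conv_id: str, text: str, category: Optional[str] = None
    ) -> None:
        """Store a conversation; category=None keeps it out of all memory."""
        self._category_of[conv_id] = category
        if category is not None:
            self._by_category[category].append((conv_id, text))

    def visible_memory(self, conv_id: str) -> list[str]:
        """Return what this conversation may draw on: same-category chats only."""
        category = self._category_of.get(conv_id)
        if category is None:
            return []
        return [
            text
            for other_id, text in self._by_category[category]
            if other_id != conv_id
        ]
```

The real engineering work would presumably be in the UI and in migrating existing chats, not in the scoping logic itself.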

Thane Ruthenis Apr 17, 2025, 1:05 AM
8 points
0
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
46 agreement/disagreement-votes, 0 net agreement score
Gotta love how much of a perfect Scissor statement this is. (Same as my “o3 is not that impressive”.)

Thane Ruthenis Apr 16, 2025, 9:50 PM
6 points
1
in reply to: StanislavKrym’s comment on: Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
OpenAI’s o3 and o4-mini models are likely to become accessible for $20000 per month
Already available for $20/month.
The $20,000/month claim seems to originate from that atrocious The Information article, which threw together a bunch of unrelated sentences at the end to create the (false) impression that o3 and o4-mini are innovator-agents which will become available for $20,000/month this week. In actuality, the sentences “OpenAI believes it can charge $20,000 per month for doctorate-level AI”, “new AI aims to resemble inventors”, and “OpenAI is preparing to launch [o3 and o4-mini] this week” are separately true, but have nothing to do with each other.

Thane Ruthenis Apr 16, 2025, 4:25 PM
7 points
1
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And it would, due to a design constraint, need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
- OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it and it’s now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if Q* rumors are to be trusted, so it’s little evidence that OpenAI was still alive in 2024). But maybe not.
- Anthropic houses a bunch of the best OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever’s the mystery behind Sonnet 3.5 and 3.6).
- DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is allowed to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.

Thane Ruthenis Apr 15, 2025, 5:33 PM
16 points
4
on: OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
So, since Altman asked so nicely, what are the most prominent examples of Altman potentially being corrupted by The Ring of Power? Here is an eightfold path.
Missing the “non-disparagement clauses enforced by PPUs allowing OpenAI to withhold equity it paid out as compensation” debacle. This is, IMO, a qualitatively different flavor of corruption. Maneuvering against Musk, stealing the non-profit, outplaying the board, being quieter on AI x-risk than he should be, not supporting regulations, the jingoist messaging – okay, none of this is good. But from a certain perspective, those are fair-play far-mode strategic moves.
Trying to get a death grip on the throats of your own employees and coworkers? That’s near-mode antisocial behavior, and is, in some ways, more indicative of the underlying character than any of the above. And unlike with the coup, it wasn’t even arguably in self-defense, it was proactive.

Thane Ruthenis Apr 15, 2025, 4:31 AM
4 points
0
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
Oh, if you’re in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:
- Vibe coders and “10x’d engineers”, who (on this model) would be falling into one of the failure modes outlined here: producing applications/features that didn’t need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like “volume of code produced” or “number of commits”), or “automatically generating” entire codebases in a way that feels magical, then spending so much time bugfixing them it eats up ~all perceived productivity gains.
- e/acc and other Twitter AI fans, who act like they’re bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.

Thane Ruthenis Apr 14, 2025, 4:40 AM
3 points
0
in reply to: O O’s comment on: Zach Stein-Perlman’s Shortform
Interesting. Source? Last I heard, they’re not hiring anyone because they expect SWE to be automated soon.

Thane Ruthenis Apr 14, 2025, 2:17 AM
1 point
−1
on: An Unbiased Evaluation of My Debate with Thane Ruthenis—Run It Yourself
Really, man?
> Thane is fairly open about his views, although he does engage in some dismissive language (e.g., “thoughtless kneejerk reaction”)
> Thane’s comments, particularly calling funnyfranco’s response a “thoughtless kneejerk reaction,”
I give ChatGPT a C- on reading comprehension.^[1] I suggest that you stop taking LLMs’ word as gospel. If it can misunderstand something that clear-cut this severely, how can you trust any other conclusions it draws? How can you even post an “unbiased evaluation” with an error this severe, not acknowledge its abysmal quality, and then turn around and lecture people about truth-seeking?
I definitely advise against going to LLMs for social validation.
Here’s Claude 3.7 taking my side, lest you assume I’m dismissing LLMs because they denounce me. For context, Anthropic doesn’t pass the user’s name to Claude and has no cross-session memory, so it didn’t know my identity, there was no system prompt, and the .pdfs were generated by just “right-click → print → save as PDF” on the relevant LW pages.
1. ^
  For context, if someone else stumbles on this trainwreck: I was sarcastically calling my response a “thoughtless kneejerk reaction”. ChatGPT apparently somehow concluded I’d been referring to funnyfranco’s writing. I wonder if it didn’t read the debate properly and just skimmed it? I mean, all the cool kids were doing it.

Thane Ruthenis 12 Apr 2025 20:46 UTC
7 points
2
in reply to: AnthonyC’s comment on: Why are neuro-symbolic systems not considered when it comes to AI Safety?
Let’s assume you or anyone else really did have a proposed path to AGI/ASI that would be in some important senses safer than our current path. Who is the entity for whom this would or would not be a “viable course?”
A new startup created specifically for the task. Examples: one, two.
Like, imagine that we actually did discover a non-DL AGI-complete architecture with strong safety guarantees, such that even MIRI would get behind it. Do you really expect that the project would then fail at the “getting funded”/“hiring personnel” stages?
tailcalled’s argument is the sole true reason: we don’t know of any neurosymbolic architecture that’s meaningfully safer than DL. (The people in the examples above are just adding to the AI-risk problem.) That said, I think the lack of alignment research going into it is a big mistake, mainly caused by the undertaking seeming too intimidating/challenging to pursue / by the streetlighting effect.

Thane Ruthenis 12 Apr 2025 18:24 UTC
7 points
0
on: What is autism?
@Steven Byrnes’ intense-world theory of autism seems like the sort of thing you’re looking for.

Thane Ruthenis 12 Apr 2025 15:41 UTC
2 points
0
in reply to: Noosphere89’s comment on: On Google’s Safety Plan
I agree that this isn’t an obviously unreasonable assumption to hold. But...
I don’t think the assumption is so likely to hold that one can assume it as part of a safety case for AI
… that.

Thane Ruthenis 12 Apr 2025 15:18 UTC
6 points
2
in reply to: johnswentworth’s comment on: Short Timelines don’t Devalue Long Horizon Research
The idea that all the labs focus on speeding up their own research threads rather than serving LLMs to customers is already pretty dubious. Developing LLMs and using them are two different skillsets; it would make economic sense for different entities to specialize in those things
I can maybe see it. Consider the possibility that the decision to stop providing public access to models past some capability level is convergent: e. g., the level at which they’re extremely useful for cyberwarfare (with jailbreaks still unsolved) such that serving the model would drown the lab in lawsuits/political pressure, or the point at which the task of spinning up an autonomous business competitive with human businesses, or making LLMs cough up novel scientific discoveries, becomes trivial (i. e., such that the skill level required for using AI for commercial success plummets – which would start happening inasmuch as AGI labs are successful in moving LLMs to the “agent” side of the “tool/agent” spectrum).
In those cases, giving public access to SOTA models would stop being the revenue-maximizing thing to do. It’d either damage your business reputation^[1], or it’d simply become more cost-effective to hire a bunch of random bright-ish people and get them to spin up LLM-wrapper startups in-house (so that you own 100% stake in them).
Some loose cannons/open-source ideologues like DeepSeek may still provide free public access, but those may be few and far between, and significantly further behind. (And getting progressively scarcer; e. g., the CCP probably won’t let DeepSeek keep doing it.)
Less extremely, AGI labs may move to a KYC-gated model of customer access, such that only sufficiently big, sufficiently wealthy entities are able to get access to SOTA models. Both because those entities won’t do reputation-damaging terrorism, and because they’d be the only ones able to pay the rates (see OpenAI’s maybe-hype maybe-real whispers about $20,000/month models).^[2] And maybe some EA/R-adjacent companies would be able to get in on that, but maybe not.
Also,
no lab has a significant moat, and the cutting edge is not kept private for long, and those facts look likely to remain true for a while
This is a bit flawed, I think. I think the situation is that runners-up aren’t far behind the leaders in wall-clock time. Inasmuch as the progress is gradual, this translates to runners-up being not-that-far-behind the leaders in capability level. But if AI-2027-style forecasts come true, with the capability progress accelerating, a 90-day gap may become a “GPT-2 vs. GPT-4”-level gap. In which case alignment researchers having privileged access to true-SOTA models becomes important.
(Ideally, we’d have some EA/R-friendly company already getting cozy with e. g. Anthropic so that they can be first-in-line getting access to potential future research-level models so that they’d be able to provide access to those to a diverse portfolio of trusted alignment researchers...)
1. ^
  Even if the social benefits of public access would’ve strictly outweighed the harms on a sober analysis, the public outcry at the harms may be significant enough to make the idea commercially unviable. Asymmetric justice, etc.
2. ^
  Indeed, do we know it’s not already happening? I can easily imagine some megacorporations having had privileged access to o3 for months.

Thane Ruthenis 12 Apr 2025 14:15 UTC
7 points
0
in reply to: O O’s comment on: Zach Stein-Perlman’s Shortform
Orrr he’s telling comforting lies to tread the fine line between billion-dollar hype and nationalization-worthy panic.
Could realistically be either, but it’s probably the comforting-lies thing. Whatever the ground-truth reality may be, the AGI labs are not bearish.