The Y-axis seemed to me like roughly ‘populist’.
The impressive performance we have obtained is because supervised (in this case technically “self-supervised”) learning is much easier than e.g. reinforcement learning and other paradigms that naturally learn planning policies. We do not actually know how to overcome this barrier.
What about current reasoning models trained using RL? (Do you think something like, we don’t know, and won’t easily figure out, how to make that work well outside a narrow class of tasks that doesn’t include ‘anything important’?)
Few people who take radical veganism and left-anarchism seriously either ever kill anyone, or are as weird as the Zizians, so that can’t be the primary explanation. Unless you set a bar for ‘take seriously’ that almost only they pass, but then, it seems relevant that (a) their actions have been grossly imprudent and predictably ineffective by any normal standard + (b) the charitable[1] explanations I’ve seen offered for why they’d do imprudent and ineffective things all involve their esoteric beliefs.
I do think ‘they take [uncommon, but not esoteric, moral views like veganism and anarchism] seriously’ shouldn’t be underrated as a factor, and modeling them without putting weight on it is wrong.
- ^
to their rationality, not necessarily their ethics
I don’t think it’s an outright meaningless comparison, but I think it’s bad enough that it feels misleading or net-negative-for-discourse to describe it the way your comment did. Not sure how to unpack that feeling further.
https://artificialanalysis.ai/leaderboards/providers claims that Cerebras achieves that OOM performance, for a single prompt, for 70B-parameter models. So nothing as smart as R1 is currently that fast, but some smart things come close.
I don’t see how it’s possible to make a useful comparison this way; human and LLM ability profiles, and just the nature of what they’re doing, are too different. An LLM can one-shot tasks that a human would need non-typing time to think about, so in that sense this underestimates the difference, but on a task that’s easy for a human but the LLM can only do with a long chain of thought, it overestimates the difference.
Put differently: the things that LLMs can do with one shot and no CoT imply that they can do a whole lot of cognitive work in a single forward pass, maybe a lot more than a human can ever do in the time it takes to type one word. But that cognitive work doesn’t compound like a human’s; it has to pass through the bottleneck of a single token, and be substantially repeated on each future token (at least without modifications like Coconut).
(Edit: The last sentence isn’t quite right — KV caching means the work doesn’t have to all be recomputed, though I would still say it doesn’t compound.)
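To make the single-token-bottleneck point concrete, here is a minimal toy decoding loop, a sketch of my own using the Hugging Face transformers API with GPT-2 as a stand-in model (nothing from the thread itself): after the first step, the only new input to the model is the one sampled token; everything else from earlier positions carries forward only through the cached keys/values.

```python
# Toy autoregressive decoding with a KV cache (GPT-2 as a stand-in model).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The work done in one forward pass", return_tensors="pt").input_ids
past_key_values = None  # the KV cache: per-layer keys/values for all previous positions

with torch.no_grad():
    for _ in range(10):
        # Once a cache exists, only the single most recent token is fed in.
        step_input = input_ids if past_key_values is None else input_ids[:, -1:]
        out = model(step_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        # Greedy sampling for simplicity; this one token id is the only thing,
        # besides the cached keys/values, that future steps ever see.
        next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```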
I don’t really have an empirical basis for this, but: if you trained something otherwise comparable to, if not current, then near-future reasoning models without any mention of angular momentum, and gave it a context with several different problems to which angular momentum was applicable, I’d be surprised if it couldn’t notice that it was a common interesting quantity, and then, in an extension of that context, correctly answer questions about it. If you gave it successive problem sets where the sum of that quantity was applicable, or its integral, or maybe other things, I’d be surprised if a (maybe more powerful) reasoning model couldn’t build something worth calling the ability to correctly answer questions about angular momentum. Do you expect otherwise, and/or is this not what you had in mind?
It seems right to me that “fixed, partial concepts with fixed, partial understanding” that are “mostly ‘in the data’” likely block LLMs from being AGI in the sense of this post. (I’m somewhat confused / surprised that people don’t talk about this more — I don’t know whether to interpret that as not noticing it, or having a different ontology, or noticing it but disagreeing that it’s a blocker, or thinking that it’ll be easy to overcome, or what. I’m curious if you have a sense from talking to people.)
These also seem right:
“LLMs have a weird, non-human shaped set of capabilities”
“There is a broken inference”
“we should also update that this behavior surprisingly turns out to not require as much general intelligence as we thought”
“LLMs do not behave with respect to X like a person who understands X, for many X”
(though I feel confused about how to update on the conjunction of those, and the things LLMs are good at — all the ways they don’t behave like a person who doesn’t understand X, either, for many X.)
But: you seem to have a relatively strong prior[1] on how hard it is to get from current techniques to AGI, and I’m not sure where you’re getting that prior from. I’m not saying I have a strong inside view in the other direction, but, like, just for instance — it’s really not apparent to me that there isn’t a clever continuous-training architecture, requiring relatively little new conceptual progress, that’s sufficient; if that’s less sample-efficient than what humans are doing, it’s not apparent to me that it can’t still accomplish the same things humans do, with a feasible amount of brute force. And it seems like that is apparent to you.
Or, looked at from a different angle: to my gut, it seems bizarre if whatever conceptual progress is required takes multiple decades, in the world I expect to see with no more conceptual progress, where probably:
AI is transformative enough to motivate a whole lot of sustained attention on overcoming its remaining limitations
AI that’s narrowly superhuman on some range of math & software tasks can accelerate research
- ^
It’s hard for me to tell how strong: “—though not super strongly” is hard for me to square with your butt-numbers, even taking into account that you disclaim them as butt-numbers.
To be more object-level than Tsvi:
o1/o3/R1/R1-Zero seem to me like evidence that “scaling reasoning models in a self-play-ish regime” can reach superhuman performance on some class of tasks, with properties like {short horizons, cheap objective verifiability, at most shallow conceptual innovation needed} or maybe some subset thereof. This is important! But, for reasons similar to this part of Tsvi’s post, it’s a lot less apparent to me that it can get to superintelligence at all science and engineering tasks.
Also the claim that Ziz “did the math” in relation to making decisions using FDT-ish theories
IMO Eliezer correctly identifies a crucial thing Ziz got wrong about decision theory:
… the misinterpretation “No matter what, I must act as if everyone in the world will perfectly predict me, even though they won’t.” …
i think “actually most of your situations do not have that much subjunctive dependence” is pretty compelling personally
it’s not so much that most of the espoused decision theory is fundamentally incorrect but rather that subjunctive dependence is an empirical claim about how the world works, can be tested empirically, and seems insufficiently justified to me
however i think the obvious limitation of this kind of approach is that it has no model for ppl behaving in incoherent ways except as a strategy for gaslighting ppl about how accountable you are for your actions. this is a real strategy ppl often do but is not the whole of it imo
this is implied by how, as soon as ppl are not oppressing you “strategically”, the game theory around escalation breaks. by doing the Ziz approach, you wind up walking into bullets that were not meant for you, or maybe anyone, and have exerted no power here or counterfactually
Let’s look at a preference for eating lots of sweets, for example. Society tries to teach us not to eat too many sweets because it’s unhealthy, and from the perspective of someone who likes eating sweets, this often feels coercive. Your explanation applied here would be that upon reflection, people will decide “Actually, eating a bunch of candy every day is great”—and no doubt, to a degree that is true, at least with the level of reflection that people actually do.
However, when I decided to eat as many sweets as I wanted, I ended up deciding that sweets were gross, except in very small amounts or as part of extended exercise where my body actually needs the sugar. What’s happening here is that society has a bit more wisdom than the candy-loving kid, tries clumsily to teach the foolish kid that their ways are wrong and they’ll regret it, and often ends up succeeding more at constraining behavior than at integrating the values in a way that the kid can make sense of upon reflection.
The OP addresses cases like this:
One thing that can cause confusion here—by design—is that perverted moralities are stabler if they also enjoin nonperversely good behaviors in most cases. This causes people to attribute the good behavior to the system of threats used to enforce preference inversion, imagining that they would not be naturally inclined to love their neighbor, work diligently for things they want, and rest sometimes. Likewise, perverted moralities also forbid many genuinely bad behaviors, which primes people who must do something harmless but forbidden to accompany it with needlessly harmful forbidden behaviors, because that’s what they’ve been taught to expect of themselves.
I agree that the comment you’re replying to is (narrowly) wrong (if understanding ‘prior’ as ‘temporally prior’), because someone might socially acquire a preference not to overeat sugar before they get the chance to learn they don’t want to overeat sugar. ISTM this is repaired by comparing not to ‘(temporally) prior preference’ but something like ‘reflectively stable preference absent coercive pressure’.
I can easily imagine an argument that: SBF would be safe to release in 25 years, or for that matter tomorrow, not because he’d be decent and law-abiding, but because no one would trust him and the only crimes he’s likely to (or did) commit depend on people trusting him. I’m sure this isn’t entirely true, but it does seem like being world-infamous would have to mitigate his danger quite a bit.
More generally — and bringing it back closer to the OP — I feel interested in when, and to what extent, future harms by criminals or norm-breakers can be prevented just by making sure that everyone knows their track record and can decide not to trust them.
Though — I haven’t read all of his recent novels, but I think — none of those are (for lack of a better word) transhumanist like Permutation City or Diaspora, or even Schild’s Ladder or Incandescence. Concretely: no uploads, no immortality, no artificial minds, no interstellar civilization. I feel like this fits the pattern, even though the wildness of the physics doesn’t. (And each of those four earlier novels seems successively less about the implications of uploading/immortality/etc.)
In practice, it just requires hardware with limited functionality and physical security — hardware security modules exist.
An HSM-analogue for ML would be a piece of hardware that can have model weights loaded into its nonvolatile memory, can perform inference, but doesn’t provide a way to get the weights out. (If it’s secure enough against physical attack, it could also be used to run closed models on a user’s premises, etc.; there might be a market for that.)
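To make the shape of that interface concrete, here is a hypothetical sketch in Python — all names are invented, and it’s a software stand-in for what would really be tamper-resistant hardware: weights can be written in and inference run, but no call reads the weights back out.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class InferenceHSM:
    """Toy software stand-in for tamper-resistant inference hardware."""

    # Held only inside the device boundary; no method below ever returns it.
    _weights: Optional[bytes] = field(default=None, repr=False)

    def load_weights(self, weight_blob: bytes) -> None:
        # Real hardware would verify/decrypt this inside the secure boundary and
        # store it in nonvolatile memory that outside code cannot read.
        self._weights = weight_blob

    def infer(self, prompt: str) -> str:
        if self._weights is None:
            raise RuntimeError("no model loaded")
        # Real hardware would run the model here; this placeholder just echoes.
        return f"<model output for {prompt!r}>"

    # Deliberately absent: get_weights(), export(), or any debug read of memory.

# Usage: the provider ships weights into the device; the user can query it,
# but has no interface for extracting the weights.
hsm = InferenceHSM()
hsm.load_weights(b"...")
print(hsm.infer("hello"))
```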
This doesn’t work. (The recording is from Linux Firefox; the same thing happens in Android Chrome.)
An error is logged when I click a second time (and not when I click on a different probability):
[GraphQL error]: Message: null value in column "prediction" of relation "ElicitQuestionPredictions" violates not-null constraint, Location: line 2, col 3, Path: MakeElicitPrediction instrument.ts:129:35
How can I remove an estimate I created with an accidental click? (Said accidental click is easy to make on mobile, especially because the way reactions work there has habituated me to tapping to reveal hidden information and not expecting doing so to perform an action.)
If specifically with IQ, feel free to replace the word with “abstract units of machine intelligence” wherever appropriate.
By calling it “IQ”, you were (EDIT: the creator of that table was) saying that gpt4o is comparable to a 115 IQ human, etc. If you don’t intend that claim, if that replacement would preserve your meaning, you shouldn’t have called it IQ. (IMO that claim doesn’t make sense — LLMs don’t have human-like ability profiles.)
Learning on-the-fly remains, but I expect some combination of sim2real and muZero to work here.
Hmm? sim2real AFAICT is an approach to generating synthetic data, not to learning. MuZero is a system that can learn to play a bunch of games, with an architecture very unlike LLMs. This sentence doesn’t typecheck for me; what way of combining these concepts with LLMs are you imagining?
I don’t think it much affects the point you’re making, but the way this is phrased conflates ‘valuing doing X oneself’ and ‘valuing that X exist’.
I don’t feel a different term is needed/important, but (n=1) because of some uses I’ve seen of ‘lens’ as a technical metaphor, it strongly makes me think ‘different mechanically-generated view of the same data/artifact’, not ‘different artifact that’s (supposed to be) about the same subject matter’, so I found the usage here a bit disorienting at first.