johnswentworth

Karma: 49,264

johnswentworth 21 Nov 2024 16:55 UTC
12 points
6
in reply to: johnswentworth’s comment on: Why Don’t We Just… Shoggoth+Face+Paraphraser?
One example, to add a little concreteness: suppose that the path to AGI is to scale up o1-style inference-time compute, but it requires multiple OOMs of scaling. So it no longer has a relatively-short stream of “internal” thought, it’s more like the natural-language record of an entire simulated society.
Then:
- There is no hope of a human reviewing the whole thing, or any significant fraction of the whole thing. Even spot checks don’t help much, because it’s all so context-dependent.
- Accurate summarization would itself be a big difficult research problem.
- There’s likely some part of the simulated society explicitly thinking about intentional deception, even if the system as a whole is well aligned.
- … but that’s largely irrelevant, because in the context of a big complex system like a whole society, the effects of words are very decoupled from their content. Think of e.g. a charity which produces lots of internal discussion about reducing poverty, but frequently has effects entirely different from reducing poverty. The simulated society as a whole might be superintelligent, but its constituent simulated subagents are still pretty stupid (like humans), so their words decouple from effects (like humans’ words).
… and that’s how the proposal breaks down, for this example.

johnswentworth 21 Nov 2024 4:38 UTC
10 points
11
in reply to: Daniel Kokotajlo’s comment on: Why Don’t We Just… Shoggoth+Face+Paraphraser?
I haven’t decided yet whether to write up a proper “Why Not Just...” for the post’s proposal, but here’s an overcompressed summary. (Note that I’m intentionally playing devil’s advocate here, not giving an all-things-considered reflectively-endorsed take, but the object-level part of my reflectively-endorsed take would be pretty close to this.)
Charlie’s concern isn’t the only thing it doesn’t handle. The only thing this proposal does handle is an AI extremely similar to today’s, thinking very explicitly about intentional deception, and even then the proposal only detects it (as opposed to e.g. providing a way to solve the problem, or even a way to safely iterate without selecting against detectability). And that’s an extremely narrow chunk of the X-risk probability mass—any significant variation in the AI breaks it, any significant variation in the threat model breaks it. The proposal does not generalize to anything.
Charlie’s concern is just one specific example of a way in which the proposal does not generalize. A proper “Why Not Just...” post would list a bunch more such examples.
And as with Charlie’s concern, the meta-level problem is that the proposal also probably wouldn’t get us any closer to handling those more-general situations. Sure, we could make some very toy setups (like the chess thing), and see what the shoggoth+face AI does on those very toy setups, but we get very few bits, and the connection is very tenuous to both other threat models and AIs with any significant differences from the shoggoth+face. Accounting for the inevitable failure to measure what we think we’re measuring (with probability close to 1), such experiments would not actually get us any closer to solving any of the problems which constitute the bulk of the X-risk probability mass. It’s not “a start”, because “a start” would imply that the experiment gets us closer, i.e. that the problem gets easier after doing the experiment. If you try to think about the You Are Not Measuring What You Think You Are Measuring problem as “well, we got at least some tiny epsilon of evidence, right?”, then you will shoot yourself in the foot; such reasoning is technically correct, but the correct value of epsilon is small enough that the correct update from it is not distinguishable from zero in practice.

johnswentworth 21 Nov 2024 2:27 UTC
10 points
11
in reply to: Daniel Kokotajlo’s comment on: Why Don’t We Just… Shoggoth+Face+Paraphraser?
The problem with that sort of attitude is that, when the “experiment” yields so few bits and has such a tenuous connection to the thing we actually care about (as in Charlie’s concern), that’s exactly when You Are Not Measuring What You Think You Are Measuring bites real hard. Like, sure, you’ll see this system do something in the toy chess experiment, but that’s just not going to be particularly relevant to the things an actual smarter-than-human AI does in the situations Charlie’s concerned about. If anything, the experimenter is far more likely to fool themselves into thinking their results are relevant to Charlie’s concern than they are to correctly learn anything relevant to Charlie’s concern.

johnswentworth 18 Nov 2024 21:03 UTC
6 points
4
in reply to: Leon Lang’s comment on: Leon Lang’s Shortform
I think this misunderstands what discussion of “barriers to continued scaling” is all about. The question is whether we’ll continue to see ROI comparable to recent years by continuing to do the same things. If not, well… there is always, at all times, the possibility that we will figure out some new and different thing to do which will keep capabilities going. Many people have many hypotheses about what those new and different things could be: your guess about interaction is one, inference time compute is another, synthetic data is a third, deeply integrated multimodality is a fourth, and the list goes on. But these are all hypotheses which may or may not pan out, not already-proven strategies, which makes them a very different topic of discussion than the “barriers to continued scaling” of the things which people have already been doing.

johnswentworth 17 Nov 2024 22:40 UTC
2 points
0
in reply to: Bogdan Ionut Cirstea’s comment on: johnswentworth’s Shortform
Some of the underlying evidence, like e.g. Altman’s public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.

johnswentworth 15 Nov 2024 23:50 UTC
5 points
3
in reply to: Linch’s comment on: The Median Researcher Problem
Oh I see, you mean that the observation is weak evidence for the median model relative to a model in which the most competent researchers mostly determine memeticity, because higher median usually means higher tails. I think you’re right, good catch.

johnswentworth 15 Nov 2024 23:11 UTC
1 point
−11
in reply to: Vladimir_Nesov’s comment on: johnswentworth’s Shortform
FYI, my update from this comment was:
- Hmm, seems like a decent argument...
- … except he said “we don’t know that it doesn’t work”, which is an extremely strong update that it will clearly not work.

johnswentworth 15 Nov 2024 22:07 UTC
5 points
3
in reply to: Leon Lang’s comment on: johnswentworth’s Shortform
Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.

johnswentworth 15 Nov 2024 21:20 UTC
6 points
−9
in reply to: Vladimir_Nesov’s comment on: johnswentworth’s Shortform
I don’t expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we’ve been on for the past few years, and we’re not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.

johnswentworth 15 Nov 2024 21:10 UTC
2 points
0
in reply to: Linch’s comment on: The Median Researcher Problem
“unless you additionally posit an additional mechanism like fields with terrible replication rates have a higher standard deviation than fields without them”
Why would that be relevant?

johnswentworth 15 Nov 2024 16:52 UTC
37 points
12
on: johnswentworth’s Shortform
Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman’s public statements pointed in the same direction too. I’ve also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.
Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn’t released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25% and 50% of progress has been algorithmic all along, it wouldn’t be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers would move on to other things, but that’s an optimistic take, not a median world.
(To be clear, I don’t think you should be giving us much prediction-credit for that, since we didn’t talk about it publicly. I’m posting mostly because I’ve seen a decent number of people for whom the death of scaling seems to be a complete surprise and they’re not sure whether to believe it. For those people: it’s not a complete surprise, this has been quietly broadcast for a while now.)

johnswentworth 12 Nov 2024 18:54 UTC
5 points
0
in reply to: habryka’s comment on: johnswentworth’s Shortform
I am posting this now mostly because I’ve heard it from multiple sources. I don’t know to what extent those sources are themselves correlated (i.e. whether or not the rumor started from one person).

johnswentworth 12 Nov 2024 18:04 UTC
31 points
−26
on: johnswentworth’s Shortform
Epistemic status: rumor.
Word through the grapevine, for those who haven’t heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn’t just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.

johnswentworth 9 Nov 2024 23:18 UTC
3 points
0
in reply to: drozdj’s comment on: The Median Researcher Problem
I don’t think statistics incompetence is the One Main Thing, it’s just an example which I expect to be relatively obvious and legible to readers here.

johnswentworth 7 Nov 2024 17:49 UTC
4 points
0
in reply to: Raemon’s comment on: Abstractions are not Natural
The way I think of it, it’s not quite that some abstractions are cheaper to use than others, but rather:
- One can in-principle reason at the “low(er) level”, i.e. just not use any given abstraction. That reasoning is correct but costly.
- One can also just be wrong, e.g. use an abstraction which doesn’t actually match the world and/or one’s own lower level model. Then predictions will be wrong, actions will be suboptimal, etc.
- Reasoning which is both cheap and correct routes through natural abstractions. There are some degrees of freedom insofar as a given system could use some natural abstractions but not others, or be wrong about some things but not others.

johnswentworth 6 Nov 2024 23:02 UTC
3 points
3
in reply to: Lorxus’s comment on: Some Rules for an Algebra of Bayes Nets
Proof that the quoted bookkeeping rule works, for the exact case:
- The original DAG $G$ asserts $P[X] = \prod_{i} P[X_{i} | X_{\mathrm{pa}^{G}(i)}]$
- If $G'$ just adds an edge from $j$ to $k$, then $G'$ says $P[X] = P[X_{k} | X_{\mathrm{pa}^{G}(k)}, X_{j}] \prod_{i \neq k} P[X_{i} | X_{\mathrm{pa}^{G}(i)}]$
- The original DAG’s assertion $P[X] = \prod_{i} P[X_{i} | X_{\mathrm{pa}^{G}(i)}]$ also implies $P[X_{k} | X_{\mathrm{pa}^{G}(k)}, X_{j}] = P[X_{k} | X_{\mathrm{pa}^{G}(k)}]$, and therefore implies $G'$’s assertion $P[X] = P[X_{k} | X_{\mathrm{pa}^{G}(k)}, X_{j}] \prod_{i \neq k} P[X_{i} | X_{\mathrm{pa}^{G}(i)}]$.
The approximate case then follows by the new-and-improved Bookkeeping Theorem.
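For anyone who wants to poke at the exact case numerically, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption chosen for illustration and is not taken from the post: four binary variables, a chain DAG $0 \to 1 \to 2 \to 3$ playing the role of $G$, the added edge $j = 0 \to k = 2$, and the hypothetical helper names `conditional` and `factorization`. It builds a joint distribution that factors over $G$, then checks that the same joint also satisfies the factorization asserted by $G'$.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_cpt(shape):
    """Random strictly positive table, normalised over its last axis (the child)."""
    t = rng.random(shape) + 0.1
    return t / t.sum(axis=-1, keepdims=True)


def conditional(joint, i, parents):
    """P[X_i | X_parents] computed from the joint itself.

    Summed-out axes are kept with size 1 so the result broadcasts
    against the full joint array.
    """
    keep = set(parents) | {i}
    drop = tuple(a for a in range(joint.ndim) if a not in keep)
    m_child_and_parents = joint.sum(axis=drop, keepdims=True) if drop else joint
    m_parents = m_child_and_parents.sum(axis=i, keepdims=True)
    return m_child_and_parents / m_parents


def factorization(joint, parent_sets):
    """Product over i of P[X_i | X_pa(i)], with each factor read off the joint."""
    prod = np.ones_like(joint)
    for i, pa in parent_sets.items():
        prod = prod * conditional(joint, i, pa)
    return prod


# Illustrative G: the chain 0 -> 1 -> 2 -> 3 (an assumption, not from the post).
G = {0: (), 1: (0,), 2: (1,), 3: (2,)}
# G' adds the edge j=0 -> k=2; node 0 is a non-descendant of node 2, so G' is still a DAG.
G_prime = {0: (), 1: (0,), 2: (0, 1), 3: (2,)}

# Build a joint that factors over G by construction.
p0 = random_cpt((2,))    # P[X0]
p1 = random_cpt((2, 2))  # P[X1 | X0], indexed [x0, x1]
p2 = random_cpt((2, 2))  # P[X2 | X1], indexed [x1, x2]
p3 = random_cpt((2, 2))  # P[X3 | X2], indexed [x2, x3]
joint = np.einsum("a,ab,bc,cd->abcd", p0, p1, p2, p3)

holds_G = np.allclose(joint, factorization(joint, G))
holds_G_prime = np.allclose(joint, factorization(joint, G_prime))

# The key step of the proof: conditioning X2 on the extra parent X0 changes nothing.
extra_parent_is_redundant = np.allclose(
    conditional(joint, 2, (0, 1)), conditional(joint, 2, (1,))
)

# Control: a generic joint, which has no reason to factor over G'.
generic = rng.random((2, 2, 2, 2))
generic /= generic.sum()
generic_holds_G_prime = np.allclose(generic, factorization(generic, G_prime))

print(holds_G, holds_G_prime, extra_parent_is_redundant, generic_holds_G_prime)
```

A four-node example is used because, with only three nodes, adding the edge would make $G'$ a complete DAG, whose factorization holds for every distribution and so would not test anything. The control at the end is there for the same reason: it shows the $G'$ check can actually fail when the joint was not generated from $G$.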
Not sure where the disconnect/confusion is.

johnswentworth 5 Nov 2024 17:11 UTC
6 points
3
in reply to: Alfred Harwood’s comment on: Abstractions are not Natural
All dead-on up until this:
“… the universe will force them to use the natural abstractions (or else fail to achieve their goals). [...] Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that its ~impossible to model the world using unnatural abstractions?”
It’s not quite that it’s impossible to model the world without the use of natural abstractions. Rather, it’s instrumentally far “cheaper” to use the natural abstractions (in some sense). Rather than routing through natural abstractions, a system with a highly capable world model could instead e.g. use exponentially large amounts of compute (e.g. doing full quantum-level simulation), or might need enormous amounts of data (e.g. exponentially many training cycles), or both. So we expect to see basically-all highly capable systems use natural abstractions in practice.

johnswentworth 5 Nov 2024 17:04 UTC
8 points
4
in reply to: zoop’s comment on: The Median Researcher Problem
The problem with this model is that the “bad” models/theories in replication-crisis-prone fields don’t look like random samples from a wide posterior. They have systematic, noticeable, and wrong (therefore not just coming from the data) patterns to them—especially patterns which make them more memetically fit, like e.g. fitting a popular political narrative. A model which just says that such fields are sampling from a noisy posterior fails to account for the predictable “direction” of the error which we see in practice.

johnswentworth 4 Nov 2024 17:33 UTC
20 points
13
on: Abstractions are not Natural
Walking through your first four sections (out of order):
- Systems definitely need to be interacting with mostly-the-same environment in order for convergence to kick in. Insofar as systems are selected on different environments and end up using different abstractions as a result, that doesn’t say much about NAH.
- Systems do not need to have similar observational apparatus, but the more different the observational apparatus the more I’d expect that convergence requires relatively-high capabilities. For instance: humans can’t see infrared/UV/microwave/radio, but as human capabilities increased all of those became useful abstractions for us.
- Systems do not need to be subject to similar selection pressures/constraints or have similar utility functions; a lack of convergence among different pressures/constraints/utility is one of the most canonical things which would falsify NAH. That said, the pressures/constraints/utility (along with the environment) do need to incentivize fairly general-purpose capabilities, and the system needs to actually achieve those capabilities.
More general comment: the NAH says that there’s a specific, discrete set of abstractions in any given environment which are “natural” for agents interacting with that environment. The reason that “general-purpose capabilities” are relevant in the above is that full generality and capability requires being able to use ~all those natural abstractions (possibly picking them up on the fly, sometimes). But a narrower or less-capable agent will still typically use some subset of those natural abstractions, and factors like e.g. similar observational apparatus or similar pressures/utility will tend to push for more similar subsets among weaker agents. Even in that regime, nontrivial NAH predictions come from the discreteness of the set of natural abstractions; we don’t expect to find agents e.g. using a continuum of abstractions.

johnswentworth 3 Nov 2024 19:48 UTC
7 points
0
in reply to: Kaj_Sotala’s comment on: The Median Researcher Problem
Our broader society has community norms which require basically everyone to be literate. Nonetheless, there are jobs in which one can get away without reading, and the inability to read does not make it that much harder to make plenty of money and become well-respected. These statements are not incompatible.