Researcher at the Center on Long-Term Risk. All opinions my own.
Anthony DiGiovanni
you’d eventually meet copies of yourself
But a copy of me =/= me. I don’t see how you establish this equivalence without assuming the algorithmic ontology in the first place.
it’s not an independent or random sample
What kind of sample do you think it is?
Sure, but isn’t the whole source of weirdness the fact that it’s metaphysically unclear (or indeterminate) what the real “sampling procedure” is?
I don’t understand. It seems that when people appeal to the algorithmic ontology to motivate interesting decision-theoretic claims — like, say, “you should choose to one-box in Transparent Newcomb” — they’re not just taking a more general perspective. They’re making a substantive claim that it’s sensible to regard yourself as an algorithm, over and above your particular instantiation in concrete reality.
This post was a blog post day project. For its purpose of general sanity waterline-raising, I’m happy with how it turned out. If I still prioritized the kinds of topics this post is about, I’d say more about things like:
“equilibrium” and how it’s a misleading and ill-motivated frame for game theory, especially acausal trade;
why the logical/algorithmic ontology for decision theory is far from obviously preferable.
But I’ve come to think there are far deeper and higher-priority mistakes in the “orthodox rationalist worldview” (scare quotes because I know individuals’ views are less monolithic than that, of course). These mostly concern pragmatism about epistemology and the uncritical acceptance of precise Bayesianism. I wrote a bit about the problems with pragmatism here, and critiques of precise Bayesianism are forthcoming, though previewed a bit here.
Linkpost: Why Evidential Cooperation in Large Worlds might not be action-guiding
A while back I wrote up why I was skeptical of ECL. I think this basically holds up, with the disclaimers at the top of the post. But I don’t consider it that important compared to other things relevant to LW that people could be thinking about, so I decided to put it on my blog instead.
(I might misunderstand you. My impression was that you’re saying it’s valid to extrapolate from “model XYZ does well at RE-Bench” to “model XYZ does well at developing new paradigms and concepts.” But maybe you’re saying that the trend of LLM success at various things suggests we don’t need new paradigms and concepts to get AGI in the first place? My reply below assumes the former:)
I’m not saying LLMs can’t develop new paradigms and concepts, though. The original claim you were responding to was that success at RE-Bench in particular doesn’t tell us much about success at developing new paradigms and concepts. “LLMs have done various things some people didn’t expect them to be able to do” doesn’t strike me as much of an argument against that.
More broadly, re: your burden of proof claim, I don’t buy that “LLMs have done various things some people didn’t expect them to be able to do” determinately pins down an extrapolation to “the current paradigm(s) will suffice for AGI, within 2-3 years.” That’s not a privileged reference class forecast, it’s a fairly specific prediction.
I don’t think this distinction between old-paradigm/old-concepts and new-paradigm/new-concepts is going to hold up very well to philosophical inspection or continued ML progress; it smells similar to ye olde “do LLMs truly understand, or are they merely stochastic parrots?” and “Can they extrapolate, or do they merely interpolate?”
I find this kind of pattern-match pretty unconvincing without more object-level explanation. Why exactly do you think this distinction isn’t important? (I’m also not sure “Can they extrapolate, or do they merely interpolate?” qualifies as “ye olde,” still seems like a good question to me at least w.r.t. sufficiently out-of-distribution extrapolation.)
“Music is emotional” is something almost everyone can agree to, but, for some, emotional songs can be frequently tear-jerking and for others that never happens
I’m now curious if anyone thinks “this gave me chills” is just a metaphor. Music has literally given me chills quite a few times.
Adding to Jesse’s comment, the “We’ve often heard things along the lines of...” line refers both to personal communications and to various comments we’ve seen, e.g.:
[link]: “Since this intuition leads to the (surely false) conclusion that a rational beneficent agent might just as well support the For Malaria Foundation as the Against Malaria Foundation, it seems to me that we have very good reason to reject that theoretical intuition”
[link]: “including a few mildly stubborn credence functions in some judiciously chosen representors can entail effective altruism from the longtermist perspective is a fool’s errand. Yet this seems false”
[link]: “I think that if you try to get any meaningful mileage out of the maximality rule … basically everything becomes permissible, which seems highly undesirable”
(Also, as we point out in the post, this is only true insofar as you only use maximality, applied to total consequences. You can still regard obviously evil things as unacceptable on non-consequentialist grounds, for example.)
Without a clear definition of “winning,”
This is part of the problem we’re pointing out in the post. We’ve encountered claims of this “winning” flavor that haven’t been made precise, so we survey different things “winning” could mean more precisely, and argue that they’re inadequate for figuring out which norms of rationality to adopt.
The key claim is: you can’t evaluate which beliefs and decision theory to endorse just by asking “which ones perform the best?”, because the whole question is what it means to systematically perform better under uncertainty. Every operationalization of “systematically performing better” we’re aware of is either:
Incomplete — like “avoiding dominated strategies”, which leaves a lot unconstrained;
A poorly motivated proxy for the performance we actually care about — like “doing what’s worked in the past”; or
Secretly smuggling in nontrivial non-pragmatic assumptions — like “doing what’s worked in the past”, not because that’s what we actually care about, but because past performance predicts future performance.
This is what we meant to convey with this sentence: “On any way of making sense of those words, we end up either calling a very wide range of beliefs and decisions “rational”, or reifying an objective that has nothing to do with our terminal goals without some substantive assumptions.”
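To make the “incomplete” point concrete, here’s a minimal sketch (strategy names and payoffs are entirely made up) of why “avoiding dominated strategies” leaves so much unconstrained: it only rules out options that do strictly worse in every possible state, and is silent on how to choose among the rest.

```python
# Hypothetical payoff table: each strategy's payoff in each possible state.
payoffs = {
    "A": [3, 1, 4],
    "B": [2, 0, 3],  # strictly worse than A in every state: dominated
    "C": [1, 5, 0],  # not dominated by anything, despite mixed performance
}

def undominated(table):
    """Strategies not strictly dominated by any other strategy."""
    return {
        s for s, p in table.items()
        if not any(
            all(q[i] > p[i] for i in range(len(p)))
            for t, q in table.items() if t != s
        )
    }

print(sorted(undominated(payoffs)))  # ['A', 'C']
```

The principle rules out B, but says nothing about A vs. C — that choice depends on your uncertainty over states, which is exactly the part left unconstrained.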
(I can’t tell from your comment whether you agree with all of that. If this was all obvious to you, great! But we’ve often had discussions where someone appealed to “which ones perform the best?” in a way that misses these points.)
Sorry this was confusing! From our definition here:
We’ll use “pragmatic principles” to refer to principles according to which belief-forming or decision-making procedures should “perform well” in some sense.
“Avoiding dominated strategies” is pragmatic because it directly evaluates a decision procedure or set of beliefs based on its performance. (People do sometimes apply pragmatic principles like this one directly to beliefs; see, e.g., this work on anthropics.)
Deference isn’t pragmatic, because the appropriateness of your beliefs is evaluated by how they relate to the beliefs of the person you’re deferring to. Someone could say, “You should defer because this tends to lead to good consequences,” but then they’re not applying deference directly as a principle — the underlying principle is “doing what’s worked in the past.”
Winning isn’t enough
at time 1 you’re in a strictly better epistemic position
Right, but 1-me has different incentives by virtue of this epistemic position. Conditional on being at the ATM, 1-me would be better off not paying the driver. (Yet 0-me is better off if the driver predicts that 1-me will pay, hence the incentive to commit.)
I’m not sure if this is an instance of what you call “having different values” — if so I’d call that a confusing use of the phrase, and it doesn’t seem counterintuitive to me at all.
(I might not reply further because of how historically I’ve found people seem to simply have different bedrock intuitions about this, but who knows!)
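The incentive structure I have in mind (essentially Parfit’s hitchhiker) can be sketched with made-up payoffs — the numbers are purely illustrative:

```python
# Hypothetical payoffs for a Parfit's-hitchhiker-style case: the driver
# rescues 0-me only if he predicts that 1-me will pay $10 at the ATM.
PAY, DONT = "pay", "don't pay"

# Ex ante (0-me, before rescue): committing to pay secures the rescue.
ex_ante = {PAY: 100 - 10,  # predicted to pay -> rescued, then pays $10
           DONT: 0}        # predicted to refuse -> no rescue
best_ex_ante = max(ex_ante, key=ex_ante.get)

# Ex interim (1-me, already rescued and standing at the ATM): the rescue
# is locked in, so conditional on being here, paying only costs money.
ex_interim = {PAY: 100 - 10,
              DONT: 100}
best_ex_interim = max(ex_interim, key=ex_interim.get)

print(best_ex_ante, best_ex_interim)  # pay don't pay
```

Same values throughout (1-me still just wants money); what changes between 0-me and 1-me is the epistemic position and hence the incentives.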
I intrinsically only care about the real world (I find the Tegmark IV arguments against this pretty unconvincing). As far as I can tell, the standard justification for acting as if one cares about nonexistent worlds is diachronic norms of rationality. But I don’t see an independent motivation for diachronic norms, as I explain here. Given this, I think it would be a mistake to pretend my preferences are something other than what they actually are.
Thanks for clarifying!
covered under #1 in my list of open questions
To be clear, by “indexical values” in that context I assume you mean indexing on whether a given world is “real” vs “counterfactual,” not just indexical in the sense of being egoistic? (Because I think there are compelling reasons to reject UDT without being egoistic.)
I strongly agree with this, but I’m confused that this is your view given that you endorse UDT. Why do you think your future self will honor the commitment of following UDT, even in situations where your future self wouldn’t want to honor it (because following UDT is not ex interim optimal from his perspective)?
I’m afraid I don’t understand your point — could you please rephrase?
I understand; I’m just rejecting the premise that “same physical structure” implies being identical to me. (Perhaps confusingly, despite the fact that I’m defending the “physicalist ontology” in the context of this thread (in contrast to the algorithmic ontology), I reject physicalism in the metaphysical sense.)
This also seems tangential, though, because the substantive appeals to the algorithmic ontology that get made in the decision theory context aren’t about physically instantiated copies. They’re about non-physically-instantiated copies of your algorithm. I unfortunately don’t know of a reference for this off the top of my head, but it has come up in some personal communications FWIW.