Standard xrisk arguments generally don't extrapolate down to systems that don't solve tasks requiring instrumental goals. I think it's reasonable to say that common LLMs don't exhibit many instrumental goals, but they also can't do long-horizon, goal-directed problem solving.
Prosaic risk work like biorisk evals often goes further and asks: if we assume the AI systems aren't themselves very capable at the task, can we still elicit dangerous behaviors from them 'in the loop'? These are legitimate and interesting questions, but they are a different thing.
As a near-limiting case, imagine an engine that flips a coin before the game begins. On heads, it plays the game as Stockfish. On tails, it plays the game as Worstfish. What is this engine's ELO?
I'm short of time to go into details, but this should help illustrate why one should be careful about treating ELO as a well-defined space rather than as a local approximation that's empirically useful for computationally-limited players.
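A quick sketch of the problem, under stated assumptions: against any opponent strong enough to essentially always beat Worstfish and weak enough to essentially always lose to Stockfish, the coin-flip engine scores about 0.5, and under the standard Elo expectation formula a 0.5 score implies a rating equal to that opponent's, whatever it happens to be. The opponent ratings below are made up for illustration, not measured.

```python
from math import log10

def expected_score(r_self: float, r_opp: float) -> float:
    """Standard Elo logistic expectation for r_self vs. r_opp."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400))

def implied_rating(score: float, r_opp: float) -> float:
    """Invert the Elo formula: the single rating that would predict `score` vs. r_opp."""
    return r_opp - 400 * log10(1.0 / score - 1.0)

# Hypothetical opponents rated well above Worstfish and well below Stockfish.
opponents = (1200, 1800, 2400)

# The coin-flip engine: ~always wins as Stockfish (heads), ~always loses as Worstfish (tails).
observed = 0.5 * 1.0 + 0.5 * 0.0

for r_opp in opponents:
    print(f"vs {r_opp}: observed {observed:.2f} -> implied rating {implied_rating(observed, r_opp):.0f}")

# Any single rating contradicts the other matchups, e.g. taking the 1800 answer:
for r_opp in opponents:
    print(f"rating 1800 predicts {expected_score(1800, r_opp):.2f} vs {r_opp}, but we observe {observed:.2f}")
```

The first loop prints an implied rating equal to each opponent's rating (1200, 1800, 2400), and the second shows that committing to any one of those numbers mispredicts the other matchups badly; no single rating fits, which is the sense in which ELO is only a local approximation.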