Adam Scholl

Karma: 3,670

Adam Scholl Jul 2, 2025, 11:47 PM
13 points
9
in reply to: Ben Pace’s comment on: A case for courage, when speaking of AI danger
For two, a person who has done evil, versus a person who is evil, are quite different things. I think that it’s sadly not always the case that a person’s character is aligned with a particular behavior of theirs.
I do think many of the historical people most widely considered to be evil now were similarly not awful in full generality, or even across most contexts. For example, Eichmann, the ops lead for the Holocaust, was apparently a good husband and father, and generally took care not to violate local norms in his life or work. Yet personally I feel quite comfortable describing him as evil, despite “evil” being a fuzzy folk term of the sort which tends to imperfectly/lossily describe any given referent.

Adam Scholl Jun 5, 2025, 9:50 PM
13 points
3
in reply to: Alexander Gietelink Oldenziel’s comment on: “Flaky breakthroughs” pervade coaching — but no one tracks them
I share the sense that “flaky breakthroughs” are common, but also… I mean, it clearly is possible for people to learn and improve, right? Including by learning things about themselves which lastingly affect their behavior.
Personally, I’ve had many such updates which have had lasting effects—e.g., noticing when reading the Sequences that I’d been accidentally conflating “trying as hard as I can” with “appearing to others to be trying as hard as one might reasonably be expected to” in some cases, and trying thereafter to correct for that.
I do think it’s worth tracking the flaky breakthrough issue—which seems to me most common with breakthroughs primarily about emotional processing, or the experience of quite-new-feeling sorts of mental state, or something like that?—but it also seems worth tracking that people can in fact sometimes improve!

Adam Scholl May 15, 2025, 1:54 AM
21 points
20
in reply to: Matthew Barnett’s comment on: AI Doomerism in 1879
I think the word “technical” is a red herring here. If someone tells me a flood is coming, I don’t much care how much they know about hydrodynamics, even if in principle this knowledge might allow me to model the threat with more confidence. Rather, I care about things like e.g. how sure they are about the direction from which the flood is coming, about the topography of our surroundings, etc. Personally, I expect I’d be much more inclined to make large/confident updates on the basis of information at levels of abstraction like these, than at levels about e.g. hydrodynamics or particle physics or so forth, however much more “technical,” or related-in-principle in some abstract reductionist sense, the latter may be.
I do think there are also many arguments beyond this simple one which clearly justify additional (and more confident) concern. But I try to assess such arguments based on how compelling they are, where “technical precision” is one, but hardly the only factor which might influence this; e.g., another is whether the argument even involves the relevant level of abstraction, or bears on the question at hand.

Adam Scholl May 15, 2025, 1:01 AM
27 points
19
in reply to: Matthew Barnett’s comment on: AI Doomerism in 1879
I think the simple argument “building minds vastly smarter than our own seems dangerous” is in fact pretty compelling, and seems relatively easy to realize beforehand, as e.g. Turing and many others did. Personally, there are not any technical facts about current ML systems which update me more overall either way about our likelihood of survival than this simple argument does.
And I see little reason why they should—technical details of current AI systems strike me as around as relevant to predicting whether future, vastly more intelligent systems will care about us as do e.g. technical details about neuronal firing in beetles about whether a given modern government will care about us. Certainly modern governments wouldn’t exist if neurons hadn’t evolved, and I expect one could in fact probably gather some information relevant to predicting them by studying beetle neurons; maybe even a lot, in principle. It just seems a rather inefficient approach, given how distant the object of study is from the relevant question.

Adam Scholl Apr 24, 2025, 10:00 PM
7 points
5
in reply to: kave’s comment on: Putting up Bumpers
I interpreted Habryka’s comment as making two points, one of which strikes me as true and important (that it seems hard/unlikely for this approach to allow for pivoting adequately, should that be needed), and the other of which was a misunderstanding (that they don’t literally say they hope to pivot if needed).

Adam Scholl Apr 24, 2025, 9:43 PM
9 points
5
on: Putting up Bumpers
We believe that, even without further breakthroughs, this work can almost entirely mitigate the risk that we unwittingly put misaligned circa-human-expert-level agents in a position where they can cause severe harm.
Sam, I’m confused where this degree of confidence is coming from? I found this post helpful for understanding Anthropic’s strategy, but there wasn’t much argument given about why one should expect the strategy to work, much less to “almost entirely” mitigate the risk!
To me, this seems wildly overconfident given the quality of the available evidence—which, as Aysja notes, involves auditing techniques like e.g. simply asking the models themselves to rate their evil-ness on a scale from 1-10… I can kind of understand evidence like this informing your background intuitions and choice of research bets and so forth, but why think it justifies this much confidence you’ll catch/fix misalignment?

Adam Scholl Mar 14, 2025, 5:26 AM
2 points
−2
in reply to: Raemon’s comment on: Anthropic, and taking “technical philosophy” more seriously
Yeah, I buy that he cares about misuse. But I wouldn’t quite use the word “believe,” personally, about his acting as though alignment is easy—I think if he had actual models or arguments suggesting that, he probably would have mentioned them by now.

Adam Scholl Mar 14, 2025, 5:16 AM
2 points
0
in reply to: Raemon’s comment on: Anthropic, and taking “technical philosophy” more seriously
No, I agree it’s worth arguing the object level. I just disagree that Dario seems to be “reasonably earnestly trying to do good things,” and I think this object-level consideration seems relevant (e.g., insofar as you take Anthropic’s safety strategy to rely on the good judgement of their staff).

Adam Scholl Mar 14, 2025, 2:17 AM
16 points
0
on: Anthropic, and taking “technical philosophy” more seriously
Dario/Anthropic-leadership are at least reasonably earnestly trying to do good things within their worldview
I think as stated this is probably true of the large majority of people, including e.g. the large majority of the most historically harmful people. “Worldviews” sometimes reflect underlying beliefs that lead people to choose actions, but they can of course also be formed post-hoc, to justify whatever choices they wished to make.
In some cases, one can gain evidence about which sort of “worldview” a person has, e.g. by checking it for coherency. But this isn’t really possible to do with Dario’s views on alignment, since to my knowledge, excepting the Concrete Problems paper he has actually not ever written anything about the alignment problem.^[1] Given this, I think it’s reasonable to guess that he does not have a coherent set of views which he’s neglected to mention, so much as the more human-typical “set of post-hoc justifications.”
(In contrast, he discusses misuse regularly—and ~invariably changes the subject from alignment to misuse in interviews—in a way which does strike me as reflecting some non-trivial cognition).
1. ^
  Counterexamples welcome! I’ve searched a good bit and could not find anything, but it’s possible I missed something.

Adam Scholl Feb 22, 2025, 8:09 PM
22 points
11
in reply to: Jan_Kulveit’s comment on: Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
I spent some time learning about neural coding once, and while interesting it sure didn’t help me e.g. better predict my girlfriend; I think in general neuroscience is fairly unhelpful for understanding psychology. For similar reasons, I’m default-skeptical of claims that work on the level of abstraction of ML is likely to help with figuring out whether powerful systems trained via ML are trying to screw us, or with preventing that.

Adam Scholl Feb 12, 2025, 9:22 PM
8 points
10
in reply to: Alexander Gietelink Oldenziel’s comment on: Eli’s shortform feed
I haven’t perceived the degree of focus as intense, and if I had I might be tempted to level similar criticism. But I think current people/companies do clearly matter some, so warrant some focus. For example:
- I think it’s plausible that governments will be inclined to regulate AI companies more like “tech startups” than “private citizens building WMDs,” the more those companies strike them as “responsible,” earnestly trying their best, etc. In which case, it seems plausibly helpful to propagate information about how hard they are in fact trying, and how good their best is.
  - So far, I think many researchers who care non-trivially about alignment—and who might have been capable of helping, in nearby worlds—have for similar reasons been persuaded to join whatever AI company currently has the most safetywashed brand instead. This used to be OpenAI, is now Anthropic, and may be some other company in the future, but it seems useful to me to discuss the details of current examples regardless, in the hope that e.g. alignment discourse becomes better calibrated about how much to expect such hopes will yield.
- There may exist some worlds where it’s possible to get alignment right, yet also possible not to, depending on the choices of the people involved. For example, you might imagine that good enough solutions—with low enough alignment taxes—do eventually exist, but that not all AI companies would even take the time to implement those.
  - Alternatively, you might imagine that some people who come to control powerful AI truly don’t care whether humanity survives, or are even explicitly trying to destroy it. I think such people are fairly common—both in the general population (relevant if e.g. powerful AI is open sourced), and also among folks currently involved with AI (e.g. Sutton, Page, Schmidhuber). Which seems useful to discuss, since e.g. one constraint on our survival is that those who actively wish to kill everyone somehow remain unable to do so.

Adam Scholl Feb 3, 2025, 7:31 AM
22 points
19
in reply to: Zac Hatfield-Dodds’s comment on: Mikhail Samin’s Shortform
When do you think would be a good time to lock in regulation? I personally doubt RSP-style regulation would even help, but the notion that now is too soon/risks locking in early sketches, strikes me as in some tension with e.g. Anthropic trying to automate AI research ASAP, Dario expecting ASL-4 systems between 2025—the current year!—and 2028, etc.

Adam Scholl Feb 2, 2025, 12:12 PM
64 points
43
in reply to: Nathan Helm-Burger’s comment on: Mikhail Samin’s Shortform
Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don’t actually have good advice to give anyone.
It seems to me that other possibilities exist, besides “has model with numbers” or “confused.” For example, that there are relevant ethical considerations here which are hard to crisply, quantitatively operationalize!
One such consideration which feels especially salient to me is the heuristic that before doing things, one should ideally try to imagine how people would react, upon learning what you did. In this case the action in question involves creating new minds vastly smarter than any person, which pose double-digit risk of killing everyone on Earth, so my guess is that the reaction would entail things like e.g. literal worldwide riots. If so, this strikes me as the sort of consideration one should generally weight more highly than their idiosyncratic utilitarian BOTEC.

Adam Scholl Jan 29, 2025, 7:44 PM
26 points
17
in reply to: Nathan Helm-Burger’s comment on: Ten people on the inside
The only safety techniques that count are the ones that actually get deployed in time.
True, but note this doesn’t necessarily imply trying to maximize your impact in the mean timelines world! Alignment plans vary hugely in potential usefulness, so I think it can pretty easily be the case that your highest EV bet would only pay off in a minority of possible futures.

Adam Scholl Nov 20, 2024, 9:02 AM
24 points
2
on: What are the good rationality films?
Prelude to Power is my favorite depiction of scientific discovery. Unlike any other such film I’ve seen, it adequately demonstrates the inquiry from the perspective of the inquirer, rather than from conceptual or biographical retrospect.

Adam Scholl Nov 15, 2024, 1:27 PM
LW: 2 AF: 1
0
AF
on: Untrusted smart models and trusted dumb models
I’m curious if “trusted” in this sense basically just means “aligned”—or like, the superset of that which also includes “unaligned yet too dumb to cause harm” and “unaligned yet prevented from causing harm”—or whether you mean something more specific? E.g., are you imagining that some powerful unconstrained systems are trusted yet unaligned, or vice versa?

Adam Scholl Nov 9, 2024, 11:41 AM
19 points
11
in reply to: habryka’s comment on: evhub’s Shortform
I would guess it does somewhat exacerbate risk. I think it’s unlikely (~15%) that alignment is easy enough that prosaic techniques even could suffice, but in those worlds I expect things go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In which case I do expect there’s more room for things to go wrong, the more that training is for lethality/adversariality.
Given the state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk—like e.g., “all else equal, probably we should expect more killing people-type problems from models trained to kill people”—without decently strong countervailing arguments.

Adam Scholl Nov 6, 2024, 11:43 PM
94 points
6
on: adam_scholl’s Shortform
It seems the pro-Trump Polymarket whale may have had a real edge after all. Wall Street Journal reports (paywalled link, screenshot) that he’s a former professional trader, who commissioned his own polls from a major polling firm using an alternate methodology—the neighbor method, i.e. asking respondents who they expect their neighbors will vote for—he thought would be less biased by preference falsification.
I didn’t bet against him, though I strongly considered it; feeling glad this morning that I didn’t.

Adam Scholl Nov 2, 2024, 8:39 PM
4 points
0
in reply to: DanielFilan’s comment on: JargonBot Beta Test
Thanks; it makes sense that use cases like these would benefit, I just rarely have similar ones when thinking or writing.

Adam Scholl Nov 2, 2024, 8:31 AM
2 points
0
in reply to: DanielFilan’s comment on: JargonBot Beta Test
I also use them rarely, fwiw. Maybe I’m missing some more productive use, but I’ve experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.