MichaelDickens

Karma: 1,018

MichaelDickens May 17, 2025, 1:45 AM
8 points
2
on: Focus on the places where you feel shocked everyone’s dropping the ball

In the version of this mental motion I’m proposing here, you keep your eye out for ways that everyone’s being totally inept and incompetent, ways that maybe you could just do the job correctly if you reached in there and mucked around yourself.

I would go even further than that—you don’t have to think you can do the job correctly. Some of my most fruitful projects started with me thinking something like

Everyone is dropping the ball on this. I really don’t think I can do the job correctly, but someone should at least give it a shot, so I might as well do it.

MichaelDickens May 17, 2025, 12:15 AM
5 points
2
in reply to: StefanHex’s comment on: StefanHex’s Shortform
An untested hypothesis:

LLMs are fundamentally text predictors. There are many high-probability replies to “Tell me a funny joke”, so you wouldn’t necessarily expect them all to tell the same one. But perhaps, somewhere in the training data, someone published their conversation with an LLM in which they said “Tell me a funny joke” and it replied with the joke about atoms. Next-gen LLMs learn from this training data that if an LLM is asked to tell a joke, the probability-maximizing answer is that particular joke. So now they all start telling the same joke.

MichaelDickens May 14, 2025, 8:05 PM
15 points
1
on: MichaelDickens’s Shortform
What can ordinary people do to reduce AI risk? People who don’t have expertise in AI research / decision theory / policy / etc.

Some ideas:
- Donate to orgs that are working to AI risk (which ones, though?)
- Write letters to policy-makers expressing your concerns
- Be public about your concerns. Normalize caring about x-risk

MichaelDickens May 10, 2025, 4:01 PM
3 points
0
in reply to: Kaj_Sotala’s comment on: o3 Is a Lying Liar
I was just thinking about this, and it seems to imply something about AI consciousness so I want to hear if you have any thoughts on this:

If LLM output is the LLM roleplaying an AI assistant, that suggests that anything it says about its own consciousness is not evidence about its consciousness. Because any statement the LLM produces isn’t actually a statement about its own consciousness, it’s a statement about the AI assistant that it’s roleplaying as.

Counterpoint: The LLM is, in a way, roleplaying as itself, so statements about its consciousness might be self-describing.

MichaelDickens May 10, 2025, 2:08 PM
1 point
0
in reply to: Neel Nanda’s comment on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
Thank you! Good to hear your perspective

MichaelDickens May 10, 2025, 2:32 AM
8 points
9
in reply to: MichaelDickens’s comment on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
I was wondering why this comment had a disproportionately large disagree-score but then I saw it was only one vote. Still, that leaves me wondering why that one person disagreed so strongly.

Obviously you’re allowed to vote without commenting, it’s just nice to understand how I might be making a mistake when I make a significant decision. Perhaps there’s a perspective I’m missing.

MichaelDickens May 8, 2025, 7:29 PM
10 points
−8
in reply to: MichaelDickens’s comment on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
Update: Having slept on it, I just donated $10,000.

MichaelDickens May 8, 2025, 1:56 AM
36 points
1
on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
Update: Having now slept on it, I just donated $10,000.

I’m considering donating $10,000 to CAIP. I am not going to decide anything until tomorrow because I want to sleep on it, but I’ll write up my thoughts now.

I wrote about CAIP in 2024 and it was one of my top donation candidates. I haven’t re-assessed the AI safety landscape since then, but my current best guess is that CAIP is the single best use of marginal funds.

My basic thinking:

Positives
- I believe international treaties + US regulations are the best strategies for keeping AI safe.
  - Even if they’re not the best thing to do, they’re good and we should be trying to make them happen.
- I agree with OP that having good US regulations increases the chance of getting international treaties.
- A critical step in getting legislation passed is writing and advocating for model legislation.
- To my knowledge, the only orgs that have written legislation are CAIS and CAIP. It’s important that these two orgs continue to exist.
- There are a handful of other orgs that do political outreach on AI x-risk, and I like most of them too, but none of them are as funding-constrained as CAIP.
Potential concerns

My list of concerns is more wordy than my list of positives, Lest that gives readers the wrong idea, I would like to note that I think the positives are much stronger than the negatives.
- Why aren’t more institutional donors funding CAIP?
  - My best guess is they’re (overly) averse to supporting political action, especially because CAIP is open about its concern for x-risk, and I think many donors are timid about pushing the Overton window. That’s just a guess, I don’t want to claim to be able to read donors’ minds.
  - There could be some externally-illegible reason why CAIP is actually ineffective or counterproductive. CAIP looks quite competent as far as I can tell, but I know little about politics.
- Why didn’t CAIP write this post until they had 30 days of runway left?
  - But I’d expect they’d try to fundraise from institutional donors first and only write this post after that failed.
- If I donate, CAIP might shut down anyway, and then my donation didn’t achieve anything.
  - This is less of a concern if I donate through Manifund because the donation only goes through if the fundraiser reaches its minimum goal.
  - The fact that CAIP can’t downscale means it might have continual difficulty raising funding.
  - I think donations still have high expected value because there’s a small chance that my donation will buy CAIP just enough time to get the fundraising momentum it needs, and then in some sense I can claim 100% credit for everything CAIP does after that.
  - My current thinking is that I should donate regardless, but I would feel better about it if there were some mechanism (like what Manifund has) where I only pay if CAIP gets enough donation commitments to sustain itself.
Some personal considerations
- I wasn’t planning on donating for another 4–6 months, but this is a special situation so I may adjust my plans.
- I may donate more than $10,000, but I’ll need to review my financial situation first.
  - I can donate $10,000 immediately out of my bank account. I can donate more than that from my DAF, but the recipient must be a 501(c)(3). (So I could donate via Manifund.)

MichaelDickens May 8, 2025, 1:12 AM
6 points
3
in reply to: Austin Chen’s comment on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
As an aside, I’ve noticed you routinely go out of your way to help charities get funding when they’re in difficult situations, which I think is great for the ecosystem and I’m glad you do it.

MichaelDickens May 7, 2025, 11:23 PM
2 points
0
on: Please Donate to CAIP (Post 1 of 3 on AI Governance)
How do Manifund donations interact with your 501(c)(4) status? Are donations through Manifund just as good as unrestricted donations, or are you more constrained on how you can use that money?

MichaelDickens May 5, 2025, 5:02 AM
1 point
0
in reply to: Severin T. Seehrich’s comment on: Overview: AI Safety Outreach Grassroots Orgs
As I understand it, PauseAI Global aka PauseAI supports protests in most regions, whereas US-based protests are run by PauseAI US which is a separate group of people.

MichaelDickens May 4, 2025, 6:50 PM
3 points
0
on: Overview: AI Safety Outreach Grassroots Orgs
PauseAI US is a separate entity from PauseAI so I believe it should also be listed.

MichaelDickens Apr 30, 2025, 9:20 PM
1 point
0
in reply to: Richard_Kennaway’s comment on: Feedback loops for exercise (VO2Max)
Not OP but I think Functional Threshold Power is fine. I don’t know of any literature directly comparing it to VO2max, but much of the literature on VO2max didn’t actually measure VO2max, it used proxies like “maximum gradient at which a participant can walk for 3 minutes” (called the Balke treadmill test. When meta-analyses report that VO2max strongly predicts health outcomes, what they usually* mean is “VO2max, and also various proxies for VO2max, when thrown together into a meta-analysis, strongly predict health outcomes”. So as far as I can tell from what (little) research I’ve looked at, there are a lot of metrics that work and it’s not clear which ones work better than others. And FTP seems like as good a measure as any.

For example, have a look at Table 2 in Impact of Cardiorespiratory Fitness on All-Cause and Disease-Specific Mortality: Advances Since 2009, which gives a list of studies and what measure each study used. You can see that they used a variety of fitness metrics.

*I’ve only actually looked at two meta-analyses

MichaelDickens Apr 29, 2025, 1:29 AM
8 points
3
in reply to: Knight Lee’s comment on: “The Urgency of Interpretability” (Dario Amodei)
I think there is some hope but I don’t really know how to do it. I think if their behavior was considered sufficiently shameful according to their ingroup then they would stop. But their ingroup specifically selects for people who think they are doing the right thing.

I have some small hope that they can be convinced by good arguments, although if that were true, surely they would’ve already been convinced by now? Perhaps they are simply not aware of the arguments for why what they’re doing is bad?

MichaelDickens Apr 28, 2025, 3:05 PM
2 points
2
in reply to: 1a3orn’s comment on: “The Urgency of Interpretability” (Dario Amodei)
The question under discussion was: Is Anthropic “quite in sync with the AI x-risk community”? If it’s taking unilateral actions that are unpopular with the AI x-risk community, then it’s not in sync.

MichaelDickens Apr 28, 2025, 2:11 PM
1 point
0
in reply to: Davidmanheim’s comment on: “The Urgency of Interpretability” (Dario Amodei)

prosaic alignment is clearly not scalable to the types of systems they are actively planning to build

Why do you believe this?

(FWIW I think it’s foolish that all (?) frontier companies are all-in on prosaic alignment, but I am not convinced that it “clearly” won’t work.)

MichaelDickens Apr 28, 2025, 2:08 PM
10 points
2
in reply to: Knight Lee’s comment on: “The Urgency of Interpretability” (Dario Amodei)
Just my personal opinion:

My sense is that Anthropic is somewhat more safety-focused than the other frontier AI companies, in that most of the companies only care maybe 10% as much about safety as they should, and Anthropic cares 15% as much as it should.

What numbers would you give to these labs?

My median guess is that if an average company is −100 per dollar then Anthropic is −75. I believe Anthropic is making things worse on net by pushing more competition, but an Anthropic-controlled ASI is a bit less likely to kill everyone than an ASI controlled by anyone else.

But I also have significant (< 50%) probability on Anthropic being the worst company in terms of actual consequences because its larger-but-still-insufficient focus on safety may create a false sense of security that ends up preventing good regulations from being implemented.

You may also be interested in SaferAI’s risk management ratings.

I used to think Anthropic was [...] quite in sync with the AI x-risk community.

I think Anthropic leadership respects the x-risk community in their words but not in their actions. Anthropic says safety is important, and invests a decent amount into safety research; but also opposes coordination, supports arms races, and has no objection to taking unilateral actions that are unpopular in the x-risk community (and among the general public for that matter).

MichaelDickens Apr 26, 2025, 12:34 AM
1 point
0
in reply to: Kaj_Sotala’s comment on: o3 Is a Lying Liar
Huh. I knew that’s how ChatGPT worked but I had assumed they would’ve worked out a less hacky solution by now!

MichaelDickens Apr 25, 2025, 10:20 PM
1 point
0
in reply to: ryan_greenblatt’s comment on: Why would AI companies use human-level AI to do alignment research?
I wrote several attempts at a reply and deleted them all because none of them were cruxes for me. I went for a walk and thought more deeply about my cruxes.

I am now back from my walk. Here is what I have determined:

No reply I could write would be cruxy because my original post is not cruxy with respect to my personal behavior.

I believe the correct thing for me to do is to advocate for slowing down AI development, and to donate to orgs that cost-effectively advocate for slowing down AI development. And my post is basically irrelevant to why I believe that.

So why did I write the post? When I wrote it, I wasn’t thinking about cruxes. It was just an argument I had been thinking about that I’d never read before, and I thought someone ought to write it out.

And I’m not sure exactly who this post is a crux for. Perhaps if someone had a particular combination of beliefs about
1. the probability that slowing down AI development will work
2. the probability that bootstrapped alignment will work
where they’re teetering on the edge between “slowing down AI development is good” and “slowing down AI development is bad because it prevents bootstrapped alignment from happening”. My argument might shift that person from the second position to the first. I don’t know if any such person exists.

This is most relevant to slowing down AI development at particular companies—say, if DeepMind slows down and gets significantly surpassed by Meta, then Meta will probably do something that’s even less likely to work than bootstrapped alignment. But a global coordinated slowdown—which is my preferred outcome—does not replace bootstrapped alignment with a worse alignment strategy.

Even though it’s not cruxy, I feel like I should give an object-level response to your comment:

I agree with the denotation of your comment because it is well-hedged—I agree that 5% of resources might be enough to solve alignment. But it probably won’t be.

I think my biggest concern isn’t that AI alignment has no scalable solutions (I agree with you that it probably does have them); my concern is more that alignment is likely to be too hard / get outpaced by capabilities and we will have ASI before alignment is solved.

We can in principle solve large fraction of safety/alignment with fully theoretical safety research without any compute while it seems harder to do purely theoretical capabilities research.

Not to say I disagree (my intuition is that theoretical approaches are underrated), but this contradicts AI companies’ plans (or at least Anthropic’s). Anthropic has claimed that they need to build frontier AI systems on which to do safety research. They seem to think they can’t solve alignment with theoretical approaches. More broadly, if they’re correct, then it seems to me (although it’s not a straightforward contradiction) that alignment bootstrapping won’t have significant advantages to scale because they will need increasing amounts of compute for alignment-related experiments.

FWIW I think your point is more reasonable than Anthropic’s position (I wrote some relevant stuff here). But I thought it was worthwhile to point out the contradiction.

MichaelDickens Apr 25, 2025, 5:54 PM
−1 points
−2
in reply to: Shankar Sivarajan’s comment on: MichaelDickens’s Shortform
I would not describe it as heroic. I think it’s approximately morally equivalent to choosing an 80% chance of making all Americans immortal (but not non-Americans) and a 20% chance of killing everyone in the world.

This is not a perfect analogy because the philosophical arguments for discounting future generations are stronger than the arguments for discounting non-Americans.

(Also my P(doom) is higher than 20%, that’s just an example)

MichaelDickens

Positives

Potential concerns

Some personal considerations