Orpheus16

Karma: 6,681

Orpheus16 Jul 1, 2025, 12:15 AM
3 points
0
on: RTFB: The RAISE Act
Again, while I have concerns that the bill is insufficiently strong, I think all of this is a very good thing. I strongly support the bill.
Suppose you magically gained a moderate amount of Political Will points and could spend them on 1-2 things that would make the bill stronger (or introduce a separate bill; no need to anchor too much on the current RAISE vibe).
What do you think are the 1-2 things you’d change about RAISE or the 1-2 extra things you’d push for?

Orpheus16 Jul 1, 2025, 12:04 AM
2 points
0
in reply to: Zach Stein-Perlman’s comment on: Substack and Other Blog Recommendations
I would be excited about someone doing a blog on what the companies are doing RE AI policy (including comms that are relevant to policy or directed at policymakers.)
I suspect good posts from such a blog would be shared reasonably frequently among tech policy staffers in DC.
(Not necessarily saying this needs to be you).

Orpheus16 Jun 24, 2025, 2:24 AM
4 points
0
on: Comparing risk from internally-deployed AI to insider and outsider threats from humans
First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like.
Why do you think this? Is it that they are not thinking about large numbers of automated agents running around doing a bunch of research?
Or is it that they are thinking about these kinds of scenarios, and yet they still don’t apply the insider threat frame for some reason?

Orpheus16 Jun 20, 2025, 10:11 PM
2 points
0
in reply to: Ryan Kidd’s comment on: Ryan Kidd’s Shortform
My understanding is that AGI policy is pretty wide open under Trump. I don’t think he and most of his close advisors have entrenched views on the topic.
If AGI is developed in this Admin (or we approach it in this Admin), I suspect there is a lot of EV on the table for folks who are able to explain core concepts/threat models/arguments to Trump administration officials.
There are some promising signs of this so far. Publicly, Vance has engaged with AI2027. Non-publicly, I think there is a lot more engagement/curiosity than many readers might expect.
This isn’t to say “everything is great and the USG is super on track to figure out AGI policy” but it’s more to say “I think people should keep an open mind: even people who disagree with the Trump Admin on mainstream topics should remember that AGI policy is a weird/niche/new topic where lots of people do not have strong/entrenched/static positions (and even those who do have a position may change their mind as new events unfold).”

Orpheus16 Jun 19, 2025, 3:44 AM
9 points
5
in reply to: Ryan Kidd’s comment on: Ryan Kidd’s Shortform
There are definitely still benefits to doing alignment research, but this only justifies the idea that doing alignment research is better than doing nothing.
IMO the thing that matters (for an individual making decisions about what to do with their career) is something more like “on the margin, would it be better to have one additional person do AI governance or alignment/control?”
I happen to think that given the current allocation of talent, on-the-margin it’s generally better for people to choose AI policy. (Particularly efforts to contribute technical expertise or technical understanding/awareness to governments, think-tanks interfacing with governments, etc.) There is a lot of demand in the policy community for these skills/perspectives and few people who can provide them. In contrast, technical expertise is much more common at the major AI companies (though perhaps some specific technical skills or perspectives on alignment are neglected.)
In other words, my stance is something like “by default, a given technical person would have more expected impact in AI policy unless they seem like an unusually good fit for alignment or an unusually bad fit for policy.”

Orpheus16 Jun 8, 2025, 9:35 PM
11 points
0
on: Akash’s Shortform
There’s a video version of AI2027 that is quite engaging/accessible. Over 1.5M views so far.
Seems great. My main critique is that the “good ending” seems to assume alignment is rather easy to figure out, though admittedly that might be more of a critique of AI2027 itself rather than the way the video portrays it.

Orpheus16 May 28, 2025, 2:52 PM
33 points
5
on: What We Learned from Briefing 70+ Lawmakers on the Threat from AI
This is fantastic work. There’s also something about this post that feels deeply empathic and humble, in ways that are hard to articulate but seem important for (some forms of) effective policymaker engagement.
A few questions:
- Are you planning to do any of this in the US?
- What have your main policy proposals or “solutions” been? I think it’s becoming a lot more common for me to encounter policymakers who understand the problem (at least a bit) and are more confused about what kinds of solutions/interventions/proposals are needed (both in the short-term and the long-term).
- Can you say more about what kinds of questions you encounter when describing loss of control, as well as what kinds of answers have been most helpful? I’m increasingly of the belief that getting people to understand “AI has big risks” is less important than getting people to understand “some of the most significant risks come from this unique thing called loss of control that you basically don’t really have to think about for other technologies, and this is one of the most critical ways in which AI is different than other major/dangerous/dual-use technologies.”
- Did you notice any major differences between parties? Did you change your approach based on whether you were talking to Conservatives or Labour? Did they have different perspectives or questions? (My own view is that people on the outside probably overestimate the extent to which there are partisan splits on these concerns; they’re so novel that I don’t think the mainstream parties have really entrenched themselves in different positions. But would be curious if you disagree.)
  - Sub-question: Was there any sort of backlash against Rishi Sunak’s focus on existential risks? Or the UK AI Security Institute? In the US, it’s somewhat common for Republicans to assume that things Biden did were bad (and for Democrats to assume that things Trump does are bad). Have you noticed anything similar?

Orpheus16 May 26, 2025, 4:29 PM
12 points
12
in reply to: Josh You’s comment on: We’re Not Advertising Enough (Post 3 of 6 on AI Governance)
I think we should be careful not to overestimate the success of AI2027. “Vance has engaged with your work” is an impressive feat, but it’s still relatively far away from something like “Vance and others in the Admin have taken your work seriously enough to start to meaningfully change their actions or priorities based on it.” (That bar is very high, but my impression is that the AI2027 folks would be like “yea, that’s what would need to happen in order to steer toward meaningfully better futures.”)
My impression is that AI2027 will have (even) more success if it is accompanied by an ambitious policymaker outreach effort (e.g., lots of 1-1 meetings with relevant policymakers and staffers, writing specific pieces of legislation or EOs and forming a coalition around those ideas, publishing short FAQ memos that address misconceptions or objections they are hearing in their meetings with policymakers, etc.)
This isn’t to say that research is unnecessary—much of the success of AI2027 comes from Daniel (and others on the team) having dedicated much of their lives to research and deep understanding. There are plenty of Government Relations people who are decent at “general policy engagement” but will fail to provide useful answers when staffers ask things like “But why won’t we just code in the goals we want?”, or “But don’t you think the real thing here is about how quickly we diffuse the technology?”, or “Why don’t you think existing laws will work to prevent this?” or a whole host of other questions.
But on the margin, I would probably have Daniel/AI2027 spend more time on policymaker outreach and less time on additional research (especially now that AI2027 is done). There is some degree of influence one can have with the “write something that is thoroughly researched and hope it spreads organically” effort, and I think AI2027 has essentially saturated that. For additional influence, I expect it will be useful for Daniel (or other competent communicators on his team) to advance to “get really good at having meetings with the ~100-1000 most important people, understanding their worldviews, going back and forth with them, understanding their ideological or political constraints, and finding solutions/ideas/arguments that are tailored to these particular individuals.” This is still a very intellectual task in some ways, but it involves a lot more “having meetings” and “forming models of social/political reality” than the classic “sit in your room with a whiteboard and understand technical reality” stuff that we typically associate with research.

Orpheus16 May 15, 2025, 12:37 AM
32 points
8
on: Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
Note that IFP (a DC-based think tank) recently had someone deliver 535 copies of their new book, one to every US Congressional office.
Note also that my impression is that DC people (even staffers) are much less “online” than tech audiences. Whether or not you copy IFP, I would suggest thinking about in-person distribution opportunities for DC.

Orpheus16 May 4, 2025, 7:06 PM
7 points
0
in reply to: habryka’s comment on: RA x ControlAI video: What if AI just keeps getting smarter?
I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to
I would be curious for your thoughts on which organizations you feel are robustly trustworthy.
Bonus points for a list that is kind of a weighted sum of “robustly trustworthy” and “having a meaningful impact RE improving public/policymaker understanding”. (Adding this in because I suspect that it’s easier to maintain “robustly trustworthy” status if one simply chooses not to do a lot of externally-focused comms, so it’s particularly impressive to have the combination of “doing lots of useful comms/policy work” and “managing to stay precise/accurate/trustworthy”).

Orpheus16 May 4, 2025, 6:59 PM
11 points
0
on: AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions
I appreciate the articulation and assessment of various strategies. My comment will focus on a specific angle that I notice both in the report and in the broader ecosystem:
I think there has been a conflation of “catastrophic risks” and “extinction/existential risks” recently, especially among groups that are trying to influence policy. This is somewhat understandable: the difference between “catastrophic” and “existential” is not that big of a deal in most people’s minds. But in some contexts, I think it misses the fact that “existential [and thus by definition irreversible]” is actually a very different level of risk compared to “catastrophic [but something that we would be able to recover from].”
This view seems to be (implicitly) expressed in the report summary, most notably the chart. It seems to me like the main frame is something like “if you want to avoid an unacceptable chance of catastrophic risk, all of these other options are bad.”
But not all of these catastrophic risks are the same. I think this is actually quite an important consideration, and I think even (some) policymakers would/will see this as an essential consideration as AGI becomes more salient.
Specifically, “war” and “misuse” seem very different than “extinction” or “total and irreversible civilizational collapse.”
- “War” is broad enough to encompass many outcomes (ranging from “conflict with <1M deaths” to “nuclear conflict in which civilization recovers” all the way to “nuclear conflict in which civilization does not recover”). Note also that many natsec leaders already think the chance of a war between the US and China is at a level that would probably meet an intuitive bar for “unacceptable.” (I don’t have actual statistics on this but my guess is that >10% chance of war in the next decade is not an uncommon view. One plausible pathway that is discussed often is China invading Taiwan and the US being committed to its defense.)
- “Misuse” can refer to many different kinds of events (including $1B in damages from a cyberattack, 10M deaths, 1B deaths, or complete human extinction.) These are, of course, very different in terms of their overall impact, even though all of them are intuitively/emotionally stored as “very bad things that we would ideally avoid.”
It seems plausible to me that we will be in situations in which policymakers have to make tricky trade-offs between these different sources of risk, and my hope is that the community of people concerned about AI can distinguish between the different “levels” or “magnitudes” of different types of risks.
(My impression is that MIRI agrees with this, so this is more a comment on how the summary was presented and a general note of caution to the ecosystem as a whole. I also suspect that the distinction between “catastrophic” and “existential/civilization-ending” will become increasingly important as the AI conversation becomes more interlinked with the national security apparatus.)
Caveat: I have not read the full report and this comment is mostly inspired by the summary, the chart, and a general sense that many organizations other than MIRI are also engaging in this kind of conflation.

Orpheus16 Apr 10, 2025, 12:30 PM
4 points
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”
Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).
That’s not to say forecasting is always unhelpful. Things like AI2027 can certainly move discussions forward and perhaps get new folks interested. But EG, my biggest critique of AI2027 is that I suspect they’re spending too much time/effort on detailed longform forecasting and too little effort on arranging meetings with Important Stakeholders, developing a strong presence in DC, forming policy recommendations, and related activities. (And TBC I respect/admire the AI2027 team, have relayed this feedback to them, and imagine they have thoughtful reasons for taking the approach they’re taking.)

Orpheus16 Apr 5, 2025, 3:56 PM
17 points
0
in reply to: Buck’s comment on: Buck’s Shortform
What do you think are the most important points that weren’t publicly discussed before?

Orpheus16 Jan 7, 2025, 11:06 PM
10 points
0
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).
@ryan_greenblatt can you say more about what you expect to happen in the period between “AI 10Xes AI R&D” and “AI takeover is very plausible”?
I’m particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during this period. Would be curious for your takes on how much of this stays relatively private/internal (e.g., only a handful of well-connected SF people know how good the systems are) vs. obvious/public/visible (e.g., the majority of the media-consuming American public is aware of the fact that AI research has been mostly automated) or somewhere in-between (e.g., most DC tech policy staffers know this but most non-tech people are not aware.)

Orpheus16 Jan 2, 2025, 8:35 PM
LW: 61 AF: 27
22
on: What’s the short timeline plan?
Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it’s mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.
TBC, I think it’s a plausible and reasonable assumption to make. But I think this assumption ends up meaning that “the plan” excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.
Here’s an alternative frame: I would call the plan described in Marius’s post something like the “short timelines plan assuming that governments do not get involved and assuming that technical tools (namely control/AI-automated AI R&D) are the only/main tools we can use to achieve good outcomes.”
You could imagine an alternative plan described as something like the “short timelines plan assuming that technical tools in the current AGI development race/paradigm are not sufficient and governance tools (namely getting the USG to provide considerably more oversight into AGI development, curb race dynamics, make major improvements to security) are the only/main tools we can use to achieve good outcomes.” This kind of plan would involve a very different focus.
Here are some examples of things that I think would be featured in a “government-focused” short timelines plan:
- Demos of dangerous capabilities
- Explanations of misalignment risks to senior policymakers. Identifying specific people who would be best-suited to provide those explanations, having those people practice giving explanations and addressing counterarguments, etc.
- Plans for what the “trailing labs” should do if the leading lab appears to have an insurmountable lead (e.g., OpenAI develops a model that is automating AI R&D. It becomes clear that DeepMind and Anthropic are substantially behind OpenAI. At this point, do the labs merge and assist? Do they try to do a big, coordinated, costly push to get governments to take AI risks more seriously?)
- Emergency preparedness: getting governments to be more likely to detect and appropriately respond to time-sensitive risks.
- Preparing plans for what to do if governments become considerably more concerned about risks (e.g., preparing concrete Manhattan Project or CERN-for-AI style proposals, identifying and developing verification methods for domestic or international AI regulation.)
One possible counter is that under short timelines, the USG is super unlikely to get involved. Personally, I think we should have a lot of uncertainty RE how the USG will react. Examples of factors here: (a) new Administration, (b) uncertainty over whether AI will produce real-world incidents, (c) uncertainty over how compelling demos will be, (d) ChatGPT being an illustrative example of a big increase in USG involvement that lots of folks didn’t see coming, (e) examples of the USG suddenly becoming a lot more interested in a national security domain (e.g., 9/11 leading to the Patriot Act, the recent TikTok ban), and (f) Trump being generally harder to predict than most Presidents (e.g., more likely to form opinions for himself, less likely to trust the establishment views in some cases).
(And just to be clear, this isn’t really a critique of Marius’s post. I think it’s great for people to be thinking about what the “plan” should be if the USG doesn’t react in time. Separately, I’d be excited for people to write more about what the short timelines “plan” should look like under different assumptions about USG involvement.)

Orpheus16 Dec 31, 2024, 8:05 PM
6 points
4
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
At first glance, I don’t see how the point I raised is affected by the distinction between expert-level AIs vs earlier AIs.
In both cases, you could expect an important part of the story to be “what are the comparative strengths and weaknesses of this AI system.”
For example, suppose you have an AI system that dominates human experts at every single relevant domain of cognition. It still seems like there’s a big difference between “system that is 10% better at every relevant domain of cognition” and “system that is 300% better at domain X and only 10% better at domain Y.”
To make it less abstract, one might suspect that by the time we have AI that is 10% better than humans at “conceptual/serial” stuff, the same AI system is 1000% better at “speed/parallel” stuff. And this would have pretty big implications for what kind of AI R&D ends up happening (even if we condition on only focusing on systems that dominate experts in every relevant domain.)

Orpheus16 Dec 31, 2024, 6:21 PM
5 points
0
in reply to: Buck’s comment on: Buck’s Shortform
Models that don’t even cause safety problems, and aren’t even goal-directedly misaligned, but that fail to live up to their potential, thus failing to provide us with the benefits we were hoping to get when we trained them. For example, sycophantic myopic reward hacking models that can’t be made to do useful research.
Would this kind of model present any risk? Could a lab just say “oh darn, this thing isn’t very useful; let’s turn this off and develop a new model”?

Orpheus16 Dec 31, 2024, 6:15 PM
14 points
1
in reply to: Matthew Barnett’s comment on: Matthew Barnett’s Shortform
Do you have any suggestions RE alternative (more precise) terms? Or do you think it’s more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., “In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].”)

Orpheus16 Dec 31, 2024, 6:10 PM
7 points
2
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
The point I make here is also likely obvious to many, but I wonder if the “X human equivalents” frame often implicitly assumes that GPT-N will be like having X humans. But if we expect AIs to have comparative advantages (and disadvantages), then this picture might miss some important factors.
The “human equivalents” frame seems most accurate in worlds where the capability profile of an AI looks pretty similar to the capability profile of humans. That is, getting GPT-6 to do AI R&D is basically “the same as” getting X humans to do AI R&D. It thinks in fairly similar ways and has fairly similar strengths/weaknesses.
The frame is less accurate in worlds where AI is really good at some things and really bad at other things. In this case, if you try to estimate the # of human equivalents that GPT-6 gets you, the result might be misleading or incomplete. A lot of fuzzier things will affect the picture.
The example I’ve seen discussed most is whether or not we expect certain kinds of R&D to be bottlenecked by “running lots of experiments” or “thinking deeply and having core conceptual insights.” My impression is that one reason why some MIRI folks are pessimistic is that they expect capabilities research to be more easily automatable (AIs will be relatively good at running lots of ML experiments quickly, which helps capabilities more under their model) than alignment research (AIs will be relatively bad at thinking deeply or serially about certain topics, which is what you need for meaningful alignment progress under their model).
Perhaps more people should write about what kinds of tasks they expect GPT-X to be “relatively good at” or “relatively bad at”. Or perhaps that’s too hard to predict in advance. If so, it could still be good to write about how different “capability profiles” could allow certain kinds of tasks to be automated more quickly than others.
(I do think that the “human equivalents” frame is easier to model and seems like an overall fine simplification for various analyses.)

Orpheus16 Dec 30, 2024, 9:01 PM
23 points
5
in reply to: evhub’s comment on: evhub’s Shortform
I’m glad you’re doing this, and I support many of the ideas already suggested. Some additional ideas:
- Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it’s not feasible to do this with a government entity yet, start a pilot with a non-government group, perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
- Clear communication about RSP capability thresholds. I think the RSP could do a better job of outlining the kinds of capabilities that Anthropic is worried about and what sorts of thresholds would trigger a reaction. I think the OpenAI preparedness framework tables are a good example of this kind of clear/concise communication. It’s easy for a naive reader to quickly get a sense of “oh, this is the kind of capability that OpenAI is worried about.” (Clarification: I’m not suggesting that Anthropic should abandon the ASL approach or that OpenAI has necessarily identified the right capability thresholds. I’m saying that the tables are a good example of the kind of clarity I’m looking for: someone could skim this and easily get a sense of what thresholds OpenAI is tracking, and I think OpenAI’s PF currently achieves this much more than the Anthropic RSP.)
- Emergency protocols. Publishing an emergency protocol that specifies how Anthropic would react if it needed to quickly shut down a dangerous AI system. (See some specific prompts in the “AI developer emergency response protocol” section here). Some information can be redacted from a public version (I think it’s important to have a public version, though, partly to help government stakeholders understand how to handle emergency scenarios, partly to raise the standard for other labs, and partly to acquire feedback from external groups.)
- RSP surveys. Evaluate the extent to which Anthropic employees understand the RSP, their attitudes toward the RSP, and how the RSP affects their work. More on this here.
- More communication about Anthropic’s views about AI risks and AI policy. Some specific examples of hypothetical posts I’d love to see:
  - “How Anthropic thinks about misalignment risks”
  - “What the world should do if the alignment problem ends up being hard”
  - “How we plan to achieve state-proof security before AGI”
  - Encouraging more employees to share their views on various topics, EG Sam Bowman’s post.
- AI dialogues/debates. It would be interesting to see Anthropic employees have discussions/debates with other folks thinking about advanced AI. Hypothetical examples:
  - “What are the best things the US government should be doing to prepare for advanced AI” with Jack Clark and Daniel Kokotajlo.
  - “Should we have a CERN for AI?” with [someone from Anthropic] and Miles Brundage.
  - “How difficult should we expect alignment to be” with [someone from Anthropic] and [someone who expects alignment to be harder; perhaps Jeffrey Ladish or Malo Bourgon].
More ambitiously, I feel like I don’t really understand Anthropic’s plan for how to manage race dynamics in worlds where alignment ends up being “hard enough to require a lot more than RSPs and voluntary commitments.”
From a policy standpoint, several of the most interesting open questions seem to be along the lines of “under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development” and “conditional on the USG wanting to get way more involved, what are the best things for it to do?” It’s plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.