I’m the chief scientist at Redwood Research.
This is now out.
I don’t think it’s worth adjudicating the question of how relevant Vanessa’s response is (though I do think Vanessa’s response is directly relevant).
if the AIs are aligned to these structures, human disempowerment is likely because these structures are aligned to humans way less than they seem
My claim would be that if single-single alignment is solved, this problem won’t be existential. I agree that if you literally aligned all AIs to (e.g.) the mission of a non-profit as well as you can, you’re in trouble. However, if you have single-single alignment:
At the most basic level, I expect we’ll train AIs to give advice and ask them what they think will happen with various possible governance and alignment structures. If they think a governance structure will yield total human disempowerment, we’ll do something else. This is a basic reason not to expect large classes of problems so long as we have single-single aligned AIs which are wise. (Though problems that require coordination to resolve might not be like this.) I’m very skeptical of a world where single-single alignment is well described as being solved and people don’t ask for advice (or consider this advice seriously) because they never get around to asking AIs or there are no AIs aligned in such a way that they should try to give good advice.
I expect organizations will be explicitly controlled by people and (some of) those people will have AI representatives to represent their interests as I discuss here. If you think getting good AI representation is unlikely, that would be a crux, but this would be my proposed solution at least.
The explicit mission of for-profit companies is to empower the shareholders. It clearly doesn’t serve the interests of the shareholders to end up dead or disempowered.
Democratic governments have similar properties.
At a more basic level, I think people running organizations won’t decide “oh, we should put the AI in charge of running this organization, aligned to some mission rather than to the preferences of the people (like me) who currently have de facto or de jure power over this organization”. This is a crazily disempowering move that I expect people will by default be too savvy to make in almost all cases. (This holds both for people with substantial de facto power and for people with de jure power.)
Even independent of the advice consideration, people will probably want AIs running organizations to be honest to at least the people controlling the organization. Given that I expect explicit control by people in almost all cases, if things are going in an existential direction, people can vote to change them in almost all cases.
I don’t buy that there will be some sort of existential multi-polar trap even without coordination (though I also expect coordination), due to things like the strategy-stealing assumption, as I also discuss in that comment.
If a subset of organizations diverge from a reasonable interpretation of what they were supposed to do (but are still basically obeying the law and some interpretation of what they were intentionally aligned to) and this is clear to the rest of the world (as I expect would be the case given some type of AI advisors), then the rest of the world can avoid problems from this subset via the court system or other mechanisms. Even if this subset of organizations run by effectively rogue AIs just runs away with resources successfully, this is probably only a subset of resources.
I think your response to a lot of this will be something like:
People won’t have or won’t listen to AI advisors.
Institutions will intentionally delude relevant people to acquire more power.
But, the key thing is that I expect at least some people will keep power, even if large subsets are deluded. E.g., I expect that corporate shareholders, board members, or governments will be very interested in the question of whether they will be disempowered by changes in structure. It does seem plausible (or even likely) to me that some people will engage in power grabs via ensuring AIs are aligned to them, deluding the world about what’s going on using a variety of mechanisms (including, e.g., denying or manipulating access to AI representation/advisors), and expanding their (hard) power over time. The thing I don’t buy is a case where very powerful people don’t ask for advice at all prior to having been deluded by the organizations that they themselves run!
I think human power grabs like this are concerning and there are a variety of plausible solutions which seem somewhat reasonable.
Maybe your response is that the solutions that will be implemented in practice given concerns about human power grabs will involve aligning AIs to institutions in ways that yield the dynamics you describe? I’m skeptical given the dynamics discussed above about asking AIs for advice.
I’ve updated over time to thinking that trusting AI systems and mostly handing things off to AI systems is an important dynamic to be tracking. As in, the notion of human researcher obsolescence discussed here. In particular, it seems important to understand what is required for this and when we should make this transition.
I previously thought that this handoff should be mostly beyond our planning horizon (for more prosaic researchers) because by the time of this handoff we will have already spent many years being radically (e.g., >10x) accelerated by AI systems, but I’ve updated against this due to updating toward shorter timelines, faster takeoff, and lower effort on safety (indicating less pausing). I also am just more into end-to-end planning with more of the later details figured out.
As I previously discussed here, I still think that you can do this with Paul-corrigible AGIs, including AGIs that you wouldn’t be happy trusting with building a utopia (at least building a utopia without letting you reflect and then consulting you about this). However, these AIs will be nearly totally uncontrolled in that you’ll be deferring to them nearly entirely. But, the aim is that they’ll ultimately give you the power they acquire back etc. I also think pretty straightforward training strategies could very plausibly work (at least if you develop countermeasures to relatively specific threat models), like studying and depending on generalization.
(I’ve also updated some toward various of the arguments in this post about vulnerability and first-mover advantage, but I still disagree with the bottom line overall.)
Yeah, people at labs are generally not thoughtful about AI futurism IMO, though of course most people aren’t thoughtful about AI futurism. And labs don’t really have plans IMO. (TBC, I think careful futurism is hard, hard to check, and not clearly that useful given realistic levels of uncertainty.)
If you have any more related reading for the main “things might go OK” plan in your eyes, I’m all ears.
I don’t have a ready-to-go list. You might be interested in this post and the comments responding to it, though I’d note I disagree substantially with the post.
Build concrete evidence of risk, to increase political will towards reducing misalignment risk
Working on demonstrating evidence of risk at a very irresponsible developer might be worse than you’d hope, because they might prevent you from publishing or try to prevent this type of research from happening in the first place (given that it might be bad for their interests).
However, it’s also possible they’d be skeptical of research done elsewhere.
I’m not sure how to reconcile these considerations.
I think the objection is a good one that “if the AI was really aligned with one agent, it’d figure out a way to help them avoid multipolar traps”.
My reply is that I’m worried that avoiding races-to-the-bottom will continue to be hard, especially since competition operates on so many levels.
Part of the objection is in avoiding multipolar traps, but there is also a more basic story like:
Humans own capital/influence.
They use this influence to serve their own interests and have an (aligned) AI system which faithfully represents their interests.
Given that AIs can make high-quality representation very cheap, the AI representation is very good and granular. Thus, something like the strategy-stealing assumption can hold and we might expect that humans end up with the same expected fraction of capital/influence they started with (at least to the extent they are interested in saving rather than consumption). (A rough formalization is sketched just after this list.)
Even without any coordination, this can potentially work OK. There are objections to the strategy-stealing assumption, but none of these seem existential if we get to a point where everyone has wildly superintelligent and aligned AI representatives and we’ve ensured humans are physically robust to offense-dominant technologies like bioweapons.
(I’m optimistic about being robust to bioweapons within a year or two of having wildly superhuman AIs, though we might run into huge issues during this transitional period… Regardless, bioweapons deployed by terrorists or as part of a power grab in a brief transitional period don’t seem like the threat model you’re describing.)
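To make the strategy-stealing point slightly more concrete, here is a rough formalization (my own sketch, not a quote from the original strategy-stealing post): suppose humans plus their aligned AI representatives start with a fraction $f$ of flexible resources, and for any strategy available to other actors there is an analogous strategy available to them that is roughly as effective per unit of resources. Then
\[
\mathbb{E}\big[\text{long-run share of influence held by humans and their aligned AIs}\big] \;\gtrsim\; f .
\]
The objections to the assumption mentioned above correspond to ways the “analogous strategy” premise can fail, e.g., offense-dominant technologies that destroy others’ resources outright rather than merely competing for them.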
I expect some issues with races-to-the-bottom / negative sum dynamics / negative externalities like:
By default, increased industry on earth shortly after the creation of very powerful AI will result in boiling the oceans (via fusion power). If you don’t participate in this industry, you might be substantially outcompeted by others[1]. However, I don’t think it will be that expensive to protect humans through this period, especially if you’re willing to use strategies like converting people into emulated minds. Thus, this doesn’t seem at all likely to be literally existential. (I’m also optimistic about coordination here.)
There might be one-time shifts in power between humans via mechanisms like states becoming more powerful. But, ultimately these states will be controlled by humans or appointed successors of humans if alignment isn’t an issue. Mechanisms like competing over the quantity of bribery are zero-sum, as they just change the distribution of power, and this can be priced in as a one-time shift even without coordination to prevent a race to the bottom on bribes.
But, this still doesn’t seem to cause issues with humans retaining control via their AI representatives? Perhaps the distribution of power between humans is problematic and may be extremely unequal and the biosphere will physically be mostly destroyed (though humans will survive), but I thought you were making stronger claims.
Edit in response to your edit: If we align the AI to some arbitrary target which is seriously misaligned with humanity as a whole (due to infighting or other issues), I agree this can cause existential problems.
(I think I should read the paper in more detail before engaging more than this!)
[1] It’s unclear if boiling the oceans would result in substantial acceleration. This depends on how quickly you can develop industry in space and Dyson-sphere-style structures. I’d guess the speed-up is much less than a year.
The paper says:
Christiano (2019) makes the case that sudden disempowerment is unlikely,
This isn’t accurate. The post What failure looks like includes a scenario involving sudden disempowerment!
The post does say:
The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.
I think this is probably not what failure will look like,
But, I think it is mostly arguing against threat models involving fast AI capability takeoff (where the level of capabilities takes its creators and others by surprise and fast capabilities progress allows for AIs to suddenly become powerful enough to take over) rather than threat models involving sudden disempowerment from a point where AIs are already well known to be extremely powerful.
I (remain) skeptical that the sort of failure mode described here is plausible if we solve the problem of aligning individual AI systems with their designers’ intentions without this alignment requiring any substantial additional costs (that is, we solve single-single alignment with minimal alignment tax).
This has previously been argued by Vanessa here and Paul here in response to a post making a similar claim.
I do worry about human power grabs: some humans obtaining greatly more power as enabled by AI (even if we have no serious alignment issues). However, I don’t think this matches the story you describe and the mitigations seem substantially different than what you seem to be imagining.
I’m also somewhat skeptical of the threat model you describe in the case where alignment isn’t solved. I think the difference between the story you tell and something more like “you get what you measure” is important.
I worry I’m misunderstanding something because I haven’t read the paper in detail.
It sounds like you entirely agree with the logic of the post, but wish that the start of the post mentioned something like what it says at the end:
Humans may live comfortably after the development of AGI, not due to high wages but from other income sources like investments, government welfare, and charity. The latter two sources seem especially promising if AI alignment ensures ongoing support for human well-being.
And perhaps you also wish that the post had an optimistic rather than neutral tone. (And that it more clearly emphasized that AGI would result in large amounts of wealth—at least for someone.)
Given this, it seems your claim is just that you’re quite optimistic about some mixture of “investments, government welfare, and charity” coming through to keep people alive and happy. (Given the assumption that (some) humans maintain control, there isn’t some other similar catastrophe, and the world is basically as it seems e.g. we’re not in a simulation.)
Correspondingly, it seems inaccurate to say:
Jobs and technology have a purpose: producing the goods and services we need to live and thrive. If your model of the world includes the possibility that we would create the most advanced technology the world has ever seen, and the result would be mass starvation, then I think your model is fundamentally flawed.
and
it does so based on a combination of flawed logic and
If someone didn’t have wealth (perhaps due to expropriation) and no one gave them anything out of generosity, then, because their labor has no value, they could starve, and I think you agree? I don’t think the statement “Jobs and technology have a purpose” means anything.
Historically, even if no one who owned capital or had power cared at all about the welfare of someone without capital, those with capital would often still be selfishly interested in employing that person. So, even if someone had all their assets taken from them, it would still selfishly make sense for a variety of people to trade with them such that they can survive. In cases where this is less true, human welfare often seems to suffer (see also the resource curse). As you seemingly agree, with sufficient technological development, labor may become unimportant, and if a person doesn’t have capital, no one with capital would selfishly be interested in trading with them such that they can remain alive.
In practice, extremely minimal amounts of charity could suffice (e.g., one person could pay to feed the entire world), so I’m personally optimistic about avoiding literal starvation. However, I do worry about the balance of power in a world where human labor is unimportant: democracy and other institutions seem potentially less stable in such a world.
(As far as concerns with institutions and power, see discussion in this recent paper about gradual disempowerment, though note I think I mostly disagree with this paper.)
It doesn’t seem very useful for them IMO.
My bet is that this isn’t an attempt to erode alignment, but is instead based on thinking that lumping together intentional bad actions with mistakes is a reasonable starting point for building safeguards. Then the report doesn’t distinguish between these due to a communication failure. It could also just be a more generic communication failure.
(I don’t know if I agree that lumping these together is a good starting point for experimenting with safeguards, but it doesn’t seem crazy.)
15x compute multiplier relative to what? See also here.
I think Zac is trying to say they left not to protest, but instead because they didn’t think staying was viable (for whatever research and/or implementation they wanted to do).
On my views (not Zac’s), “staying wouldn’t be viable for someone who was willing to work in a potentially pretty unpleasant work environment and focus on implementation (and currently prepping for this implementation)” doesn’t seem like an accurate description of the situation. (See also Buck’s comment here.)
Relatedly, I think Buck far overestimates the influence and resources of safety-concerned staff in a ‘rushed unreasonable developer’.
As in, you don’t expect they’ll be able to implement stuff even if it doesn’t make anyone’s workflow harder or you don’t expect they’ll be able to get that much compute?
Naively, we might expect ~1% of compute: there might be around 1,000 researchers, and 10/1,000 is 1%. Buck said 3% because I argued for increasing this number. My case would be that there will be a bunch of cases where the thing they want to do is obviously reasonable and potentially justifiable from multiple perspectives (do some monitoring of internal usage, fine-tune a model for forecasting/advice, use models to do safety research) such that they can pull somewhat more compute than headcount alone would suggest.
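As a worked version of this back-of-the-envelope arithmetic (my own sketch; the interpretation that the 10 refers to safety-focused researchers and that compute scales roughly with headcount is an assumption):
\[
\frac{10 \ \text{safety-focused researchers}}{1000 \ \text{total researchers}} = 1\% \ \text{of compute}, \qquad 1\% \times 3 \ \text{(extra pull from dual-purpose requests)} = 3\% .
\]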
Yes. But also, I’m afraid that Anthropic might solve this problem by just making fewer statements (which seems bad).
Making more statements would also be fine! I wouldn’t mind if there were just clarifying statements even if the original statement had some problems.
(To try to reduce the incentive to make fewer statements, I criticized other labs for not having policies at all.)
I think I roughly stand behind my perspective in this dialogue. I feel somewhat more cynical than I did at the time I did this dialogue, perhaps partially due to actual updates from the world and partially because I was trying to argue for the optimistic case here which put me in a somewhat different frame.
Here are some ways my perspective differs now:
I wish I said something like: “AI companies probably won’t actually pause unilaterally, so the hope for voluntary RSPs has to be building consensus or helping to motivate developing countermeasures”. I don’t think I would have disagreed with this statement in the past, or at least I wouldn’t have fully disagreed with it and it seems like important context.
I think in practice, we’re unlikely to end up with specific tests that are defined in advance and aren’t goodhartable or cheatable. I do think that control could in principle be defined in advance and made hard to goodhart using external evaluation, but I don’t expect companies to commit to specific tests which are hard to goodhart/cheat. They could make procedural commitments for third-party review which are hard to cheat. Something like “this third party will review the available evidence (including our safety report and all applicable internal knowledge) and then make a public statement about the level of risk and whether there is important information which should be disclosed to the public”. (I could outline this proposal in more detail; it’s mostly not my original idea.)
I’m somewhat more interested in companies focusing on things other than safety cases and commitments: either trying to get evidence of risk that might be convincing to others (in worlds where these risks are large), or working on at-the-margin safety interventions from a cost-benefit perspective.
I think the post could directly say “voluntary RSPs seem unlikely to suffice (and wouldn’t be pauses done right), but …”.
I agree it does emphasize the importance of regulation pretty strongly.
Part of my perspective is that the title implies a conclusion which isn’t quite right and so it would have been good (at least with the benefit of hindsight) to clarify this explicitly. At least to the extent you agree with me.
This post seems mostly reasonable in retrospect, except that it doesn’t specifically note that it seems unlikely that voluntary RSP commitments would result in AI companies unilaterally pausing until they were able to achieve broadly reasonable levels of safety. I wish the post more strongly emphasized that regulation was a key part of the picture—my view is that “voluntary RSPs are pauses done right” is wrong, but “RSPs via (international) regulation are pauses done right” seems like it could be roughly right. That said, I do think that purely voluntary RSPs are pretty reasonable and useful, at least if the relevant company is transparent about when they would proceed despite being unable to achieve a reasonable level of safety.
As of now, at the start of 2025, I think we know more information that makes this plan look worse.[1] I don’t see a likely path to ensuring 80% of companies have a reasonable RSP in short timelines. (For instance, not even Anthropic has expanded their RSP to include ASL-4 requirements about 1.5 years after the RSP came out.) And, beyond this, I think the current regulatory climate is such that we might not get RSPs enforced in durable regulation[2] applying to at least US companies in short timelines even if 80% of companies had good RSPs.
[1] I edited to add the first sentence of this paragraph for clarity.
[2] The EU AI Act is the closest thing at the moment, but it might not be very durable, as the EU doesn’t have that much leverage over tech companies. Also, it wouldn’t be very surprising if components of it end up being very unreasonable such that companies are basically forced to ignore parts of it or exit the EU market.
Anthropic releasing their RSP was an important change in the AI safety landscape. The RSP was likely a substantial catalyst for policies like RSPs—which contain if-then commitments and more generally describe safety procedures—becoming more prominent. In particular, OpenAI now has a beta Preparedness Framework, Google DeepMind has a Frontier Safety Framework but there aren’t any concrete publicly-known policies yet, many companies agreed to the Seoul commitments which require making a similar policy, and SB-1047 required safety and security protocols.
However, I think the way Anthropic presented their RSP was misleading in practice (at least misleading to the AI safety community): it doesn’t strictly require pausing, nor do I expect Anthropic to pause until they have sufficient safeguards in practice. I discuss why I think pausing until sufficient safeguards are in place is unlikely, at least in timelines as short as Dario’s (Dario Amodei is the CEO of Anthropic), in my earlier post.
I also have serious doubts about whether the LTBT will serve as a meaningful check to ensure Anthropic serves the interests of the public. The LTBT has seemingly done very little thus far: it has appointed only 1 board member despite being able to appoint 3 of the 5 board members (a majority), and it is down to only 3 members. And none of its members have technical expertise related to AI. (The LTBT trustees seem altruistically motivated and seem like they would be thoughtful about questions of how to widely distribute the benefits of AI, but this is different from being able to evaluate whether Anthropic is making good decisions with respect to AI safety.)
Additionally, in this article, Anthropic’s general counsel Brian Israel seemingly claims that the board probably couldn’t fire the CEO (currently Dario) if the board believed doing so would greatly reduce profits to shareholders[1]. Almost all of a board’s hard power comes from being able to fire the CEO, so if this claim were true, it would greatly undermine the ability of the board (and the LTBT which appoints the board) to ensure Anthropic, a public benefit corporation, serves the interests of the public in cases where this conflicts with shareholder interests. In practice, I think this claim by the general counsel of Anthropic is likely false: because Anthropic is a public benefit corporation, the board could fire the CEO and win in court even if they openly thought this would massively reduce shareholder value (so long as the board could show they used a reasonable process to consider shareholder interests and decided that the public interest outweighed them in this case). Regardless, Brian Israel making such claims is evidence that the LTBT won’t provide a meaningful check on Anthropic in practice.
Misleading communication about the RSP
On the RSP, this post says:
On the one hand, the ASL system implicitly requires us to temporarily pause training of more powerful models if our AI scaling outstrips our ability to comply with the necessary safety procedures.
While I think this exact statement might be technically true, people have sometimes interpreted this quote and similar statements as a claim that Anthropic would pause until their safety measures sufficed for more powerful models. I think Anthropic isn’t likely to do this; in particular:
The RSP leaves open the option of revising it to reduce required countermeasures (so pausing is only required until the policy is changed).
This implies that the required countermeasures would suffice for ensuring a reasonable level of safety, but given that commitments still haven’t been made for ASL-4 (the level at which existential or near-existential risks become plausible) and there aren’t clear procedural reasons to expect the countermeasures to suffice, I don’t think we should assume this will be the case.
Protections for ASL-3 are defined vaguely rather than having some sort of credible and independent risk analysis process (in addition to best guess countermeasures) and a requirement to ensure risk is sufficiently low with respect to this process. Perhaps ASL-4 requirements will differ; something more procedural seems particularly plausible as I don’t see a route to outlining specific tests in advance for ASL-4.
As mentioned earlier, I expect that if Anthropic ends up being able to build transformatively capable AI systems (as in, AI systems capable of obsoleting all human cognitive labor), they’ll fail to provide assurance of a reasonable level of safety. That said, it’s worth noting that insofar as Anthropic is actually a more responsible actor (as I currently tentatively think is the case), then from my perspective this choice is probably overall good—though I wish their communication was less misleading.
Anthropic and Anthropic employees often use similar language to this quote when describing the RSP, potentially contributing to a poor sense of what will happen. My impression is that lots of Anthropic employees just haven’t thought about this, and believe that Anthropic will behave much more cautiously than I think is plausible (and more cautiously than I think is prudent given other actors).
Other companies have worse policies and governance
While I focus on Anthropic in this comment, it is worth emphasizing that the policies and governance of other AI companies seem substantially worse. xAI, Meta, and DeepSeek have no public safety policies at all, though they have said they will make a policy like this. Google DeepMind has published that they are working on making a frontier safety framework with commitments, but thus far they have just listed potential threat models corresponding to model capabilities and security levels without committing to security for specific capability levels. OpenAI has the beta Preparedness Framework, but the current security requirements seem inadequate and the required mitigations and assessment process are unspecified other than saying that the post-mitigation risk must be medium or below prior to deployment and high or below prior to continued development. I don’t expect OpenAI to keep the spirit of this commitment in short timelines. OpenAI, Google DeepMind, xAI, Meta, and DeepSeek all have clearly much worse governance than Anthropic.
What could Anthropic do to address my concerns?
Given these concerns about the RSP and the LTBT, what do I think should happen? First, I’ll outline some lower cost measures that seem relatively robust and then I’ll outline more expensive measures that don’t seem obviously good (at least not obviously good to strongly prioritize) but would be needed to make the situation no longer be problematic.
Lower cost measures:
Have the leadership clarify its views to Anthropic employees (at least alignment science employees) in terms of questions like: “How likely is Anthropic to achieve an absolutely low (e.g., 0.25%) lifetime level of risk (according to various third-party safety experts) if AIs that obsolete top human experts are created in the next 4 years?”, “Will Anthropic aim to have an RSP that would be the policy that a responsible developer would follow in a world with reasonable international safety practices?”, “How likely is Anthropic to exit from its RSP commitments if this is needed to be a competitive frontier model developer?”.
Clearly communicate to Anthropic employees (or at least a relevant subset of Anthropic employees) about in what circumstances the board could (and should) fire the CEO due to safety/public interest concerns. Additionally, explain the leadership’s policies with respect to cases where the board does fire the CEO—does the leadership of Anthropic commit to not fighting such an action?
Have an employee liaison to the LTBT who provides the LTBT with more information that isn’t filtered through the CEO or current board members. Ensure this employee is quite independent-minded, has expertise on AI safety (and ideally security), and ideally is employed by the LTBT rather than Anthropic.
Unfortunately, these measures aren’t straightforwardly independently verifiable based on public knowledge. As far as I know, some of these measures could already be in place.
More expensive measures:
In the above list, I explain various types of information that should be communicated to employees. Ensure that this information is communicated publicly, including in relevant places like the RSP.
Ensure the LTBT has 2 additional members with technical expertise in AI safety or minimally in security.
Ensure the LTBT appoints the board members it can currently appoint and that these board members are independent from the company and have their own well-formed views on AI safety.
Ensure the LTBT has an independent staff including technical safety experts, security experts, and independent lawyers.
Likely objections and my responses
Here are some relevant objections to my points and my responses:
Objection: “Sure, but from the perspective of most people, AI is unlikely to be existentially risky soon, so from this perspective it isn’t that misleading to think of deviating from safe practices as an edge case.” Response: To the extent Anthropic has views, Anthropic has the view that existentially risky AI is reasonably likely to arrive soon, and Dario espouses this view. Further, I think this could be clarified in the text: the RSP could note that these commitments are what a responsible developer would do if we were in a world where being a responsible developer was possible while still being competitive (perhaps due to all relevant companies adopting such policies or due to regulation).
Objection: “Sure, but if other companies followed a similar policy then the RSP commitments would hold in a relatively straightforward way. It’s hardly Anthropic’s fault if other companies force it to be more reckless than it would like.” Response: This may be true, but it doesn’t mean that Anthropic isn’t being potentially misleading in their description of the situation. They could instead directly describe the situation in less misleading ways.
Objection: “Sure, but obviously Anthropic can’t accurately represent the situation publicly. That would result in bad PR and substantially undermine their business in other ways. To the extent you think Anthropic is a good actor, you shouldn’t be pressuring good actors like them to take actions that will make them differentially less competitive than worse actors.” Response: This is pretty fair, but I still think Anthropic could at least avoid making substantially misleading statements and ensure employees are well informed (at least for employees for whom this information is very relevant to their job decision-making). I think it is a good policy to correct misleading statements that result in differentially positive impressions and result in the safety community taking worse actions, because not having such a policy in general would result in more exploitation of the safety community.
[1] The article says: “However, even the board members who are selected by the LTBT owe fiduciary obligations to Anthropic’s stockholders, Israel says. This nuance means that the board members appointed by the LTBT could probably not pull off an action as drastic as the one taken by OpenAI’s board members last November. It’s one of the reasons Israel was so confidently able to say, when asked last Thanksgiving, that what happened at OpenAI could never happen at Anthropic. But it also means that the LTBT ultimately has a limited influence on the company: while it will eventually have the power to select and remove a majority of board members, those members will in practice face similar incentives to the rest of the board.” This indicates that the board couldn’t fire the CEO if they thought doing so would greatly reduce profits to shareholders, though the statement is somewhat unclear.
(Note that I don’t work for Anthropic.)