Pay Risk Evaluators in Cash, Not Equity
Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we’re mostly failing even at that.
Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise—designing AI systems to be more like “tools” than “agents,” for example, or as purely question-answering oracles locked within some kryptonite-style box. These debates feel a bit quaint now, as AI companies race to release agentic models they barely understand directly onto the internet.
But a far more basic failure, from my perspective, is that at present nearly all AI company staff—including those tasked with deciding whether new models are safe to build and release—are paid substantially in equity, the value of which seems likely to decline if their employers stop building and releasing new models.
As a result, it is currently the case that roughly everyone within these companies charged with sounding the alarm risks personally losing huge sums of money if they do. This extreme conflict of interest could be avoided simply by compensating risk evaluators in cash instead.
I raised a similar proposal to various people a while ago.
The strongest objection I’m aware of is something like:
A key question is how much of the problem comes from evaluators within the company sounding the alarm versus the rest of the company (and the world) not respecting this alarm. And of course, how much the conflict of interest (and tribal-ish affiliation with the company) will alter the judgment of evaluators in a problematic way.
If I was confident that all risk evaluators were incorruptible, high integrity, total altruists (to the world or something even more cosmopolitan), then I think the case for them getting compensated normally with equity is pretty good. This would allow them to perform a costly signal later and would have other benefits. Though perhaps the costly signal is less strong if it is clear these people don’t care about money (but this itself might lend some credibility).
Given my understanding of the actual situation at AI companies, I think AI companies should ideally pay risk evaluators in cash.
I have more complex views about the situation at each of the different major AI labs and what is highest priority for ensuring good risk evaluations.
This situation seems backwards to me. Like, presumably the ideal scenario is that a risk evaluator estimates the risk in an objective way, and then the company takes (hopefully predefined) actions based on that estimate. The outcome of this interaction should not depend on social cues like how loyal they seem, or how personally costly it was for them to communicate that information. To the extent it does, I think this is evidence that the risk management framework is broken.
I agree with all of this, but I don’t expect us to live in an ideal world with a non-broken risk management framework, and we’re making decisions on that margin.
I also think predefined actions are somewhat tricky to get right even in a pretty ideal setup, but I agree you can get reasonably close.
Note that I don’t necessarily endorse the arguments I quoted (this is just the strongest objection I’m aware of), and as a bottom line, I think you should pay risk evaluators in cash.
Why not pay them nothing until they are shown to be correct with sufficiently credible proof, and then pay out a huge prize?
This could exclude competent evaluators without other income—this isn’t Dath Ilan, where a bank could evaluate evaluators and front them money at interest rates that depended on their probability of finding important risks—and their shortage of liquidity could provide a lever for distortion of their incentives.
On Earth, if someone’s working for you, and you’re not giving them a salary commensurate with the task, there’s a good chance they are getting compensation in other ways (some of which might be contrary to your goals).
Your third paragraph mentions “all AI company staff” and the last refers to “risk evaluators” (i.e. “everyone within these companies charged with sounding the alarm”). Are these groups roughly the same or is the latter subgroup significantly smaller?
I personally think it’s most important to have at least some technical employees who have the knowledge/expertise to evaluate the actual situation, and who also have the power to do something about it. I’d want this to include some people whose primary job is more like “board member” and some people whose primary job is more like “alignment and/or capabilities researcher.”
But there is a sense in which I’d feel way more comfortable if all technical AI employees (alignment and capabilities), and policy/strategy people, didn’t have profit equity, so they didn’t have an incentive to optimize against what was safe. That way there are just a lot more eyes on the problem, and the overall egregore steering the company has one fewer cognitive distortion to manage.
This might be too expensive (OpenAI and Anthropic have a lot of money, but, like, that doesn’t mean they can just double everyone’s salary)
An idea that occurs to me is to construct “windfall equity”, which only pays out if the company or the world generates AI that is safely and massively improving the world.
I think the latter group is much smaller. I’m not sure who exactly has the most influence over risk evaluation, but the most obvious examples are company leadership and safety staff/red-teamers. From what I hear, even those currently receive equity (which seems corroborated by job listings, e.g. Anthropic, DeepMind, OpenAI).
It’s a good point that this is an extreme conflict of interest. But isn’t one obvious reason these companies design their incentive structures this way that they don’t have the cash lying around to “simply” compensate people in cash instead?
E.g. here’s ChatGPT:
It would indeed be good if AI safety research was paid for in cash, not equity. But doesn’t this problem just reduce to there being far less interest in (and cash for) safety research relative to capability research than we would need for a good outcome?
For companies that are doing well, money isn’t a hard constraint. Founders would rather pay in equity because it’s cheaper than cash[1], but they can sell additional equity and pay cash if they really want to.
[1] Because they usually give their employees a bad deal.
This is not obviously true for the large AI labs, which pay their mid-level engineers/researchers something like $800-900k/year, with ~2/3 of that being equity. If you have a thousand such employees, that’s an extra ~$600M/year in cash. It’s true that in practice the equity often ends up getting sold for cash later by the employees themselves (e.g. in tender offers/secondaries), but paying in equity is sort of like deferring the sale of that equity for cash. (Which also lets you bake in assumptions about growth in the value of that equity, etc.)
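For concreteness, here’s a minimal sketch of that back-of-envelope arithmetic (the $850k midpoint and the 1,000-employee headcount are illustrative assumptions taken from the figures above, not actual company data):

```python
# Back-of-envelope: extra yearly cash needed to replace equity comp with cash.
# All figures are the rough, assumed ones from the comment above.

avg_total_comp = 850_000   # midpoint of the ~$800-900k/year figure (assumed)
equity_fraction = 2 / 3    # ~2/3 of total comp paid in equity
num_employees = 1_000      # assumed count of mid-level engineers/researchers

equity_per_employee = avg_total_comp * equity_fraction      # ~$567k/year
extra_cash_per_year = equity_per_employee * num_employees   # ~$567M/year

print(f"~${extra_cash_per_year / 1e6:.0f}M/year in extra cash")  # roughly the $600M figure
```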
But only a small fraction work on evaluations, so the increased cost is much smaller than you make out.
Oh, that’s true, I sort of lost track of the broader context of the thread. Though then the company needs to very clearly define who’s responsible for doing the risk evals, and making go/no-go/etc calls based on their results… and how much input do they accept from other employees?
I think this is important to define anyway! (and it’s likely pretty obvious). It would create a lot more friction for someone to take on such a role, or to move out of it, though.
It would be expensive, but it’s not a hard constraint. OpenAI could almost certainly raise another $600M per year if they wanted to (they’re allegedly already losing $5B per year now).
Also the post only suggests this pay structure for a subset of employees.
Why do you call current AI models “agentic”? It seems to me they are more like tool AI or oracle AI...
I just meant there are many teams racing to build more agentic models. I agree current ones aren’t very agentic, though whether that’s because they’re meaningfully more like “tools” or just still too stupid to do agency well or something else entirely, feels like an open question to me; I think our language here (like our understanding) remains confused and ill-defined.
I do think current systems are very unlike oracles though, in that they have far more opportunity to exert influence than the prototypical imagined oracle design—e.g., most have I/O with ~any browser (or human) anywhere, people are actively experimenting with hooking them up to robotic effectors, etc.
I think you’re right that the incentive structure around AI safety matters for getting those doing the work to do it as well as they can. There might be something to be said for the suggestion of moving to cash payment over equity, but I think that needs a lot more development.
For instance, if everyone is paid up front for the work they are doing to protect the world from some future AI takeover, then their present circumstances are no longer tied to that future outcome. That might not produce any better results than equity that could decline in value in the future.
Ultimately the goal has to be that those able to do the work, and actually doing it, have personal interests tightly tied to the future state of AI. It might even be the case that such alignment of the research effort is only loosely connected to compensation.
Additionally, as you note, it’s not only about the incentives of those working to limit a bad outcome for humans in general from AGI, but also about what the incentives are for the companies as a whole. Here I think the discussion regarding AI liabilities and insurance might matter more, which also opens up a whole question about corporate law. Years ago, pre-1930s, banking law used to hold bankers liable for twice the losses from bank failures, to make them better at risk management with other people’s money. That seems to have been a special case that didn’t apply to other businesses, even if they were largely owned by outsiders. Perhaps making those who are in control of AI development and deployment, or are largely the ones financing the efforts, personally responsible might be a better incentive structure.
All of these are difficult to work through to get a good, and fair, structure in place. I don’t think any one approach will ultimately be the solution, but all of them, or some combination, might be. But I also think it’s a given that the risk will always remain. So figuring out just what level of risk is acceptable is also needed, and is problematic in its own way.