I raised a similar proposal to various people a while ago.
The strongest objection I’m aware of is something like:
You actually want evaluators to have as much skin in the game as other employees so that when they take actions that might shut the company down or notably reduce the value of equity, this is a costly signal.
Further, it’s good if evaluators are just considered normal employees and aren’t separated out in any specific way. Then, other employees at the company will consider these evaluators to be part of their tribe and will feel some loyalty. (This also reduces the chance that risk evaluators feel like they are part of a rival “AI safety” tribe.) This probably has a variety of benefits in terms of support from the company. For example, when evaluators make a decision which is very costly for the company, it is more likely to be respected by other employees.
A key question is how much of the problem comes from evaluators within the company sounding the alarm versus the rest of the company (and the world) not respecting this alarm. And of course, how much the conflict of interest (and tribal-ish affiliation with the company) will alter the judgment of evaluators in a problematic way.
If I were confident that all risk evaluators were incorruptible, high-integrity, total altruists (to the world or something even more cosmopolitan), then I think the case for them getting compensated normally with equity is pretty good. This would allow them to perform a costly signal later and would have other benefits. Though perhaps the costly signal is less strong if it is clear these people don’t care about money (but this itself might lend some credibility).
Given my understanding of the actual situation at AI companies, I think AI companies should ideally pay risk evaluators in cash.
I have more complex views about the situation at each of the different major AI labs and what is highest priority for ensuring good risk evaluations.
You actually want evaluators to have as much skin in the game as other employees so that when they take actions that might shut the company down or notably reduce the value of equity, this is a costly signal.
Further, it’s good if evaluators are just considered normal employees and aren’t separated out in any specific way. Then, other employees at the company will consider these evaluators to be part of their tribe and will feel some loyalty. (This also reduces the chance that risk evaluators feel like they are part of a rival “AI safety” tribe.) This probably has a variety of benefits in terms of support from the company. For example, when evaluators make a decision which is very costly for the company, it is more likely to be respected by other employees.
This situation seems backwards to me. Like, presumably the ideal scenario is that a risk evaluator estimates the risk in an objective way, and then the company takes (hopefully predefined) actions based on that estimate. The outcome of this interaction should not depend on social cues like how loyal they seem, or how personally costly it was for them to communicate that information. To the extent it does, I think this is evidence that the risk management framework is broken.
Like, presumably the ideal scenario is that a risk evaluator estimates the risk in an objective way, and then the company takes (hopefully predefined) actions based on that estimate. The outcome of this interaction should not depend on social cues like how loyal they seem, or how personally costly it was for them to communicate that information. To the extent it does, I think this is evidence that the risk management framework is broken.
I agree with all of this, but I don’t expect us to live in an ideal world with a non-broken risk management framework, and we’re making decisions on that margin.
I also think predefined actions are somewhat tricky to get right even in a pretty ideal setup, but I agree you can get reasonably close.
Note that I don’t necessarily endorse the arguments I quoted (this is just the strongest objection I’m aware of), and as a bottom line, I think you should pay risk evaluators in cash.
Why not pay them nothing until they are shown to be correct with sufficiently credible proof, and then pay out a huge prize?
This could exclude competent evaluators without other income (this isn’t Dath Ilan, where a bank could evaluate evaluators and front them money at interest rates that depended on their probability of finding important risks), and their shortage of liquidity could provide a lever for distorting their incentives.
On Earth, if someone’s working for you, and you’re not giving them a salary commensurate with the task, there’s a good chance they are getting compensation in other ways (some of which might be contrary to your goals).