You actually want evaluators to have as much skin in the game as other employees, so that when they take actions that might shut the company down or notably reduce the value of its equity, doing so is a costly signal.
Further, it’s good if evaluators are just considered normal employees and aren’t separated out in any specific way. Then, other employees at the company will consider these evaluators to be part of their tribe and will feel some loyalty toward them. (This also reduces the chance that risk evaluators feel like they are part of a rival “AI safety” tribe.) This probably has a variety of benefits in terms of support from the company. For example, when evaluators make a decision which is very costly for the company, it is more likely to be respected by other employees.
This situation seems backwards to me. Like, presumably the ideal scenario is that a risk evaluator estimates the risk in an objective way, and then the company takes (hopefully predefined) actions based on that estimate. The outcome of this interaction should not depend on social cues like how loyal they seem, or how personally costly it was for them to communicate that information. To the extent it does, I think this is evidence that the risk management framework is broken.
I agree with all of this, but I don’t expect us to live in an ideal world with a non-broken risk management framework, and we’re making decisions on that margin.
I also think predefined actions are somewhat tricky to get right even in a pretty ideal setup, but I agree you can get reasonably close.
Note that I don’t necessarily endorse the arguments I quoted (this is just the strongest objection I’m aware of), and as a bottom line, I think you should pay risk evaluators in cash.