But when I hear Anthropic people (and most AI safety people) talk about AI welfare, the vibe is like it would be unacceptable to incur a [1% or 5% or so] risk of a deployment accidentally creating AI suffering worse than 10^[6 or 9 or so] suffering-filled human lives.
You have to be a pretty committed scope-sensitive consequentialist to disagree with this. What if they actually risked torturing 1M or 1B people? That seems terrible and unacceptable, and by assumption AI suffering is equivalent to human suffering. I think our societal norms are such that unacceptable things regularly become acceptable when the stakes are clear, so you may not even lose much utility from this emphasis on avoiding suffering.
It seems perfectly compatible with good decision-making that there are criteria A and B, A is much more important and therefore prioritized over B, and 2 out of 19 sections are focused on B. The real question is whether the organization’s leadership is able to make difficult tradeoffs, reassessing and questioning requirements as new information comes in. For example, in the 1944 Norwegian sabotage of a Nazi German heavy water shipment, stopping the Nazi nuclear program was the first priority. The mission went ahead with reasonable effort to minimize casualties, and 14 civilians died anyway, fewer than there could have been. It would not really have alarmed me to see a document discussing 19 efforts with 2 being avoidance of casualties, nor to know that the planners regularly talked with the vibe that 10-100 civilian casualties should be avoided, as long as someone had their eye on the ball.
Note that people who have a non-consequentialist aversion to the risk of causing damage should have other problems with working for Anthropic. E.g. I suspect that Anthropic is responsible for more than a million deaths of currently-alive humans in expectation.
This is just the paralysis argument. (Maybe any sophisticated non-consequentialists will have to avoid this anyway. Maybe this shows that non-consequentialism is unappealing.)
[Edit after Buck’s reply: I think it’s weaker because most Anthropic employees aren’t causing the possible-deaths, just participating in a process that might cause deaths.]
I think it’s a bit stronger than the usual paralysis argument in this case, but yeah.
Can you elaborate on how the million deaths would result?
Mostly from Anthropic building AIs that then kill billions of people while taking over, or their algorithmic secrets being stolen and leading to other people building AIs that then kill billions of people, or their model weights being stolen and leading to huge AI-enabled wars.