I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I’m being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.
One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren’t exogenous—they’re created and perpetuated by actors, just like the behaviors we’re trying to change. One actor’s incentives are another actor’s behaviors.
I think all of this comes down to “many humans are not altruistic to the degree or on the dimensions I want”. I’ve long said that FAI is a sidetrack, if we don’t have any path to FNI (friendly natural intelligence).
> FAI is a sidetrack, if we don’t have any path to FNI (friendly natural intelligence).
I don’t think I understand the reasoning behind this, though I don’t strongly disagree. Certainly it would be great to solve the “human alignment problem”. But what’s your claim?
If a bunch of fully self-interested people are about to be wiped out by an avoidable disaster (or even actively malicious people, who would like to hurt each other a little bit but value self-preservation more), they’re still better off pooling their resources to avert the disaster (a toy payoff sketch after this list illustrates the point).
You might have a prisoner’s dilemma / tragedy of the commons: it’s still even better if you can get everyone else to pool resources to avert disaster while stepping aside yourself. BUT:
- that’s more a coordination problem again, rather than an everyone-is-too-selfish problem
- that’s not really the situation with AI: there the choice is between working really hard to build AGI and working even harder to build safe AGI; it’s not a tragedy of the commons, it’s more like lemmings running off a cliff!
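Here is the toy payoff sketch referred to above: a minimal threshold public-goods game in Python, where all the names and numbers (N, K, B, c) are illustrative assumptions rather than anything from the discussion. It shows both halves of the point: universal contribution beats universal defection even for purely self-interested players, yet each individual does best if the others carry the cost.

```python
# Toy threshold public-goods game (illustrative numbers, not from the discussion).
# N players each either CONTRIBUTE (pay cost c) or FREE-RIDE.
# Disaster is averted only if at least K players contribute; survival is worth B to everyone.

N, K = 5, 3      # players; contributors needed to avert disaster
B, c = 10, 2     # value of surviving; cost of contributing (B >> c)

def payoff(i_contribute: bool, others_contributing: int) -> int:
    """Payoff to one player, given their own choice and how many others contribute."""
    contributors = others_contributing + (1 if i_contribute else 0)
    survives = contributors >= K
    return (B if survives else 0) - (c if i_contribute else 0)

print(payoff(True, N - 1))   # 8  -> everyone contributes: disaster averted, all get B - c
print(payoff(False, 0))      # 0  -> everyone defects: disaster hits, strictly worse for all
print(payoff(False, K))      # 10 -> enough *others* contribute and you step aside: best of all
```

The last line is the free-rider temptation: the tension it creates is about coordinating who pays, not about anyone being insufficiently altruistic.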
> One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren’t exogenous—they’re created and perpetuated by actors, just like the behaviors we’re trying to change. One actor’s incentives are another actor’s behaviors.
Yeah, the incentives will often be crafted perversely, which likely means you can expect even more opposition to clear discussion: there are powerful forces trying to coordinate on the wrong consensus about matters of fact, in order to maintain plausible deniability about what they’re doing.
In the example being discussed here, it just seems like a lot of people coordinating on the easier route, partly due to the momentum of older practices and partly because certain established people/institutions are somewhat threatened by the better practices.
> I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I’m being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.
My feeling is that small examples of the dynamic I’m pointing at come up fairly often, but things pretty reliably go poorly when I point them out, which has left me with an aversion to doing so.
The conversation has so much gravity toward blame and self-defense that it just can’t go anywhere else.
I’m not going to claim that this is a great post for communicating/educating/fixing anything. It’s a weird post.