I’m guessing this might be due to something like the following:
(There is a common belief on LW that) Most people do not take AI x/s-risk anywhere near seriously enough; that most people who do think about AI x/s-risk are far too optimistic about how easy alignment is; and that most people who do concede >10% p(doom) are not actually acting with anywhere near the level of caution that their professed beliefs would imply is sensible.
If alignment indeed is difficult, then (AI labs) acting on optimistic assumptions is very dangerous, and could lead to astronomical loss of value (or astronomical disvalue).
Hence: Pushback against memes suggesting that alignment might be easy.
I think there might sometimes be something going on along the lines of “distort the Map in order to compensate for a currently-probably-terrible policy of acting in the Territory”.
Analogy: If, when moving through the Territory, you find yourself consistently drifting further east than you intend, then the sane solution is to correct how you move in the Territory; the sane solution is not to skew your map westward to compensate for your drifting. But what if you’re stuck in a bus steered by insane east-drifting monkeys, and you don’t have access to the steering wheel?
Like, if most people are obviously failing egregiously at acting sanely in the face of x/s-risks, due to those people being insane in various ways
(“but it might be easy!”, “this alignment plan has the word ‘democracy’ in it, obviously it’s a good plan!”, “but we need to get the banana before those other monkeys get it!”, “obviously working on capabilities is a good thing, I know because I get so much money and status for doing it”, “I feel good about this plan, that means it’ll probably work”, etc.),
then one of the levers you might (subconsciously) be tempted to try pulling is people’s estimate of p(doom). If everyone were sane/rational, then obviously you should never distort your probability estimates. But… clearly everyone is not sane/rational.
If that’s what’s going on (for many people), then I’m not sure what to think of it, or what to do about it. I wish the world were sane?
I feel like every week there’s a post that says, “I might be naive, but why can’t we just do X?”, where X is already well known and not considered sufficient. So it’s easy to see a post claiming a relatively direct solution as just being in that category.
The amount of effort and thinking in this case, plus the reputation of the poster, draws a clear distinction between the useless posts and this one, but it’s easy to imagine people pattern-matching into believing that this one is also probably useless, without engaging with it.
(Ah, to clarify: I wasn’t saying that Kaj’s post seems insane; I was referring to the fact that lots of thinking/discourse in general about AI seems to be dangerously insane.)
This seems correct to me.