There isn’t a way to comment there, so I’m commenting here:
--The author, like many others, misunderstands what people mean when they talk about capabilities vs. alignment. They do not mean that everything is either one or the other, and that nothing is both.
--Just because humanity can affect something doesn’t mean it is bad to have a credence in it. P(doom) is, I think, a very important concept, and I’m glad people are going around discussing it. It’s analogous to asking “Does it look like we are going to win the war?” or “Is that new coronavirus we hear about in the news going to overwhelm the hospital system in this country?” In both cases, whether or not it happens depends hugely on choices humanity makes.
--It looks like this is a thoughtful and insightful piece with lots of good bits which will stick in my mind. E.g. the discussion of how RLHF accelerates ASI timelines, the discussion of situations in which we have accurate predictive theories already, and the persuasion paradox about how people worried about AI x-risk can’t be too specific or else they make it (slightly) more likely.
I had a similar reaction, which made me want to go looking for the source of disagreement. Do you have a post or thread that comes to mind which makes this distinction well? Most of what I am able to find just sort of gestures at some tradeoffs, which seems like a situation where we would expect the kind of misunderstanding you describe.
Perhaps this? “Request: stop advancing AI capabilities” (greaterwrong.com)
Yep, it’s nicely packaged right here:
“To all doing that (directly and purposefully for its own sake, rather than as a mournful negative externality to alignment research): I request you stop.”
Ehh, it’s not long enough, doesn’t explain things as well as it could.
Winning or losing a war is kinda binary.
Whether a pandemic gets to my country is a matter of degree, since in principle a pandemic that destroyed 90% of counterfactual economic activity in one country could break containment but destroy only 10% of it in your country.
“Alignment” or “transition to TAI” of any kind is way further from “coinflip” than either of these, so if you think doomcoin is salvageable or want to defend its virtues you need way different reference classes.
Think about the ways in which winning or losing a war isn’t binary: there are lots of ways for the implementation details of an agreement to affect your life as a citizen of one of the countries. AI is like this but even more so: all the different kinds of outcomes, how central or unilateral the important moments are, which values end up being imposed on the future and at what resolution, etc. People who think “we have a doomcoin toss coming up, now argue about the p(heads)” are not gonna think about this stuff!
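A minimal sketch of that point, in code (the 0 to 10 “badness” scale and every probability below are hypothetical illustrations, not figures from anyone in this thread): two outcome distributions can share the same headline “p(doom)” while painting very different pictures of how bad things actually get, which is exactly the structure a coinflip framing throws away.

```python
# Hypothetical illustration: a single "doomcoin" number hides the shape of the
# outcome distribution. Outcomes are scored on a made-up 0-10 badness scale
# (0 = total victory, 10 = total annihilation); all probabilities are invented.

scenario_a = {0: 0.5, 10: 0.5}                            # literal coinflip between extremes
scenario_b = {0: 0.1, 3: 0.2, 5: 0.2, 8: 0.3, 10: 0.2}    # smeared-out, war-like spread

def p_doom(dist, threshold=8):
    """Collapse a distribution over badness into one number: total mass at or above threshold."""
    return sum(p for badness, p in dist.items() if badness >= threshold)

def expected_badness(dist):
    """Mean badness, one crude way to see the structure the single number discards."""
    return sum(badness * p for badness, p in dist.items())

for name, dist in [("A (binary)", scenario_a), ("B (smeared out)", scenario_b)]:
    print(name, "p(doom) =", p_doom(dist), "| expected badness =", expected_badness(dist))
# Both scenarios report p(doom) = 0.5, yet they describe very different futures.
```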
To me, “p(doom)” is a memetic PITA as bad as “the real unaligned AI was the corporations/capitalism”, so I’m excited that you’re defending it! Usually people tell me “yeah, you’re right, it’s not a defensible frame haha”
Interesting, thanks. Yeah, I currently think the range of possible outcomes in warfare is more smeared out, across a variety of different results, than the range of possible outcomes for humanity with respect to AGI. The bulk of the probability mass in the AGI case, IMO, is concentrated in “total victory of unaligned, not-near-miss AGIs”; then there are smaller chunks concentrated in “total victory of unaligned, near-miss AGIs” (near-miss meaning what they care about is similar enough to what we care about that the outcome is either noticeably better, or noticeably worse, than human extinction) and of course “human victory,” which can itself be subdivided depending on the details of how that goes.
Whereas with warfare, there’s almost a continuous range of outcomes ranging from “total annihilation and/or enslavement of our people” to “total victory” with pretty much everything in between a live possibility, and indeed some sort of negotiated settlement more likely than not.
I do agree that there are a variety of different outcomes with AGI, but I think if people think seriously about the spread of outcomes (instead of being daunted and deciding not to think about it because it’s so speculative) they’ll conclude that the outcomes fall into the buckets I described.
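To make the bucket framing concrete, here is a toy sketch (the bucket names follow the comment above, but every probability is a hypothetical placeholder, not anyone’s actual credence): under this view, p(doom) is just the total mass on the buckets you count as doom, so the single number summarizes, rather than replaces, the bucket-level picture.

```python
# Hypothetical bucket probabilities for AGI outcomes, loosely following the buckets
# named above. The numbers are placeholders chosen to sum to 1, not actual credences.
buckets = {
    "unaligned AGI total victory, not near-miss": 0.55,
    "unaligned AGI total victory, near-miss, worse than extinction": 0.05,
    "unaligned AGI total victory, near-miss, better than extinction": 0.10,
    "human victory (itself subdividable)": 0.30,
}
assert abs(sum(buckets.values()) - 1.0) < 1e-9

# One possible convention: count every unaligned-takeover bucket except the
# "better than extinction" near-miss as doom. Other conventions give other numbers.
doom_buckets = [
    "unaligned AGI total victory, not near-miss",
    "unaligned AGI total victory, near-miss, worse than extinction",
]
p_doom = sum(buckets[b] for b in doom_buckets)
print(f"p(doom) under this convention: {p_doom:.2f}")  # 0.60 with these placeholders
```

The doom_buckets list is the design choice here: where that line gets drawn is one way two p(doom) numbers can differ even between people with similar bucket-level beliefs.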
Separately, I think that even if it were less binary than warfare, it would still be good to talk about p(doom). I think it’s pretty helpful for orienting people, and also I think a lot of harm comes from people having insufficiently high p(doom). Like, a lot of people are basically feeling/thinking “yeah, it looks like things could go wrong, but probably things will be fine, probably we’ll figure it out, so I’m going to keep working on capabilities at the AGI lab and/or keep building status and prestige and influence and not rock the boat too much, because who knows what the future might bring, but anyhow we don’t want to do anything drastic that would get us ridiculed and excluded now.” If they are actually correct that there’s, say, a 5% chance of AI doom, coming from worlds in which things are harder than we expect, an unfortunate chain of events occurs, and people make a bunch of mistakes or bad people seize power, then maybe something in this vicinity is justified. But if instead we are in a situation where doom is the default, and we need a bunch of unlikely things to happen and/or a bunch of people to wake up and work very hard and very smart and coordinate well in order to NOT suffer unaligned AI takeover...