Many thanks, Vaniver. Now I get it, and I really like it, because “suffering” has such a grim ring to it, whereas “s-risk” seems like a more manageable concept to work with.
As I understand it, Eliezer generally thinks suffering risks are unlikely, basically because the situation is best viewed like this: there is an incredibly high-dimensional space of possible futures (where the dimensions are how much certain values are satisfied), and the alignment problem consists of aiming at an incredibly small region of this space. The region of really bad futures may be much larger than the region of good futures, but it’s still so tiny that even the <1% chance of solving alignment probably dominates the probability of landing in the region of bad futures by accident if we don’t know what we’re doing. 99.999999...% of the space has neither positive nor negative value.
Rafael, many thanks for your answer. I like how you conceptualize the matter, but I struggle to understand the very last part of your comment. If we have a multidimensional space where the axes represent (let’s say, for clarity, positive and negative degrees of) how well certain values are satisfied, how is it possible that most places in the space are indifferent?
I think the idea is that most regions of the space contain barely any conscious experience. If you have some rogue AI optimizing all matter for some criterion x, there’s no reason why the resulting structures should be conscious. (To what extent the AI itself would be conscious is discussed in this other comment thread.)
But the objection is a good one; I think “how well values are satisfied” was not the right description of the axes. It’s probably more like this: if one of your values is y, say physical temperature to pick something mundane, then y can take many different values, but only a tiny subset of those will be to your liking; most would mean you die immediately. (Note that I’m only trying to paraphrase; this is not my model.) If most values work like this, you get the picture above.
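To make that picture concrete, here’s a quick toy sketch (my own illustration, not anything from the thread; the 1% tolerance per axis is purely an assumed number for the example): if every value dimension has to land in a narrow acceptable band, the fraction of the space with any positive value shrinks exponentially with the number of dimensions.

```python
# Toy model (assumed for illustration): a "future" is a point in an
# n-dimensional value space, and each axis only counts as acceptable within
# a narrow band, like the thin slice of temperatures a human can survive.

band = 0.01  # assumed: 1% of each axis is "to your liking"

for n in (1, 2, 5, 10, 20):
    good_fraction = band ** n  # volume of the acceptable hyper-rectangle
    print(f"{n:>2} value dimensions -> {good_fraction:.0e} of the space is good")

# With 20 dimensions at 1% tolerance each, only 1e-40 of the space has any
# positive value; nearly everything else is "neither positive nor negative",
# unless it happens to hit a similarly tiny actively-bad region.
```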
I’m surprised that in a post about dying with dignity, and in its comments, the word “suffer” / “suffering” appears zero times. Can someone explain that?
Yeah, you’ve got to check for abbreviations;
s-risk
shows up 14 times (not including this comment), mostly here and below.
See also Value is Fragile.