Thank you for your post. It is important for us to keep refining the overall p(doom) and the ways it might happen or be averted. You make your point very clearly, even in just the condensed version presented here, drawn from your full posts on various specific points.
It seems to me that you are applying a symmetric argument to values and capabilities: that x-risk requires we hit the bullseye of capability while missing the bullseye of values. I think this argument has a problem, and I’d like to know your view on how much it affects your overall case.
The problem, as I see it, is that goal-space is qualitatively different from capability-space. With capabilities, there is a clear ordering inherent to the capabilities themselves: if you can do more, you can do less. Someone who can lift 100 kg can also lift 80 kg. It is not clear to me that goal-space has any such intrinsic structure; I think it is only extrinsic evaluation by humans that makes “tile the universe with paperclips” a bad goal. If so, the bullseye metaphor breaks down for values: a near miss in capability-space still lands on something ordered and comparable, whereas a “near miss” in goal-space has no inherent metric by which it counts as near at all.
Do you think this difference between these spaces holds, and if so, do you think it undermines your argument?