Exploitation means the exploiter benefits. If you are a rock, you can’t be exploited. If you are an agent who never gives in to threats, you can’t be exploited (at least not by threats; maybe there are other kinds of exploitation). That said, yes, if the opponent agents are the sort to do nasty things to you anyway even though it won’t benefit them, then you might get nasty things done to you. You wouldn’t be exploited, but you’d still be very unhappy.
Cool, I think we basically agree on this point then, sorry for misunderstanding. I just wanted to emphasize the point I made because “you won’t get exploited if you decide not to concede to bullies” is kind of trivially true. :) The operative word in my reply was “robustly,” which is the hard part of dealing with this whole problem. And I think it’s worth keeping in mind that “doing nasty things to you anyway even though it won’t benefit them” is a consequence of a commitment that was made for ex ante benefits; it’s not the agent being obviously dumb, as Eliezer suggests. (Fortunately, as you note in your other comment, some asymmetries should make us think these commitments are rare overall; I do think an agent probably needs to have a pretty extreme-by-human-standards, little-to-lose value system to want to do this… but who knows what misaligned AIs might prefer.)
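To make the ex ante point concrete, here’s a toy expected-value sketch (the payoff numbers and probabilities are made up purely for illustration, not taken from anything above): committing to carry out a costly threat can be positive expected value up front, even though actually carrying it out, when the target refuses, benefits no one.

```python
# Toy illustration (all numbers are assumed, not from the discussion):
# a threatener chooses whether to commit to following through on a threat.
p_concede_if_committed = 0.9   # target concedes when the threat is credible
p_concede_if_not = 0.1         # target mostly ignores a non-credible threat
gain_from_concession = 10      # value to the threatener if the target concedes
cost_of_carrying_out = 3       # cost of following through when the target refuses

# Expected value of committing: usually get the concession,
# occasionally pay the cost of carrying out the threat.
ev_commit = (p_concede_if_committed * gain_from_concession
             - (1 - p_concede_if_committed) * cost_of_carrying_out)

# Expected value of not committing: the threat is rarely believed,
# and the threatener never follows through.
ev_no_commit = p_concede_if_not * gain_from_concession

print(f"EV with commitment:    {ev_commit:.1f}")    # 8.7
print(f"EV without commitment: {ev_no_commit:.1f}")  # 1.0
```

Under these (made-up) numbers, the commitment is clearly worth making ex ante, so the rare case where the threat actually gets carried out is the unlucky tail of a sensible-looking earlier decision rather than the agent being obviously dumb in the moment.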