You make it sound like those two things are mutually exclusive. They aren’t. We are trying to define words so that we can understand and manipulate behavior.
“I don’t know what blackmail is, but I want to make sure an AI doesn’t do it.” Yes, exactly, as long as you interpret it in the way I explained it above.* What’s wrong with that? Isn’t that exactly what the AI safety project is, in general? “I don’t know what bad behaviors are, but I want to make sure the AI doesn’t do them.”
*“In other words there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately.”
It’s a badly formulated question, likely to lead to confusion.
“there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us”
So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail.
“It’s a badly formulated question, likely to lead to confusion.” Why? That’s precisely what I’m denying.
“So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail.”
That’s precisely what I (Stuart really) am trying to do! I said so, you even quoted me saying so, and as I interpret him, Stuart said so too in the OP. I don’t care about the word blackmail except as a means to an end; I’m trying to come up with criteria by which to separate the bad behaviors from the good.
I’m honestly baffled at this whole conversation. What Stuart is doing seems the opposite of confused to me.