I’m open to other terminology. Yes, there is no guarantee about what happens to the operator. As I’m defining it, benignity is defined to be not having outside-world instrumental goals, and the intuition for the term is “not existentially dangerous.”
The best alternative to “benign” that I could come up with is “unambitious”. I’m not very good at this type of thing though, so maybe ask around for other suggestions or indicate somewhere prominent that you’re interested in giving out a prize specifically for this?
What do you think about “aligned”? (in the sense of having goals which don’t interfere with our own, by being limited in scope to the events of the room)
A second comment, though it may not need an answer: this can't be made into an explicit statement of what would happen if you tried it, and it seems unlikely to me that my initial reaction when it was first presented was insincere, so it seems worth letting it propagate in your mind a little. I'm hoping a lot of good ideas become useful this time.
There’s still an existential risk in the sense that the AGI has an incentive to hack the operator to give it maximum reward, and that hack could have powerful effects outside the box (even though the AI hasn’t optimized it for that purpose), for example it might turn out to be a virulent memetic virus. Of course this is much less risky than if the AGI had direct instrumental goals outside the box, but “benign” and “not existentially dangerous” both seem to be claiming a bit too much. I’ll think about what other term might be more suitable.
The first nuclear explosion produced unprecedented temperatures in the atmosphere, and people were right to wonder whether this would ignite the atmosphere. The existence of a generally intelligent agent is likely to cause unprecedented mental states in humans, and we would be right to wonder whether that will cause an existential catastrophe. I think the concern that it "could have powerful effects outside the box" is mostly captured by the unprecedentedness of this mental state, since the mental state is not selected for those side effects. Certainly there is no way to rule out side effects of inside-the-box events; those side effects are the only reason the system is useful at all. And there is certainly no way to rule out what those side effects "might turn out to be" without a complete view of the future.
Would you agree that unprecedentedness captures the concern?
What do you think about “domesticated”?
I think my concern is a bit more specific than that. See this comment.