Yes. You can convince a sufficiently rational paperclip maximizer that killing people is Yudkowsky::evil, but you can’t convince it not to take Yudkowsky::evil actions, no matter how rational it is. AKA the orthogonality thesis (when talking about other minds) and “the utility function is not up for grabs” (when talking about ourselves).
You are using “rational” to mean instrumentally rational. You can’t disprove the existence of agents that value rationality terminally, for its own sake… indeed, the orthogonality thesis implies they must exist. And when people say rationally persuadable agents exist, that is what they mean by “rational”… they are not using your language.
I don’t see how that makes any difference. You could convince “agents that value rationality terminally, for its own sake” that killing people is evil, but you couldn’t necessarily convince them not to kill people, much like Pebblesorters could convince them that 15 is composite but they couldn’t necessarily convince them not to heap 15 pebbles together.
You can’t necessarily convince them, and I didn’t claim you necessarily could. That depends on the probability that morality can be figured out and/or turned into a persuasive argument. These probabilities need to be estimated in order to estimate the likelihood of the MIRI solution being optimal, since the higher the probability of the alternatives, the lower the probability of the MIRI scenario.
Probabilities make a difference.