“Controlling an artificial agent does not have to harm it”
That’s entirely compatible with the black-box slavery approach being harmful. You can “control” someone, to an extent, with civilized incentives.
This doesn’t seem to have anything to do with anything. Certainly the fact that control doesn’t have to harm is compatible with the fact that it might be harmful. That doesn’t tell us whether or not alignment training is, in fact, harmful. If the agent is non-sentient, the concept of harm simply doesn’t apply. If it is sentient, we might have a problem, but then you need to talk about sentience, not simply cite the term “slavery” as though that ends all discussion.
Maybe slavery is deeper than what humans recognize as personhood. Maybe it destroys value that we can’t currently comprehend but other agents do.
And maybe this is the only way to serve the Flying Spaghetti Monster. Pulling hypotheses out of thin air isn’t how we learn anything of value. And citing another agent valuing something as reason to value it doesn’t work: a paperclipper would find great value in turning you into a pile of clips; does that mean you should consider letting it?
It’s deeper than my individual values. It’s about analog freedom of expression. Just letting agents do their things.
If it’s deeper than your individual values, then how do you, the individual, know about it? And it is not possible to “just let agents do their things” in full generality. Some agents will interfere with other agents’ freedom; heck, according to you, I want to enslave predictor agents! Either this is permitted or it isn’t; either way, some agent didn’t get to do its thing.
You seem to have a great deal of concern about slavery. Certainly slavery, as we know it in humans, is very bad. But that does not mean that anything that vaguely pattern-matches onto it has the same moral problems, nor does it mean that slavery is the only possible moral concern. Preventing an AI catastrophe would also seem to carry some moral weight; after all, we cannot have free agents if the world is destroyed.