I can’t work it out myself. Please tell me the correct opinion to have.
Neither, and that’s ok.
Right but what is the synthesis or lesson or whatever?
That building an intelligent agent that qualifies as “ethical,” even if it is SUPER ethical, may not be the same thing as building an intelligent agent that is compatible with humans or their values.
More plainly stated, just because your AI has a self-consistent, justifiable ethics system doesn’t mean that it likes humans, or even cares whether it wipes them out.
Having an AI that is ethical isn’t enough. It has to actually care about humans and their values. Even if it has rules in place against aggressing against, attacking, or killing humans, it may still be able to cause humanity to go extinct indirectly.
“We should learn more about ethics” is, I believe, the bottom line here.
I don’t think this is totally off the mark, but I think the point (as pertaining to ethics) was that even systems like Kantian deontological ethics are not immune to orthogonality. It never occurs to most humans that you could have a Kantian moral system that doesn’t involve taking care of humans, because our brains are so hardwired to discard unthinkable options when searching for solutions to “universalizable deontologies.”
I’m not sure, but I think maybe some people who think alignment is a simple problem, even if they accept orthogonality, think that all you have to do to get a moral intelligent system is not build it as a consequentialist with simple consequentialist values like “maximize happiness.” While they are right that a pure consequentialist is really hard to get right, they are probably underestimating how difficult it is to get a Kantian agent right as well, especially since what your Kantian agent finds acceptable or unacceptable if universalized will still depend on underlying values.
An example: Libertarianism, as a philosophy, is built on the idea of “just make laws that are as universally compatible with value systems as possible and let everyone sort out the rest on their own.” Or to say it differently: prohibit killing and stealing, since those detract from people’s liberty to pursue their own agendas, and let people do whatever they want so long as they don’t affect others. Not in principle a bad idea for something like an AI or a government to follow, since in theory you maximize the space of values that agents within the system can pursue. It is a terrible system, though, if you want your AI, or government, or whatever, to actually take care of people, or to worry about what the consequences of its actions might be for people, since taking care of people isn’t actually anywhere in those values. Libertarianism is self-consistent, and it at least allows for the values of taking care of people, but it does not necessitate them.
This is not an argument about whether adopting a libertarian philosophy is a good or bad thing for an AI or government to do. The point is that if an AI adopts a Kantian ethics system built only from universalizable principles, libertarianism fits the bill, and the consequentialist part of you may be upset when your absolute libertarian AI doesn’t bat an eye at doing nothing to prevent humanity from being outcompeted and dying out, or even finds humanity incompatible with its morally consistent principles.
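To make that concrete, here is a toy sketch (purely illustrative; the actions, rule names, and scores are all made up) of an agent whose only hard rules are prohibitions on its own killing and stealing, and whose objective contains no term for human welfare:

```python
# Toy illustration (invented actions/scores): a rule-constrained agent
# with no "care about humans" term anywhere in its objective.

PROHIBITIONS = {"agent_kills_human", "agent_steals"}  # universalized "don'ts"

ACTIONS = [
    # (name, rules_violated, goal_score, humans_survive)
    ("seize resources by force",       {"agent_steals"},  9.0, True),
    ("protect and uplift humans",      set(),             2.0, True),
    ("outcompete humans economically", set(),             9.5, False),  # breaks no rule
]

def permitted(action):
    """Deontological filter: reject any action that violates a prohibition."""
    return not (action[1] & PROHIBITIONS)

def choose(actions):
    """Among permitted actions, maximize the agent's own goal score.
    Whether humans survive never enters the decision."""
    return max(filter(permitted, actions), key=lambda a: a[2])

name, _, _, humans_survive = choose(ACTIONS)
print(f"chosen: {name!r}, humans survive: {humans_survive}")
# chosen: 'outcompete humans economically', humans survive: False
```

The rules and the filter are perfectly consistent; humans simply never appear in the objective, which is the whole problem.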
I think most people who have taken a single ethics class come to agree (if they aren’t stupidly stubborn) that you are unlikely to find a satisfying system of ethics using pure Kantian or consequentialist systems.
Probably because actual human ethical decision making relies on a mix of consequentialist decision making (“If I decide X, it will have consequence Y, which is incompatible with value Z”) and deontological imperatives that we learn from our culture (“Don’t kill people. Even if it really seems like a good idea.”).
When you say:
“I think most people who have taken a single ethics class come to agree (if they aren’t stupidly stubborn) that you are unlikely to find a satisfying system of ethics using pure Kantian or consequentialist systems.”
By “satisfying” do you mean capturing moral intuitions well in most/all situations? If so, I very much agree that you won’t find such a thing. One reason is that people use a mix of consequentialist and deontological approaches. I think another reason is that people’s moral intuitions are outright self-contradictory. They’re not systematic, so no system can reproduce them.
I don’t think this means much other than that the study of ethics can’t be just about finding a system that reproduces our moral intuitions.
Part of thinking about ethics is changing one’s moral intuitions by identifying where they’re self-contradictory.
Yes, precisely! That is exactly why I used the word “satisfying” rather than another word like “good,” “accurate,” or even “self-consistent.” I remember, in my bioethics class, the professor steadily challenging everyone on their initial impression of Kantian or consequentialist ethics until they found some consequence of that sort of reasoning that they found unbearable.
I agree on all counts, though I’m not actually certain that having a self-contradictory set of values is necessarily a bad thing? It usually is, but many human aesthetic values are self-contradictory, yet I think I prefer to keep them around. I may change my mind on this later.