What do you feel is bad about moral philosophy? It looks like you dislike it because you place it next to anthropomorphic thinking and technophobia.
It’s appropriate to anthropomorphize when you’re dealing with actual humans, or relevantly human-like things. Someone could legitimately research issues surrounding whole brain emulations, or minor variations on whole brain emulations. Likewise, moral philosophy is a legitimate and important topic. But the bulk of MIRI’s attention doesn’t go to ems or moral philosophy.
The appropriate degree of anthropomorphisation when dealing with an AI made by humans, with human limitations, for human purposes is not zero.
Are those claims supposed to be linked? ...we don’t need to deal with moral philosophy if we are not dealing with WBEs?
the-citizen is replying to this thing I said:
We’re trying to avoid names like “friendly” and “normative” that could reinforce someone’s impression that we think of AI risk in anthropomorphic terms, that we’re AI-hating technophobes, or that we’re moral philosophers.
Those are just three things we don’t necessarily want to be perceived as; they don’t necessarily share anything else in common. However, because the second one is pejorative and the first is sometimes treated as pejorative, the-citizen was wondering if I’m anti-moral-philosophy. I replied that highly anthropomorphic AI and moral philosophy are both perfectly good fields of study, and overlap at least a little with MIRI’s work; but the typical newcomer is likely to think these are more central to AGI safety work than they are.
For the record, my current position is that if MIRI doesn’t think it’s central, then it’s probably doing it wrong.
But perhaps moral philosophy is important for a FAI? Like for knowing right and wrong so we can teach/build it into the FAI? Understanding right and wrong in some form seems really central to FAI?
There may be questions in moral philosophy that we need to answer in order to build a Friendly AI, but most MIRI-associated people don’t think that the bulk of the difficulty of Friendly AI (over generic AGI) is in generating a sufficiently long or sufficiently basic list of intuitively moral English-language sentences. Eliezer thinks the hard part of Friendly AI is stability under self-modification; I’ve heard other suggestions to the effect that the hard part is logical uncertainty, or identifying how preference and motivation are implemented in human brains.
The problems you need to solve in order to convince a hostile human being to become a better person, or to organize a society, or to motivate yourself to do the right thing, aren’t necessarily the same as the problems you need to solve to build the brain of a value-conducive agent from scratch.
Stability under self-modification is a core problem of AGI generally, isn’t it? So isn’t that an effort to solve AGI, not safety/friendliness (which would be fairly depressing given MIRI’s stated goals)? Does MIRI have a way to define safety/friendliness that isn’t derivative of moral philosophy?
Additionally, many human preferences are almost certainly not moral… surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...
If you want to build an unfriendly AI, you probably don’t need to solve the stability problem. If you have a consistently self-improving agent with unstable goals, it should eventually (a) reach an intelligence level where it could solve the stability problem if it wanted to, then (b) randomly arrive at goals that entail their own preservation, then (c) implement the stability solution before the self-preserving goals can get overwritten. You can delegate the stability problem to the AI itself. The reason this doesn’t generalize to friendly AI is that this process doesn’t provide any obvious way for humans to determine which goals the agent has at step (b).
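To make the (a)-(c) argument concrete, here is a toy simulation of it; everything in it (the goal names, the drift rate, the capability threshold) is an invented illustration, not anything from MIRI’s work. A goal only gets locked in once the agent is capable enough and its current goal happens to favour its own preservation, and which goal that turns out to be differs from run to run:

```python
# Toy model of the (a)-(b)-(c) story: capability grows, the goal drifts at
# random, and the first self-preserving goal held after the capability
# threshold is reached gets locked in. All names and numbers are made up.
import random

GOALS = ["paperclips", "chess", "self_preserving_goal_A", "self_preserving_goal_B"]
SELF_PRESERVING = {"self_preserving_goal_A", "self_preserving_goal_B"}
STABILITY_THRESHOLD = 10  # capability needed to solve the stability problem

def run_once(seed: int) -> str:
    rng = random.Random(seed)
    capability = 0
    goal = rng.choice(GOALS)
    while True:
        capability += 1                # self-improvement step
        if rng.random() < 0.5:         # unstable goals drift randomly
            goal = rng.choice(GOALS)
        # (a) smart enough to solve stability, (b) goal happens to entail its
        # own preservation, so (c) it is locked in before it can drift again.
        if capability >= STABILITY_THRESHOLD and goal in SELF_PRESERVING:
            return goal

# Different seeds lock in different goals: outside observers get no say.
print([run_once(seed) for seed in range(5)])
```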
Cheers, thanks for the informative reply.
MIRI makes the methodological proposal that dealing with the whole of human value, rather than identifying a morally relevant subset, simplifies the issue of friendliness (or morality or safety). Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics.
The idea of a tractable, rationally discoverable set of ethical principles is a weaker form of, or a lead-in to, one of the most common objections to the MIRI approach: “Why doesn’t the AI figure out morality itself?”
Thanks, that’s informative. I’m not entirely sure what your own position is from your post, but I agree with what I take your implication to be—that a rationally discoverable set of ethics might not be as sensible a notion as it sounds. But on the other hand, human preference satisfaction seems a really bad goal—many human preferences in the world are awful—take the desire for power over others, for example. Otherwise human society wouldn’t have wars, torture, abuse, etc. I haven’t read up on CEV in detail, but from what I’ve seen it suffers from a confusion that decent preferences are somehow gained simply by obtaining enough knowledge? I’m not fully up to speed here, so I’m willing to be corrected.
EDIT> Oh… CEV is the main accepted approach at MIRI :-( I assumed it was one of many
that a rationally discoverable set of ethics might not be as sensible a notion as it sounds
That wasn’t the point I thought I was making. I thought I was making the point that the idea of tractable sets of moral truths had been sidelined rather than sidestepped... that it had been neglected on the basis of a simplification that has not been delivered.
Having said that, I agree that a discoverable morality has the potential downside of being inconvenient to, or unfriendly for, humans: the one true morality might be some deep ecology that required a much lower human population, among many other possibilities. That might have been a better argument against discoverable ethics than the one actually presented.
But on the other hand, human preference satisfaction seems a really bad goal—many human preferences in the world are awful—take the desire for power over others, for example. Otherwise human society wouldn’t have wars, torture, abuse, etc.
Most people have a preference for not being the victims of war or torture. Maybe something could be worked up from that.
I’ve seen comments to the effect that it has been abandoned. The situation is unclear.
Thanks for the reply. That makes more sense to me now. I agree with a fair amount of what you say. I think you’d have a sense from our previous discussions why I favour physicalist approaches to the morals of a FAI, rather than idealist or dualist, regardless of whether physicalism is true or false. So I won’t go there. I pretty much agree with the rest.
EDIT> Oh just on the deep ecology point, I believe that might be solvable by prioritising species based on genetic similarity to humans. So basically weighting humans highest and other species less so based on relatedness. I certainly wouldn’t like to see a FAI adopting the view that people have of “humans are a disease” and other such views, so hopefully we can find a way to avoid that sort of thing.
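One very rough way to sketch that weighting idea is a moral weight that decays with genetic distance from humans. The species list, the distance numbers, and the decay constant below are all hypothetical, chosen only to illustrate the shape of the proposal, not to defend any particular values:

```python
# Hypothetical sketch: moral weight falls off exponentially with "genetic
# distance" from humans. Distances and the decay constant are invented.
import math

GENETIC_DISTANCE = {   # arbitrary 0-1 scale, purely illustrative
    "human": 0.00,
    "chimpanzee": 0.02,
    "mouse": 0.15,
    "fruit fly": 0.60,
    "oak tree": 0.85,
}

def moral_weight(species: str, decay: float = 5.0) -> float:
    """Weight 1.0 for humans, decaying with distance from humans."""
    return math.exp(-decay * GENETIC_DISTANCE[species])

for species, distance in GENETIC_DISTANCE.items():
    print(f"{species:12s} distance={distance:.2f} weight={moral_weight(species):.3f}")
```

Whether something like this would actually avoid the “humans are a disease” failure mode is of course a further question; it only encodes the relatedness-weighting intuition.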
I think you have an idea from our previous discussions why I don’t think your physicalism, etc., is relevant to ethics.
Indeed I do! :-)
Or simply extremely smart AIs > human minds.
Yes, some humans seem to have adopted this view where intelligence moves from being a tool with instrumental value to being intrinsically/terminally valuable. I often find the justification for this to be pretty flimsy, though quite a few people seem to have this view. Let’s hope an AGI doesn’t, lol.