Friendly AI—Being good vs. having great sex
[...] I think an LW post is important and interesting in proportion to how much it helps construct a Friendly AI, how much it gets people to participate in the human project [...]
I’m not going to wait for philosophers to cover this issue correctly, or for use in FAI design.
The above quotes hint at the possibility that some of the content that can be found on lesswrong.com has been written in support of friendly AI research.
My question: of what importance is ethics when it comes to friendly AI research? If a friendly AI is one that protects and cultivates human values, how does ethics help to achieve this?
Let’s assume that there exists some sort of objective right, whatever that actually means. If humans desire to be right, isn’t that the sort of human value that a friendly AI would seek to protect and cultivate?
What difference is there between wanting to be good and wanting to have a lot of great sex? Both seem to be values that humans might desire, therefore both values have to be taken into account by a friendly AI.
If a friendly AI has to be able to extrapolate the coherent volition of humanity, without any hard-coded knowledge of human values, why doesn’t this extend to ethics as well?
If we have to solve ethics before being able to design friendly AI, if we have to hard-code what it means to be good, why doesn’t this apply to what it means to have great sex as well (or what it means to have sex at all)?
If a friendly AI is going to figure out what humans desire, by extrapolating their volition, might it conclude that our volition is immoral and therefore undesirable?
I find this post very confusing. The most enlightening thing you could tell me is what motivated you to write it.
All this is too much for my psyche and I am trying to make it go away somehow. So sometimes I get upset and write in a rage without thinking.
Let me know if there’s anything I can do.
I think that good doesn’t really exist outside of people’s heads; what it means to be good would fall out of CEV, and so can’t actually contradict our volition.
SIAI seems to be focused more on metaethics, and on how we can tell whether our system of ethics is legitimate, than on the actual nitty-gritty ethical details that would be explicitly programmed in.
Well, I guess this is as good a time and place as any to give a long-winded, speculative response. Just skip this if you doubt my speculation is going to be interesting or useful.
What does it even mean for there to be an “objective right” given the view that morals are a result of the blind forces of natural selection?
As I understand it, the idea of CEV is to somehow determine whether there is an executable protocol that would uniformly raise, or leave unchanged, everyone’s utility, without significantly (I’m not sure how this would be determined) lowering the utility of any minority group.
My more detailed (and still hopelessly vague at this point) understanding is that a person’s utility function is a neurological fact about them, and that the idea is something like this: take a simulated perfect Bayesian utility maximizer for each person, equip each maximizer with that individual’s utility function, and then run the set of maximizers on a set of possible futures (determined how? I don’t know; the CEV paper seemed extremely general, differentiating between ‘nice place to live’ and the more general goal of the CEV) to see what they converge on (again, I have no clue as to how this convergence is determined).
On the other hand, I assume that UDT has been deemed necessary for implementing whatever computation is going to determine which actions must be taken in order to achieve the outcome converged upon, and indeed for figuring out how to compute the convergence itself (since the spread and muddle factors and the like must be taken into consideration).
It seems like the confusing part of all of this isn’t so much figuring out what people like (that seems like a very solvable problem given enough understanding of neuroscience), but figuring out how to (1) make preferences converge and (2) decide which actions are going to be acceptable (minimizing ‘spread’ and the effects of ‘muddle’ at each step).
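To make my mental picture concrete, here is a toy sketch of the kind of computation I’m imagining. To be clear, nothing in it comes from the CEV paper: the utility functions, the list of candidate futures, and the crude agreement threshold standing in for ‘spread’ and ‘muddle’ are all placeholders I made up for illustration.

```python
# Toy sketch only: none of these names or numbers come from the CEV paper.
from typing import Callable, Dict, List, Optional

Future = str
UtilityFn = Callable[[Future], float]


def extrapolated_choice(utility: UtilityFn, futures: List[Future]) -> Future:
    """Stand-in for an idealized Bayesian maximizer equipped with one
    person's utility function: it just picks the future it rates highest."""
    return max(futures, key=utility)


def cev_sketch(utilities: Dict[str, UtilityFn],
               futures: List[Future],
               agreement_threshold: float = 0.9) -> Optional[Future]:
    """Return a future if enough of the individual maximizers 'converge'
    on it; otherwise None. (Real notions of spread and muddle would be
    far subtler than a single agreement threshold.)"""
    votes: Dict[Future, int] = {}
    for person, utility in utilities.items():
        choice = extrapolated_choice(utility, futures)
        votes[choice] = votes.get(choice, 0) + 1

    best_future, best_count = max(votes.items(), key=lambda kv: kv[1])
    if best_count / len(utilities) >= agreement_threshold:
        return best_future
    return None  # no sufficiently coherent volition found


if __name__ == "__main__":
    futures = ["status quo", "post-scarcity utopia", "paperclip maximization"]
    utilities = {
        "alice": lambda f: {"status quo": 0.2,
                            "post-scarcity utopia": 0.9,
                            "paperclip maximization": 0.0}[f],
        "bob": lambda f: {"status quo": 0.5,
                          "post-scarcity utopia": 0.8,
                          "paperclip maximization": 0.0}[f],
    }
    print(cev_sketch(utilities, futures))  # -> post-scarcity utopia
```

Obviously a real proposal would have to say where the candidate futures come from and what convergence actually means; this only shows the shape of the loop I have in mind.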
All of these things considered, my guess is that lukeprog wants to promote CEV indirectly by promoting the idea that meta-ethics is fundamentally solvable by studying human neurology and evolutionary psychology: that solving meta-ethics is akin to discerning the human utility function and coming up with a theory for how to reconcile competing human utility functions.
Of course all of this might be totally off base, I don’t really know, I’m still kind of new to all of this and I’m trying to infer quite a bit.
I should add:
Ethics is a top-level problem that, even when “solved”, can only be applied adequately once you have been able to define mathematically what constitutes a human being or human values. If you can’t even define some of the subject matter of ethics, e.g. pain and consciousness, then how will an AI know what constitutes a valid input to its friendliness function?
If you believe that an AI is able to figure all that out on its own, why won’t it be able to do the same with ethics? And if not, then why do you think ethics is of particular importance to friendly AI research when there are so many other basic problems to be solved first?
The way I see it, all these problems are interrelated, and it’s hard to say which ones can or should be solved first. I think it is reasonable to pursue multiple approaches simultaneously. Yes it seems hard to solve ethics without first understanding pain and consciousness, but it is possible that the correct theory of ethics does not use “pain” or “consciousness”. Perhaps it only depends on “desire”. And if ethics is related to consciousness, say, it can be useful to develop them simultaneously so that we can check whether our theory of consciousness is compatible with our theory of ethics.
After a while, smart machines will probably know what a human is better than individual humans do—due to all the training cases we can easily feed them.
“Is this behaviour ethical” generally seems to be a trickier categorisation problem—humans disagree about it more, it is more a matter of degree—etc.
The whole “can’t we just let the S-I-M figure it out?” business seems kind of paralysing. Should we stop working on math or physics because the S-I-M will figure it out? No, because we need to use that stuff in the meantime.
Why assume that? It’s a nontrivial assumption. If we assume that there is one correct notion of “objective right”, then any two sufficiently intelligent entities will agree about what it is (even if you and I don’t know what it is), they’ll both want to do it, and therefore they won’t be in conflict with each other. Expecting such an absence of conflict is purely wishful thinking, as far as I can tell.
Humans desire to do whatever made their ancestors reproduce successfully in the ancestral environment. I see no reason to expect that to resemble anything like obeying some objective morality. However, humans do have a motive to tell nice-sounding stories about themselves, regardless of whether those stories are true, so I’m not at all surprised to encounter lots of humans who claim to desire to obey some objective morality.
Normative ethics seems relevant—the machine has to know what it should do.
Descriptive ethics also seems likely to be relevant—the machine has to know what people want—assuming that it is going to respect their wishes.
I do not suggest that ethics is something that comes naturally to an AI. I am asking why people talk about ethics rather than applied neuroscience (which is itself a higher level problem).
You have to tell an AI how to self-improve and learn about human values without destroying human values. I just don’t see how ethics is helpful here. First of all you’ll have to figure out how to make it aware of humans, let alone what it means to hurt a human being.
If you really think that it is necessary to tell an AI exactly what we mean by ethics, then this would imply that you would also have to tell it what we mean by “we”. In other words, I don’t think this approach is feasible.
The way I see it, friendliness can only mean self-improving slowly, making use of limited resources (e.g. a weak android body), and growing up in a human society. After that, the AI can go on and solve the field of ethics on its own. Once it does, and we approve of the result, it can go on, step by step, to approach something like CEV.
This is a race, though. If you meander and dally, all that means is that some other team reduces your efforts to an “also ran” footnote.
Think you can coordinate globally, to make this into something other than a race? That sounds pretty unlikely. Is there a plan for that, or just some wishful thinking?
An AI will do whatever you program it to do, of course. You could program an AI to calculate some kind of extrapolated human volition, whatever that is, and then act in accordance with the result. Or you could program an AI to calculate the extrapolated volition and then evaluate the result — but then you’d have to specify a criterion for evaluating the extrapolated volition.
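A crude way to see the difference between these two designs (every name below is a made-up placeholder for illustration, not any real algorithm or API):

```python
# Toy illustration of the two designs above; every function is a hypothetical
# placeholder, not a real algorithm.

def extrapolate_volition(humanity):
    """Placeholder for whatever computation 'extrapolated volition' names."""
    return "some extrapolated goal"


def act_on(goal):
    """Placeholder for actually pursuing the goal."""
    return f"acting on: {goal}"


def design_a(humanity):
    """Variant 1: compute the extrapolated volition and act on it directly."""
    return act_on(extrapolate_volition(humanity))


def design_b(humanity, acceptable):
    """Variant 2: compute it, then evaluate the result first. But now the
    programmer must supply `acceptable`, a criterion for judging the
    extrapolated volition, and specifying that criterion is the hard part."""
    goal = extrapolate_volition(humanity)
    return act_on(goal) if acceptable(goal) else None
```

The point is just that the second variant pushes the problem into the evaluation criterion, which someone still has to write down.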
The real question is whether you’d actually want to create such an AI.
Ah, if only that were true! ;-)
I did not say that an AI will do whatever you think you programmed it to do.
I don’t get your point.
We don’t know what a friendly AI is. Ethics is supposed to tell us. “Protect and cultivate human values” might or might not be it. Only when we have reduced ethics to an algorithmic level can we define a friendly AI to be any AI that implements this algorithm (or, more precisely, some algorithm with these properties).
What does it mean to reduce ethics to an algorithmic level, where do you draw the line there? Does it involve an algorithmic description of what constitutes a human being, of pain and consciousness? If not, then to whom will the AI be friendly? Are good, bad, right and wrong universally applicable, e.g. to trees and stones? If not, then what is more important in designing friendly AI, figuring out the meaning of moral propositions, to designate the referent, or to mathematically define the objects that utter moral propositions, e.g. humans?
Do you mean that by solving ethics we might figure out that what we actually value is to abandon what we value?
One might also argue that discussing ethics helps to convince people of the importance of friendly AI research, even if it doesn’t help to solve friendly AI directly.
I don’t see that either. Even if you believe that it could be shown that it is objectively right to support friendly AI research, it will be difficult to communicate this insight, compared to an appeal to selfishness.
I believe that a much better strategy would be to make people aware of how they could personally benefit from a positive singularity.
The current strategy appears to be to appeal to people’s emotions while telling them that they are irrational jerks if they dare to play the lottery or contribute money to any cause other than mitigating risks from AI. Even if true, it seems to be hard to grasp, because people like me think that such an attitude is complete bullshit and makes everyone who utters such judgemental statements untrustworthy.
I haven’t heard anyone say this.