No, as I just explained to SimonF, below, that is not what it is about.
I will repeat what I said:
The paper’s goal is not to discuss “basic UFAI doomsday scenarios” in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans—it claims to be maximizing human happiness—but in spite of that it does something insanely wicked.
So, you said:
The basic UFAI doomsday scenario is: the AI has vast powers of learning and inference with respect to its world-model, but has its utility function (value system) hardcoded. Since the hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever, the UFAI proceeds to tile the universe in whatever it happens to like (which are things we people don’t like), precisely because it has no motivation to “fix” its hardcoded utility function
… and this clearly says that the type of AI you have in mind is one that is not even trying to be friendly. Rather, you talk about how its
hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever
And then you add that
the UFAI proceeds to tile the universe in whatever it happens to like
… which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
There is very little distinction, from the point of view of actual behaviors, between a supposedly-Friendly-but-actually-not AI, and a regular UFAI. Well, maybe the former will wait a bit longer before its pathological behavior shows up. Maybe. I really don’t want to be the sorry bastard who tries that experiment: it would just be downright embarrassing.
But of course, the simplest way to bypass this, as I mentioned earlier in my comment and as nearly every author on the issue has noted, is precisely to specify the utility function as the outcome of an inference problem, so that additional interaction with humans causes the AI to update its utility function and become Friendlier over time.
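To make that concrete, here is a minimal toy sketch (every name and number in it is a hypothetical placeholder, not anyone's actual proposal) of what "utility function as the outcome of an inference problem" can look like: the AI keeps a posterior over candidate utility functions and updates it from human feedback by Bayes' rule, instead of treating any single candidate as hardcoded.

```python
# Toy sketch: maintain a posterior over candidate utility functions and treat
# human feedback ("approve" / "disapprove" of an outcome) as evidence about
# which candidate is right.  All names and numbers are hypothetical.

# Candidate utility functions: each maps an outcome label to a utility in [0, 1].
candidates = {
    "maximize_smiles":    {"forced_smiles": 0.9, "genuine_wellbeing": 0.6},
    "maximize_wellbeing": {"forced_smiles": 0.1, "genuine_wellbeing": 0.9},
}
posterior = {name: 0.5 for name in candidates}  # uniform prior over candidates


def approval_likelihood(utility, approved):
    """P(feedback | candidate): humans tend to approve high-utility outcomes."""
    return utility if approved else 1.0 - utility


def update(outcome, approved):
    """Bayesian update of the posterior over candidate utility functions."""
    for name, u in candidates.items():
        posterior[name] *= approval_likelihood(u[outcome], approved)
    total = sum(posterior.values())
    for name in posterior:
        posterior[name] /= total


# The human disapproves of the "forced smiles" outcome, so probability mass
# shifts toward the candidate that predicted that disapproval.
update("forced_smiles", approved=False)
print(posterior)  # -> roughly {'maximize_smiles': 0.1, 'maximize_wellbeing': 0.9}
```

The point is purely structural: as long as the utility function sits on the "beliefs to be inferred" side rather than the "hardcoded" side, ordinary interaction with humans is evidence that moves it.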
Causal inference that allows for deliberate conditioning of distributions on complex, counterfactual scenarios should actually help with this. Causal reasoning does dissolve into counterfactual reasoning, after all, so rational action on evaluative criteria can be considered a kind of push-and-pull force acting on an agent’s trajectory through the space of possible histories: undesirable counterfactuals push the agent’s actions away (i.e., push the agent to prevent them from becoming real), while desirable counterfactuals pull the agent’s actions towards themselves (i.e., the agent takes actions to achieve those events as goals) :-p.
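And here is the same idea from the action side, again only as a hypothetical sketch: the agent scores each candidate action by the counterfactual outcomes that action would bring about, so desirable counterfactuals "pull" and undesirable ones "push" in exactly the sense above.

```python
# Rough sketch of the push-and-pull picture: for each candidate action the
# agent imagines the counterfactual outcome distribution it would induce
# (a toy stand-in for a causal model), scores it with its current estimate of
# the utility function, and picks the action whose counterfactuals score best.
# All names and numbers are hypothetical placeholders.

utility = {"forced_smiles": 0.15, "genuine_wellbeing": 0.9}  # current estimate

counterfactual_outcomes = {
    # action -> distribution over the outcomes it would bring about
    "tile_universe_with_smiley_faces": {"forced_smiles": 1.0},
    "ask_humans_what_they_want":       {"genuine_wellbeing": 0.8, "forced_smiles": 0.2},
}


def expected_utility(action):
    """Average utility of the counterfactual outcomes this action would cause."""
    return sum(p * utility[outcome]
               for outcome, p in counterfactual_outcomes[action].items())


best_action = max(counterfactual_outcomes, key=expected_utility)
print(best_action)  # -> 'ask_humans_what_they_want'
```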