Because it correlates with intelligence and seems indicative of deeper trends in animal neurology. Probably not a signpost that carries over to arbitrary robots, though.
If cryonics is not performed extremely quickly, ischemic clotting can seriously inhibit cortical circulation, preventing good perfusion with cryoprotectants, and causing partial information-theoretic death. Being cryopreserved within a matter of minutes is probably necessary, barring a way to quickly improve circulation.
I’d like to apologize in advance for marginally lowering the quality of LW discourse. Here we go:
get the tip wet before you stick it in, and don’t worry about position.
That’s what she said.
EDIT: Yeah, that’s fair. Again, sorry. Setup was too perfect.
Your idea of provincialism is provincial. The idea of shipping tinned apes around the solar system is the true failure of vision here, never mind the bag check procedures.
Not quite. It actually replaces it with the problem of maximizing people’s expected reported life satisfaction. If you wanted to try heroin, this system would be able to look ahead, see that that choice would probably reduce your long-term life satisfaction drastically (by more than the annoyance at the intervention), and choose to intervene and stop you.
I’m not convinced ‘what’s best for people’ with no asterisk is a coherent problem description in the first place.
By bounded, I simply meant that all reported utilities are normalized to a universal range before being summed. Put another way, every person has a finite, equal fraction of the machine’s utility to distribute among possible future universes. This is entirely to avoid utility monsters. It’s basically a vote, and they can split it up however they like.
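A minimal sketch of the ‘equal vote’ idea, assuming each person’s model reports a nonnegative score for every candidate future and the scores are simply normalized so each person contributes exactly one unit of total weight (all names here are illustrative, not a real design):

```python
def normalize_vote(raw_scores):
    """Turn one person's raw scores over candidate futures into a vote that
    sums to 1, so every person has the same total influence (illustrative)."""
    total = sum(raw_scores.values())
    if total == 0:
        # An indifferent person spreads their vote evenly.
        return {future: 1.0 / len(raw_scores) for future in raw_scores}
    return {future: score / total for future, score in raw_scores.items()}

def machine_utility(all_raw_scores):
    """Sum everyone's normalized vote for each candidate future."""
    totals = {}
    for raw in all_raw_scores.values():
        for future, vote in normalize_vote(raw).items():
            totals[future] = totals.get(future, 0.0) + vote
    return totals  # the machine prefers the future with the highest total
```

No single person can contribute more than 1.0 to any future, which is the whole point of the bound.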
Also, the reflexive consistency criterion should probably be applied even to people who don’t exist yet. We don’t want plans that rely on creating new people and then turning them into happy monsters, even if that doesn’t impact the utility of people who already exist. So, basically, modify the criterion to say that in order for a model to report positive utility, all past versions of that model (to some grain) must agree that it is a valid continuation of them.
I’ll need to think harder about how to actually implement the approval judgements. It really depends on how detailed the models we’re working with are (i.e. whether they’re capable of realizing that they are a model). I’ll give it more thought and get back to you.
I can think of an infinite utility scenario. Say the AI figures out a way to run arbitrarily powerful computations in constant time. Say its utility function is over the survival and happiness of humans. Say it runs an infinite loop (in constant time) consisting of a formal system containing implementations of human minds, which it can prove will have some minimum happiness, forever. Thus, it can make predictions about its utility a thousand years from now just as accurately as ones about a billion years from now, or n years from now for any finite n. Summing the future utility of the choice to turn on the computer, from zero to infinity, gives an infinite result. Contrived, I know, but the point stands.
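To spell out the divergence (a minimal sketch, assuming the guaranteed per-year happiness floor above is some constant $u_{\min} > 0$ and there is no time discounting):

$$\sum_{t=0}^{\infty} u_{\min} = \lim_{T \to \infty} (T+1)\,u_{\min} = \infty$$

Any nonzero, undiscounted floor is enough to make the total infinite.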
If we can extract utility in a purer fashion, I think we should. At the bare minimum, it would be much more run-time efficient. That said, trying to do so opens up a whole can of worms of really hard problems. This proposal, provided you’re careful about how you set it up, pretty much dodges all of that, as far as I can tell. Which means we could implement it faster, should that be necessary. I mean, yes, AGI is still a very hard problem, but I think this reduces the F part of FAI to a manageable level, even given the impoverished understanding we have right now. And, assuming a properly modular code base, it would not be too difficult to swap out ‘get utility by asking questions’ for ‘get utility by analyzing the model directly.’ Actually, the thing might even do that itself, since it might better maximize its utility function.
Reflexively Consistent Bounded Utility Maximizer?
Hrm. Doesn’t exactly roll off the tongue, does it? Let’s just call it a Reflexive Utility Maximizer (RUM), and call it a day. People have raised a few troubling points that I’d like to think more about before anyone takes anything too seriously, though. There may be a better way to do this, although I think something like this could be workable as a fallback plan.
Note the reflexive consistency criterion. That’d only happen if everyone predictably looked at the happy monster and said ‘yep, that’s me, that agent speaks for me.’
Like I said, that part is tricky to formalize. But, ultimately, it’s an individual choice on the part of the model (and, indirectly, the agent being modeled). I can’t formalize what counts as a valid continuation today, let alone in all future societies. So, leave it up to the agents in question.
As for the racism thing: yeah, so? You would rather we encode our own morality into our machine, so that it will ignore aspects of people’s personality we don’t like? I suppose you could insist that the models behave as though they had access to the entire factual database of the AI (so, at least, they couldn’t be racist simply out of factual inaccuracy), but that might be tricky to implement.
I’m not sure why my above post is being downvoted. Anyways, on to your point.
We don’t know the mechanisms being used to model human beings. They are not necessarily transparently reducible—or, if they are, the AI may not reduce them into the same components that an introspective human does. Neural networks, for instance, are very powerful at matching the outputs of various systems, but if the programmer is asked to explain why the network produced a particular behavior, it is usually impossible to give a satisfactory explanation. Simply because our AI knows that your model will say ‘I don’t want to be wireheaded’ does not mean that it understands all your reasoning on the subject. Defining utility over the states of arbitrary models is a very hard problem—simply putting a question to the model is easy.
I think you’ve pretty much got it. Basically, instead of trying to figure out a universal morality across humans, you just say ‘okay, fine, people are black boxes whose behavior you can predict, let’s build a system to deal with that black box.’
However, instead of trying to get T to be immune to wireheading, I suggested that we require reflexive consistency—i.e. the model-as-it-is-now should be given a veto vote over predicted future states of itself. So, if the AI is planning to turn you into a barely-sapient happy monster, your model should be able to look at that future and say ‘no, that’s not me, I don’t want to become that, that agent doesn’t speak for me,’ replacing the value of T with zero utility.
EDIT: There’s almost certainly a better way to do it than naively asking the question, but that will suffice for this discussion.
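A minimal sketch of how that veto could work when scoring one person’s contribution (`endorses_continuation` is a hypothetical method standing in for however the approval question actually gets put to the model):

```python
def contributed_utility(current_model, predicted_future_self, reported_satisfaction):
    """Reflexive consistency: the present model holds a veto over the
    predicted future version of itself. `endorses_continuation` is a
    hypothetical stand-in for asking the model the approval question."""
    if not current_model.endorses_continuation(predicted_future_self):
        return 0.0  # veto: that future agent doesn't speak for this person
    return reported_satisfaction  # otherwise, count the future self's report
```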
Pretty sure. Not completely, but it does seem pretty fundamental. You cannot hard-code the operation of the universe into an AI, which means it has to be able to look at symbols going into the universe and symbols coming out, and say ‘okay, what sort of underlying system would produce this behavior?’ You can apply the same sort of thing to humans. If it can’t model humans effectively, we can probably kill it.
I don’t see why average utility would be bounded.
Because this strikes me as a nightmare scenario. Besides, we’re relying on the models to self-report total happiness. Leaving it on an unbounded scale creates incentives for abuse.
Asking people how much utility they have won’t give you a utility function because, for one thing, humans don’t have preferences that are consistent with a utility function.
The question would be more like ‘assuming you understand standard deviation units, how satisfied with your life are you right now, in standard deviation units, relative to the average?’ Happy, satisfied people give the machine more utility.
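As a toy version of that (the clamping range is my own illustrative choice, just to keep any one report from dominating the sum):

```python
# Hypothetical self-reported life satisfaction, in standard deviation units.
reported_scores = [0.5, -1.2, 2.0, 4.7]   # 4.7 gets clamped below

def bounded_contribution(z, clip=3.0):
    """Map a reported z-score onto a bounded [0, 1] share of utility."""
    z = max(-clip, min(clip, z))       # clamp extreme (or dishonest) reports
    return (z + clip) / (2 * clip)     # rescale to the [0, 1] range

total_utility = sum(bounded_contribution(z) for z in reported_scores)
```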
Utilities are determined up to an additive constant and a positive multiplicative constant, so there is no canonical way of comparing utilities between people, so there is no canonical way of averaging utilities.
Okay, but that doesn’t mean you can’t build a machine that maximizes the number of happy people, under these conditions. Calling it utility is just shorthand.
I need to go to class right now, but I’ll get into population changes when I get home this evening.
Presumably, the reflexive consistency criterion would be something along the lines of ‘hey, model, here’s this other model—does he seem like a valid continuation of you?’ No value judgments involved.
EDIT:
Okay, here’s how you handle agents being created or destroyed in your predicted future. For agents that die, you feed that fact back into the original state of the model, and allow it to determine utility for that state. So, if you want to commit suicide, that’s fine—dying becomes positive utility for the machine.
Creating people is a little more problematic. If new people’s utility is naively added, that’s bad, because the fastest way to maximize the utility function then becomes killing the whole human race and building resource-cheap, barely-sapient happy monsters that report maximum happiness all the time. So you need to add a necessary-but-not-sufficient condition: any action taken has to maximize both the utility of all foreseeable minds AND the utility of all minds currently alive. That means happy monsters are no good (insofar as they eat resources that we’ll eventually need), and it means Dr. Evil won’t be allowed to make billions of clones of himself and take over the world. This should also eliminate repugnant conclusion scenarios. One way to read this condition as a filter over candidate plans is sketched below.
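A rough sketch of that filter (`score` and `foreseeable_minds` are hypothetical placeholders for however the AI queries its models; this is not a full decision procedure):

```python
def acceptable_plans(plans, currently_alive, score):
    """Keep only plans that simultaneously do best for the people alive now
    and best for everyone who would foreseeably exist under the plan.
    `score` and `plan.foreseeable_minds()` are illustrative stand-ins."""
    def current_score(plan):
        return sum(score(plan, person) for person in currently_alive)

    def foreseeable_score(plan):
        return sum(score(plan, person) for person in plan.foreseeable_minds())

    best_current = max(current_score(p) for p in plans)
    best_foreseeable = max(foreseeable_score(p) for p in plans)
    # Necessary but not sufficient: a plan has to top both rankings to survive.
    return [p for p in plans
            if current_score(p) == best_current
            and foreseeable_score(p) == best_foreseeable]
```

A plan that looks great for the set of foreseeable minds (say, by manufacturing happy monsters) can still fail the test for the people currently alive, and vice versa.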
To measure the utility (or at least approximate it), you could just ask the models.
I mean, in this case you’re limited by self-deception, but it ought to be a reasonable approximation. I may not know what my personal utility function is, but I do know roughly how satisfied I am with my life right now.
Brief Question about FAI approaches
“The Bayesian Conspiracy updates the truth to better fit its priors.”
The outside view is what the Bayesian Conspiracy feels like from the inside.
Sure, there’s some ambiguity there, but over adequately large sample sizes, trends become evident. Peer-reviewed research is usually pretty good at correcting for the confounds that people reading about it think up in the first fifteen minutes.