Prior to working more on decision theory or thin priors, it seems worth clearly fleshing out the desiderata that a naive account (say a task/act-based AI that uses CDT) fails to satisfy.
You say: “[Son of X] is very opaque. There is an extra level of indirection, where you can’t just directly reason about what agent X will do. Instead [you] have to reason about what agent X will modify into, which gives you a new agent, which you probably understand much less than you understand agent X, and then you have to reason about what that new agent will do. Second, it is unmotivated. If you had a good reason to like Son of X, you would probably not be calling it Son of X.”
But if we trust X’s decisions, then shouldn’t we trust its decision to replace itself with Son of X? And if we don’t trust X’s decisions, aren’t we in trouble anyway? Why do we need to reason about Son of X, any more than we need to reason about the other ways in which the agent will self-modify, or even other decisions the agent will make?
I agree this makes thinking about Son of X unsatisfying; we should just be thinking about X. But I don’t see why this makes building X problematic. I’m not sure if you are claiming it does, but other people seem to believe something like that, and I thought I’d respond here.
I agree that there is a certain perspective on which this situation is unsatisfying. And using a suboptimal decision theory / too thick a logical prior in the short term will certainly involve some cost (this is a special case of my ongoing debate with Wei Dai about the urgency of philosophical problems). But it seems to me that it is way less pressing than other possible problems—e.g. how to do aligned search or aligned induction in the regime with limited unlabelled data.
These other problems: (a) seem like they kill us by default in a much stronger sense, (b) seem necessary on both MIRI agendas as well as on my agenda, and I suspect are going to be necessary quite generally, (c) seem pretty approachable, and (d) don’t really seem to be made easier by progress on decision theory, logical induction, the construction of a thin prior, etc.
(Actually, I can see how a good thin prior would help with the induction problem, but that leads to quite a different perspective and a different set of desiderata.)
On the flip side, I think there is a reasonably good chance that problems like decision theory will be obsoleted by a better understanding of how to build task/act-based AI (and I feel like for the most part people haven’t convincingly engaged with those arguments).