In my original UDT post, I suggested: “In this case, we’d need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.”
Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I’ve not seen), Vaniver’s characterization of the domain of the utility function as underspecified (“Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?”) is just wrong.
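A minimal sketch of what “a utility function over conjunctions of well-formed sentences” could look like, purely for concreteness; the sentence strings, the example propositions, and the utility values below are all invented for illustration and are not anyone’s actual proposal:

```python
# Toy sketch (not from the original post): a utility function whose domain is
# conjunctions of sentences in a formal language, rather than sensory inputs
# or world-states. The sentence encoding and the utility assignments here are
# made up purely for illustration.
from typing import FrozenSet

Sentence = str  # stand-in for a well-formed sentence of a formal set theory

def utility(conjunction: FrozenSet[Sentence]) -> float:
    """Assign a (made-up) utility to the state of affairs described by
    the conjunction of the given sentences."""
    score = 0.0
    if "CancerCuredInWorldW" in conjunction:
        score += 1.0
    if "HumansExtinctInWorldW" in conjunction:
        score -= 100.0
    return score

# The agent would then prefer acts whose logical consequences include
# higher-utility conjunctions:
print(utility(frozenset({"CancerCuredInWorldW"})))                           # 1.0
print(utility(frozenset({"CancerCuredInWorldW", "HumansExtinctInWorldW"})))  # -99.0
```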
Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take the physical world to be a certain collection of mathematical structures, possibly selected heuristically based on observations, according to whether they are controllable and morally relevant in a tractable way.
The tricky thing is that we are not choosing a structure from some collection of structures (a preferred possible world from a collection of possible worlds); instead we are choosing which properties a given fixed class of structures will have, or alternatively which theories/definitions are consistent or inconsistent, which defined classes of structures exist vs. don’t exist. Since the alternatives that are not chosen are thereby made inconsistent, it’s not clear how to understand them as meaningful possibilities; they are the mysterious logically impossible possible worlds. And there we have it: the mystery of the domain of preference.
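One toy way to see this asymmetry (my own illustrative sketch, not anything proposed in the thread): fix the agent as a concrete program; then for each candidate output k, the sentence “agent() = k” is either entailed by or inconsistent with that fixed definition, and the alternatives the agent deliberates over are exactly the candidates that turn out to be inconsistent:

```python
# Toy sketch of "choosing which sentences are consistent": the agent is a fixed
# program, and for each candidate output k the claim "agent() == k" is either
# entailed by or inconsistent with that fixed definition. The candidate outputs
# and their utilities are invented for illustration.
def agent() -> int:
    # The agent ranks candidate outputs by a (made-up) utility and returns the best.
    utilities = {1: 0.3, 2: 0.9, 3: 0.1}
    return max(utilities, key=utilities.get)

for k in (1, 2, 3):
    status = ("consistent (the actual output)" if agent() == k
              else "inconsistent with the agent's definition")
    print(f"'agent() == {k}' is {status}")
```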
Well, the publicly visible side of the work does not seem to refer to this specifically; hence it is not so specified.
With regard to your idea: if you have to construct that sort of preference function to motivate the AI in a manner general enough that it might end up hell-bent on killing everyone, then I think the ‘risk’ is very far-fetched, exactly per my suspicion that a viable super-intelligent UFAI is much too hard to be worth worrying about (while all sorts of AIs that work on well-specified mathematical problems would be much, much simpler). If that is the only way to motivate an AI effectively over its evolution from seed to super-intelligence, then the only people working on something like UFAI are the FAI crowd. Keep in mind that if I want to cure cancer via the ‘software tools’ route and I am not signed up for cryonics, I’ll just go for the simplest solution that works, which will be some sort of automated reasoning over formal systems (not over real-world states), especially since a general AI would require the technologies from that route anyway.