That our most promising approach for how to define “utility function” gives at least fairly clear conceptual guidance as to the domain
Given that the standard of being “fairly clear” is rather vague, I don’t know whether I disagree, but at the moment I don’t know of any approach to a potentially FAI-grade notion of preference that has any clarity. Utility functions seem to be the wrong direction, since they don’t work in the context of the idea of control based on resolution of logical uncertainty (structure). (UDT’s “utility function” is more of a component of the definition of something that is not a utility function.)
ADT utility value (which is a UDT-like goal definition) is somewhat formal, but it only applies to toy examples, it’s not clear what it means even in those toy examples, it doesn’t work at all when there is uncertainty or incomplete control over that value on the part of the agent, and I have no idea how to treat the physical world in its context. (It also doesn’t have a domain, which in itself seems like a desirable property for a structuralist goal definition.) This situation seems like the opposite of “clear” to me...
In my original UDT post, I suggested:
In this case, we’d need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.
Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I’ve not seen), Vaniver’s characterization of how much the domain of the utility function is underspecified (“Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?”) is just wrong.
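To make the quoted suggestion concrete, here is a minimal sketch (my own illustration, not anything from the original post) of a preference function whose domain is conjunctions of sentences of a formal theory rather than sensory inputs, mental models, or physical states; the particular sentences and the utility assignment are hypothetical placeholders.

```python
# A toy "utility function over conjunctions of well-formed sentences".
# Sentences are kept as opaque strings; a real proposal would use formulas of
# a formal set theory and would have to say where the numbers come from.
from typing import FrozenSet

Sentence = str  # stands in for a well-formed formula of the set theory

def utility(world: FrozenSet[Sentence]) -> float:
    """Assign a real number to a 'world', identified here with the set of
    conjuncts (sentences) that hold in it."""
    return sum(1.0 for s in world if s == "Exists x. Cured(x, Cancer)")

# Two candidate conjunctions of sentences:
w1 = frozenset({"Exists x. Cured(x, Cancer)", "Con(ZFC)"})
w2 = frozenset({"Con(ZFC)"})

assert utility(w1) > utility(w2)  # an ordering over conjunctions of sentences
```

The only point of the sketch is the type: since the domain is logical sentences, questions like “is it valuing sensory inputs or external reality?” are answered by construction.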
Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take the physical world to be a certain collection of mathematical structures, possibly selected heuristically, based on observations, according to whether they are controllable and morally relevant in a tractable way.
The tricky thing is that we are not choosing a structure from among some collection of structures (a preferred possible world from a collection of possible worlds). Instead, we are choosing which properties a given fixed class of structures will have, or alternatively which theories/definitions are consistent or inconsistent, which defined classes of structures exist and which don’t. Since the alternatives that are not chosen are thereby made inconsistent, it’s not clear how to understand them as meaningful possibilities; they are the mysterious logically impossible possible worlds. And there we have it, the mystery of the domain of preference.
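As a rough illustration of this picture (a sketch of my own, not a formalization taken from any ADT/UDT write-up): the agent evaluates conditionals of the form “agent() = a implies utility() = u”, and for every action it does not in fact return, the antecedent is false, so that alternative is not a consistent possible world.

```python
# Toy proof-based chooser. PROVABLE stands in for a theorem prover applied to
# the agent's own source; here provability of each conditional is stipulated.
PROVABLE = {
    ("a1", 10.0),  # "agent() = 'a1' implies utility() = 10"
    ("a2", 5.0),   # "agent() = 'a2' implies utility() = 5"
}

ACTIONS = ["a1", "a2"]

def best_provable_utility(action: str) -> float:
    """Highest u such that 'agent() = action implies utility() = u' is
    (stipulated to be) provable; -inf if nothing is provable for the action."""
    return max((u for a, u in PROVABLE if a == action), default=float("-inf"))

def agent() -> str:
    # For any action the agent does not return, "agent() = a" is false, so
    # that branch is the "logically impossible possible world" in question.
    return max(ACTIONS, key=best_provable_utility)

print(agent())  # -> a1
```

Even in this toy form the puzzle is visible: once agent() returns "a1", the world in which it returns "a2" is not merely unrealized but inconsistent, yet the procedure had to treat it as a meaningful alternative while choosing.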
Well, the publicly visible side of the work does not seem to refer to this specifically, hence it is not so specified.
With regard to your idea: if you have to give the AI that sort of preference function to motivate it in a manner general enough that it may end up hell-bent on killing everyone, then I think the ‘risk’ is very far-fetched, exactly per my suspicion that a viable super-intelligent UFAI is much too hard to be worth worrying about (while all sorts of AIs that work on well-specified mathematical problems would be much, much simpler). If it is the only way to motivate an AI that is effective over the evolution from seed to super-intelligence, then the only people working on something like UFAI are the FAI crowd. Keep in mind that if I want to cure cancer via the ‘software tools’ route and I am not signed up for cryonics, then I’ll just go for the simplest solution that works, which will be some sort of automated reasoning over formal systems (not over real-world states). Especially as the general AI would require the technologies from the former anyway.