(1) I am unsure whether there exists an idealized reasoner analogous to a Carnot engine (see Realism about rationality). Even if such a reasoner exists, it seems unlikely that we will a) figure out what it is, b) understand it in sufficient depth, and c) successfully use it to understand and improve ML techniques, before we get powerful AI systems through other means. Under short timelines, this cuts particularly deeply, because a) there’s less time to do all of these things and b) it’s more likely that advanced AI is built out of “messy” deep learning systems that seem less amenable to this sort of theoretical understanding.
(2) I certainly agree that, all else equal, advanced agents should act closer to ideal agents (assuming there is such a thing as an ideal agent). I also agree that advanced AI should be less susceptible to money pumps, from which I infer that their “preferences” (i.e. the world states that they work to achieve) are transitive. I’m also on board that more advanced AI systems are more likely to be well described as maximizing the expected value of some utility function, per the VNM theorem. I don’t agree that the utility function must be simple, or that the AI must internally reason by computing the expected utility of every action and then choosing the highest one. I would be extremely surprised if we built powerful AI such that, when we say the English sentence “make paperclips”, it acts in accordance with the utility function U(universe history) = number of paperclips in the last state of the universe history. I would be very surprised if we built powerful AI by hardcoding the above utility function and then designing the AI to maximize its expected value.
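To make the money-pump point in (2) concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the items A, B, C, the flat trading fee, and the helper names `is_transitive` and `run_money_pump` are hypothetical and do not come from any library. It shows only that an agent with cyclic strict preferences can be charged repeatedly and end up holding what it started with, while an agent with transitive preferences declines the same offers. Transitivity is only one of the VNM axioms, so this says nothing about what the resulting utility function looks like or how the agent computes its choices.

```python
def is_transitive(prefers, items):
    """Check a strict preference relation for transitivity violations.

    `prefers` is a set of (x, y) pairs meaning "x is strictly preferred to y".
    """
    for a in items:
        for b in items:
            for c in items:
                if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
                    return False
    return True


def run_money_pump(prefers, start_item, offers, fee=1, start_money=10):
    """Offer items one at a time (hypothetical setup, not a standard API).

    The agent pays `fee` to swap whenever it strictly prefers the offered
    item to the one it currently holds and can afford the fee.
    """
    held, money = start_item, start_money
    for offered in offers:
        if (offered, held) in prefers and money >= fee:
            held, money = offered, money - fee
    return held, money


items = ["A", "B", "C"]
offers = ["C", "B", "A"]  # one lap around the cycle

# Cyclic preferences: A > B, B > C, C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# Transitive preferences: A > B > C.
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

cyclic_result = run_money_pump(cyclic, "A", offers)
transitive_result = run_money_pump(transitive, "A", offers)

print(is_transitive(cyclic, items), cyclic_result)
print(is_transitive(transitive, items), transitive_result)
```

In this toy setup the cyclic agent accepts all three trades, pays the fee each time, and ends up holding A again, whereas the transitive agent already holds its favorite item and refuses every offer.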
The Value Learning sequence expands on position 2, especially in Chapter 2. The conclusion is a short version of it, but still longer than the parent comment.