The types of decision utility functions that we can define precisely for an AI are exactly the kind we absolutely do not want: namely, model-free reward functions. Those work for training an agent to play Atari games, where the simulated environment supplies a score function, but they simply don't scale to the real world, which comes with no convenient predefined utility function.
For AGI, we need a model-based utility function, one that maps internal world states to human-relevant utility values. Since the utility function then depends on the AGI's internal predictive world model, you would need to rigorously define the AGI's entire world model, and that looks like a hopelessly naive dead end. I'm not aware of any research progress indicating that approach is viable. Are you?
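To make the contrast concrete, here is a minimal type-level sketch; the class names and signatures are hypothetical, purely illustrative, and not any real API:

```python
# Hypothetical sketch of the distinction between a model-free reward
# and a model-based utility function. Names are illustrative only.

from typing import Any, Protocol


class ModelFreeReward(Protocol):
    """Reward defined directly on raw observations and actions,
    e.g. an Atari score delta. Easy to specify precisely, but only
    because the simulated environment hands us the scoring rule."""

    def __call__(self, observation: Any, action: Any) -> float: ...


class ModelBasedUtility(Protocol):
    """Utility defined over states of the agent's *learned* world model.
    Pinning this down formally would require pinning down the semantics
    of the world-model state itself, i.e. rigorously defining the AGI's
    entire internal ontology."""

    def __call__(self, world_model_state: Any) -> float: ...
```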
Instead, all current research trends strongly indicate that the first practical AGI designs will rely heavily on inferring human values indirectly. Proving safety for alternate designs, even if possible, has little value if those results do not apply to the designs that will actually win the race to superintelligence.
Also, there is a whole track of mathematical research in machine learning concerned with provable bounds on loss and prediction accuracy (statistical learning theory), so it is simply not true that using machine learning techniques to infer human utility functions necessitates 'heuristics' ungrounded in any formal analysis.
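As one concrete example of the kind of result that literature provides (a standard PAC-style bound, stated here only to show the formal machinery exists): for a finite hypothesis class $\mathcal{H}$ and an i.i.d. sample of size $n$, with probability at least $1-\delta$,

$$R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2n}} \quad \text{for all } h \in \mathcal{H},$$

where $R$ is the true risk and $\hat{R}$ the empirical risk. Whether bounds of this form are tight enough to matter for value learning is a separate question, but it is formal analysis, not ungrounded heuristics.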