Thanks! I knew people had essentially devised these ideas before (and if they had instantly worked we would have solved FAI already), but I think there is something to be gained via a reinterpretation of the ideas in the RRM. For example, if the human value function derives from discoverable symmetries of neural structure and external environment, then we can do the work to discover these and directly impose them in the agent architecture. And I think the statement I just made is not trivially equivalent to telling people “find human rewards and put them in the agent” (which is literally the whole FAI problem all over again). Symmetry is an empirically discoverable property and also a strong constraint for optimization purposes. Under symmetry constraints the agent still needs to learn human values, but may have an easier time of it. Anyway, clearly I’ve not done a great job communicating, and the ideas are all still at the intuition stage. Maybe in the future I’ll actually try to prove a reinforcement learning theorem using RRM philosophy.