Ah, sorry: the way I used it in the paper, it’s my own coinage, meant to evoke the traditional usage. When I say there is a large “mathematical body of work,” I mean abstract algebra for symmetry, classical machine learning for the usual meaning of “regularizer,” and, indirectly, the work on complex systems theory, attractor theory, control theory, etc. I gave the word “regularizer” my own meaning because I have a philosophical intuition that the traditional machine learning concept generalizes, perhaps to something like a “category of regularizers” (to borrow from category theory). These would be energy landscapes that force some processes to approximate others, where the landscapes can be specified by their symmetries (sort of like how periodic functions can be decomposed into sinusoids via Fourier series, or how datasets can be broken into principal components).
Epistemic status of these ideas: just the hunch of someone with a B.S. in theoretical math and a penchant for philosophy.
The hope is that someone can read the paper and be like, “Aha! This was just the reframing I needed to make progress on this reinforcement learning problem I was stuck on.” But if it’s not useful to you, it’s not useful. Either way, so I can do better in the future, I’d appreciate knowing whether any of the six alignment strategies listed at the end sounded even remotely like useful approaches to the value alignment problem.
Thanks! I knew people had essentially devised these ideas before (and if they had instantly worked, we would have solved FAI already), but I think there is something to be gained by reinterpreting them through the RRM. For example, if the human value function derives from discoverable symmetries of neural structure and the external environment, then we can do the work to discover those symmetries and impose them directly in the agent architecture. And I think the statement I just made is not trivially equivalent to telling people “find human rewards and put them in the agent” (which is literally the whole FAI problem all over again). Symmetry is an empirically discoverable property and also a strong constraint for optimization purposes. Under symmetry constraints, the agent still needs to learn human values, but it may have an easier time doing so. Anyway, clearly I haven’t done a great job communicating, and the ideas are all still at the intuition stage. Maybe in the future I’ll actually try to prove a reinforcement learning theorem using RRM philosophy.