Is empowerment a good way to quantify alignment? By empowerment I mean the channel capacity between an agent's actions and its future states, i.e. the maximum mutual information between the two over the agent's choice of action distribution. I’m not sure how to get from A to B, but presumably one can measure the conditional empowerment of some set of agents along some trajectories: the extra control over their own future states that the empowered agents gain by virtue of their interaction with the empowering agent. Perhaps the CATE (Conditional Average Treatment Effect) of specific interventions would be more bite-sized than trying to measure the whole enchilada!
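To make the measurement idea a bit more concrete, here is a rough sketch in Python with NumPy. It assumes a tiny discrete world where the transition table P(s' | a) is known, estimates one-step empowerment as channel capacity with the Blahut-Arimoto iteration, and compares the same agent with and without a helper. The function names (`empowerment_bits`, `empowerment_gain_bits`) and both transition tables are invented for illustration; this is not an established alignment metric.

```python
import numpy as np


def empowerment_bits(p_next_given_action, iters=500, tol=1e-12):
    """Estimate max over p(a) of I(A; S') in bits via Blahut-Arimoto.

    p_next_given_action[a, s] = P(S' = s | A = a); each row sums to 1.
    """
    P = np.asarray(p_next_given_action, dtype=float)
    p_a = np.full(P.shape[0], 1.0 / P.shape[0])  # start from uniform actions

    def kl_rows(p_a):
        # KL(P[a] || q) per action in nats, using the convention 0 * log 0 = 0.
        q = p_a @ P  # marginal distribution over next states
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        return (P * log_ratio).sum(axis=1)

    for _ in range(iters):
        new_p_a = p_a * np.exp(kl_rows(p_a))  # reweight informative actions
        new_p_a /= new_p_a.sum()
        converged = np.abs(new_p_a - p_a).max() < tol
        p_a = new_p_a
        if converged:
            break

    # Mutual information under the final action distribution, nats -> bits.
    return float(p_a @ kl_rows(p_a) / np.log(2))


def empowerment_gain_bits(p_alone, p_with_helper):
    """Hypothetical 'conditional empowerment' contrast for one context."""
    return empowerment_bits(p_with_helper) - empowerment_bits(p_alone)


# Assumed toy scenario: two actions, two next states.
# Alone, the agent's action has no influence on where it ends up.
P_ALONE = np.array([[0.5, 0.5],
                    [0.5, 0.5]])
# With the helper present, each action succeeds 90% of the time.
P_WITH_HELPER = np.array([[0.9, 0.1],
                          [0.1, 0.9]])

if __name__ == "__main__":
    print("alone:      ", empowerment_bits(P_ALONE), "bits")
    print("with helper:", empowerment_bits(P_WITH_HELPER), "bits")
    print("gain:       ", empowerment_gain_bits(P_ALONE, P_WITH_HELPER), "bits")
```

The gain here is only a contrast for one hand-written context. Reading it as a CATE would additionally require averaging over many contexts, with the helper's intervention assigned in a way that supports causal identification.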