I think the IRL problem statement is doomed if it insists on being about human values, but it might be useful as an intentional stance: formulating agent-like structures in the world based on their influence on it. For this to work, it needs to expect many agents rather than a single agent, separating out their influences, and these agents could be much smaller and simpler than humans, like shards of value or natural concepts. Only once a good map of the world that takes this intentional stance is available is it time to search it for ingredients of an aligned agent.