I’ll look into it, thanks! I linked a MIRI paper that attempts to learn the utility function, but I think it mostly kicks the problem down the road. Including the true environment as an argument to the utility function seems to me like the first step in the right direction.