I’m a big fan of this, conceptually (will read the paper tomorrow morning). Attainable utility preservation is secretly trying to preserve human power. As a nitpick, though, they should probably approximate “average goal achievement ability” instead of empowerment (for formal reasons outlined in Appendix A of Optimal Farsighted Agents Tend to Seek Power).
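To gesture at the formal difference: empowerment is (roughly) the channel capacity between the agent’s actions and its future states, whereas the quantity I have in mind is the agent’s optimal value averaged over a distribution of goals. Here’s a minimal sketch of the latter on a toy tabular MDP; the uniform reward sampling, the discount rate, and the value-iteration budget are all my assumptions for illustration, not the paper’s exact setup:

```python
import numpy as np

def power(transition, n_goals=1000, gamma=0.99, iters=200, seed=0):
    """Estimate 'average goal achievement ability' per state:
    the optimal value function, averaged over randomly drawn
    reward functions ("goals").

    transition: (S, A, S) array of next-state probabilities.
    """
    rng = np.random.default_rng(seed)
    n_states = transition.shape[0]
    totals = np.zeros(n_states)
    for _ in range(n_goals):
        reward = rng.uniform(size=n_states)        # one sampled goal
        v = np.zeros(n_states)
        for _ in range(iters):                     # value iteration for this goal
            q = transition @ (reward + gamma * v)  # (S, A) action values
            v = q.max(axis=1)
        totals += v
    return totals / n_goals                        # Monte Carlo average of optimal values
```

Nothing deep is going on here; it’s just Monte Carlo averaging of optimal values, which (as I understand it) is the shape of the quantity the appendix’s results are about.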
As I’ve written previously, if we could build competitive agents which reliably increased human control-over-the-future, I think that would be pretty damn good. Don’t worry about CEV for now—let’s just get into a stable future.
But getting accurate models of humans seems difficult, and human power is best measured with respect to the policies our cognitive algorithms can actually discover (I recently gave a curated talk on this—transcript coming soon). Assuming human optimality could create weird incentives, but maybe the paper has something to say about that.
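To illustrate what I mean, here’s a variant of the sketch above where, rather than assuming the human plays optimally for each goal, we only take the best value attainable within a fixed, finite set of policies standing in for whatever our cognitive algorithms can actually discover. The deterministic-policy encoding and the policy set itself are hypothetical, purely for illustration:

```python
import numpy as np

def bounded_power(transition, policies, n_goals=1000, gamma=0.99, seed=0):
    """Like the `power` sketch above, but the max is taken over a
    fixed set of policies (stand-ins for policies a bounded agent
    can actually discover) instead of over all policies.

    transition: (S, A, S) array of next-state probabilities.
    policies:   list of (S,) integer arrays mapping state -> action.
    """
    rng = np.random.default_rng(seed)
    n_states = transition.shape[0]
    eye = np.eye(n_states)
    totals = np.zeros(n_states)
    for _ in range(n_goals):
        reward = rng.uniform(size=n_states)              # one sampled goal
        best = np.full(n_states, -np.inf)
        for pi in policies:
            p_pi = transition[np.arange(n_states), pi]   # (S, S) dynamics under pi
            # Exact policy evaluation: solve v = P_pi (r + gamma v).
            v = np.linalg.solve(eye - gamma * p_pi, p_pi @ reward)
            best = np.maximum(best, v)                   # best discoverable policy
        totals += best
    return totals / n_goals
```

The gap between this quantity and the optimal-policy version is one way to make “power relative to bounded cognition” concrete.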
All in all, I don’t feel optimistic about AvE-like approaches actually scaling to the superhuman regime if they need to explicitly pick out a human from the environment.