The reason an unlosing agent might be interesting is that it need not have its values specified as a collection of explicit utility functions. It could instead start with some differently specified value system that converges to explicit utility functions as it receives more morally relevant data. An unlosing procedure would then keep it unexploitable during that convergence.
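To make the unlosing property concrete, here is a minimal toy sketch (not from the original post; the class `UnlosingAgent`, its `offer_swap` method, and the swap-with-implied-fee setup are all illustrative assumptions). The agent starts with no utility function at all. It locks in pairwise preferences only when a decision forces a comparison, and it refuses any swap that would close a preference cycle, since a cycle is exactly the structure a money pump needs to drain fees from the agent:

```python
class UnlosingAgent:
    """Toy agent with no fixed utility function (hypothetical sketch).

    Preferences start undetermined and get locked in only when a
    decision forces a comparison.  Lock-ins are chosen so the
    preference graph stays acyclic, which blocks money pumps: a pump
    needs a cycle A > B > C > A to extract a fee on every loop.
    """

    def __init__(self, holding):
        self.holding = holding
        self.prefers = set()  # locked-in pairs (x, y): x preferred to y

    def _reachable(self, start, goal):
        # Is `goal` reachable from `start` through locked-in preferences?
        frontier, seen = [start], {start}
        while frontier:
            node = frontier.pop()
            if node == goal:
                return True
            for a, b in self.prefers:
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
        return False

    def offer_swap(self, new_item):
        """Offered: swap the current holding for `new_item` (small fee implied).

        Accepting locks in `new_item > holding`; the agent accepts only
        if that keeps the preference graph acyclic.
        """
        if self._reachable(self.holding, new_item):
            # The current holding is already (transitively) preferred to
            # new_item, so accepting would close a cycle.  Refuse.
            return False
        self.prefers.add((new_item, self.holding))
        self.holding = new_item
        return True


agent = UnlosingAgent("A")
print(agent.offer_swap("B"))  # True: locks in B > A
print(agent.offer_swap("C"))  # True: locks in C > B
print(agent.offer_swap("A"))  # False: A > C would close the cycle A > C > B > A
```

Note how the sketch also illustrates the lock-in worry raised below: each accepted swap permanently constrains the agent's future preferences, even though the early choices were made with little moral data.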
In practice, I think requiring a value-loading agent to be fully unlosing might be too strong a requirement, as it could lock in some early decisions. A "mainly unlosing" agent (say, an imperfect value-loading agent with some unlosing characteristics) strikes me as more interesting and potentially safer.