Alternatively, construct a distribution over actions such that each action has measure according to some function of, e.g., its attainable utility impact penalty (normalized appropriately, of course). This seems like a potential way to get a mild optimizer which is explicitly low-impact and doesn't require complicated models of humans.
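A minimal sketch of one way to read this, assuming a finite action set and treating `utility` and `aup_penalty` as given callables (both hypothetical names, not anything from an existing library). The Boltzmann form is just one candidate for "some function of the penalty"; the point is that we sample from the normalized distribution rather than take an argmax:

```python
import numpy as np

def mild_policy(actions, utility, aup_penalty, beta=1.0, lam=1.0, rng=None):
    """Sample an action from a distribution whose measure shrinks with the
    action's impact penalty, instead of maximizing a penalized objective.

    `utility(a)` and `aup_penalty(a)` are assumed callables returning the
    estimated utility and attainable-utility penalty of action `a`.
    """
    if rng is None:
        rng = np.random.default_rng()
    # One candidate weighting: Boltzmann over the penalized utility.
    # Lower beta -> milder (closer to uniform over actions).
    scores = np.array([utility(a) - lam * aup_penalty(a) for a in actions])
    scores -= scores.max()            # shift for numerical stability
    probs = np.exp(beta * scores)
    probs /= probs.sum()              # "normalized appropriately"
    return actions[rng.choice(len(actions), p=probs)]
```

Note the contrast with an argmax policy: at low `beta` this never concentrates all its measure on the single highest-scoring action, which is one operationalization of "mild".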
What advantages do you think this has compared to vanilla RL on U + AUP_Penalty?
It's also mild on the inside of the algorithm, not just in its effects on the world, which could avert problems with inner optimizers. Beyond that, I haven't thought enough about the agent's behavior; I might reply with another comment.