• Isn’t it still doing an argmax over plans and T, making the internal optimization pressure very non-mild? If we have some notion of embedded agency, one would imagine that doing the argmax would be penalized, but it’s not clear what kind of control the agent has over its search process in this case.
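For concreteness, the shape of objective I have in mind is roughly the following (an AUP-style sketch with auxiliary Q-functions $Q_{R_i}$ and a no-op baseline $\varnothing$; the exact formulation in the post may differ):

$$\text{plan}^{*} \in \arg\max_{\text{plan},\,T} \sum_{t=1}^{T} \Big[ R(s_t, a_t) \;-\; \lambda \sum_{i} \big|\, Q_{R_i}(s_t, a_t) - Q_{R_i}(s_t, \varnothing) \,\big| \Big]$$

Only the objective is penalized here; the outer search over plans and T is still a hard argmax, which is what worries me.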
But a value-neutral impact measure is almost impossible, because the world has too many degrees of freedom.
Can you explain why you think something like AUP requires value-laden inputs?
I like this line of thought overall.
• How would we safely set lambda?
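To make the question concrete, here is a toy sketch (my own illustrative construction with placeholder names like `NOOP` and `aux_qs`, not the post's code) of how the chosen plan depends on lambda: as lambda grows, the penalty dominates and the agent collapses toward doing nothing, so picking lambda is exactly a tradeoff between usefulness and conservatism.

```python
# Toy sketch (illustrative only): sweep lambda and watch the selected plan
# shrink toward inaction as the penalty term dominates the task reward.

NOOP = "noop"  # assumed stand-in for the do-nothing baseline action

def score(plan, task_reward, aux_qs, lam):
    """Task reward of a plan minus lambda times the attainable-utility shift."""
    reward = sum(task_reward(a) for a in plan)
    penalty = sum(abs(q(a) - q(NOOP)) for a in plan for q in aux_qs)
    return reward - lam * penalty

def best_plan(plans, task_reward, aux_qs, lam):
    # The outer argmax over candidate plans; only the objective, not the
    # search process itself, is penalized.
    return max(plans, key=lambda p: score(p, task_reward, aux_qs, lam))

# Sweep lambda: small values pick the long ambitious plan, large values pick
# the empty (do-nothing) plan.
plans = [["a", "a", "a"], ["a"], []]
task_reward = lambda a: 1.0
aux_qs = [lambda a: 0.0 if a == NOOP else 2.0]
for lam in (0.0, 0.4, 0.6, 1.0):
    print(lam, best_plan(plans, task_reward, aux_qs, lam))
```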