And yes, this criticism applies extremely strongly to my own past work with attainable utility preservation and impact measures. (Unfortunately, I learned my lesson after, and not before, making certain mistakes.)
Actually, this is somewhat too uncharitable to my past self. It’s true that I did not, in 2018, grasp the two related lessons conveyed by the above comment:
1. Make sure that the formalism (CIRL, AUP) is tightly bound to the problem at hand (value alignment, “low impact”), and not just supported by “it sounds nice or has some good properties.”
2. Don’t randomly jump to highly specific ideas and questions without lots of locating evidence.
However, in World State is the Wrong Abstraction for Impact, I wrote:
I think what gets you is asking the question “what things are impactful?” instead of “why do I think things are impactful?”. Then, you substitute the easier-feeling question of “how different are these world states?”. Your fate is sealed; you’ve anchored yourself on a Wrong Question.
I had partially learned lesson #2 by 2019.