TurnTrout comments on Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout 29 Nov 2022 16:56 UTC
LW: 3 AF: 3
Ah, I should have written that question differently. I meant to ask “If we cannot robustly grade expected-diamond-production for every plan the agent might consider, how might we nonetheless design a smart agent which makes lots of diamonds?”
How do you do this?
Anyways, we might train a diamond-values agent like this.