Question: How do we train an agent which makes lots of diamonds, without also being able to robustly grade expected-diamond-production for every plan the agent might consider?
I thought you were about to answer this question in the ensuing text, but it didn’t feel to me like you gave an answer. You described the goal (values-child), but not how the mother would produce values-child rather than evaluation-child. How do you do this?
Ah, I should have written that question differently. I meant to ask “If we cannot robustly grade expected-diamond-production for every plan the agent might consider, how might we nonetheless design a smart agent which makes lots of diamonds?”
Anyways, we might train a diamond-values agent like this.