TurnTrout comments on What You See Isn’t Always What You Want

TurnTrout 15 Sep 2019 6:39 UTC
LW: 3 AF: 2

They’re definitely not sufficient, almost certainly. A full-fledged diamond maximizer would need far more machinery, if only to do the maximization and properly learn the representation.

Clarification: I meant (but inadequately expressed) “do you think any reasonable extension of these kinds of ideas could get what we want?” Obviously, it would be quite an unfair demand for rigor to ask whether we can do the thing right now.

Thanks for the great reply. I think the remaining disagreement might boil down to the expected difficulty of avoiding Goodhart here. I do agree that using representations is a way around this issue, and it isn’t the representation learning approach’s job to simultaneously deal with Goodharting.
- FactorialCode 15 Sep 2019 17:55 UTC
  LW: 1 AF: 1
  
  do you think any reasonable extension of these kinds of ideas could get what we want?
  
  Conditional on avoiding Goodhart, I think you could probably get something that looks a lot like a diamond maximizer. It might not be perfect: the situation with the “most diamond” might not be the maximum of its utility function, but I would expect that maximum to still contain a very large amount of diamond. For instance, depending on the representation and the way the programmers baked in the utility function, it might have the quirk of only recognizing something as a diamond if it’s stereotypically “diamond shaped”. This would bar it from just building pure carbon planets to achieve its goal.
  
  IMO, you’d need something else outside of the ideas presented to get a “perfect” diamond maximizer.