Anyway, the question here isn’t whether lookahead will be perfectly accurate, but whether the post-lookahead distribution of next words will allow for improvement over the pre-lookahead distribution.
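To make that concrete, here is a minimal toy sketch (my own illustration, not anything from the post) of what "pre- vs post-lookahead distribution" could mean: the model's raw next-token distribution gets reweighted by how promising each candidate's continuation looks one step ahead. All names here (`base_dist`, `rollout_score`, `lookahead_dist`, `alpha`) are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy "pre-lookahead" distribution over three candidate next words.
vocab = ["cat", "dog", "the"]
base_logits = [1.0, 0.5, 2.0]
base_dist = softmax(base_logits)

# Hypothetical rollout scores: how good the continuation looks one step ahead
# if we commit to each candidate (higher = better continuation found).
rollout_score = {"cat": 1.5, "dog": 0.2, "the": 0.3}

# "Post-lookahead" distribution: blend the base logits with the lookahead signal.
alpha = 1.0  # weight on the lookahead signal
lookahead_logits = [bl + alpha * rollout_score[w]
                    for w, bl in zip(vocab, base_logits)]
lookahead_dist = softmax(lookahead_logits)

# If the true next word is "cat", the lookahead-adjusted distribution can have
# lower loss even though the rollout is far from perfectly accurate -- it only
# needs to shift probability mass in the right direction on average.
true_word = "cat"
idx = vocab.index(true_word)
print("pre-lookahead  loss:", -math.log(base_dist[idx]))
print("post-lookahead loss:", -math.log(lookahead_dist[idx]))
```

The point of the toy example is just that the lookahead signal does not have to be accurate, only correlated with the truth often enough that the post-lookahead distribution beats the pre-lookahead one in expectation.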
Can you say a bit more about why it's enough that look-ahead merely improves performance? SGD favors better improvements over worse ones, and it feels like I could name many programs that would be improvements but won't be found by SGD. Maybe you would say there don't seem to be other improvements that are this good and this easy for SGD to find?
From a safety standpoint, hoping and praying that SGD won’t stumble across lookahead doesn’t seem very robust, if lookahead represents a way to improve performance. I imagine that whether SGD stumbles across lookahead will end up depending on complicated details of the loss surface that’s being traversed.
I agree, and thanks for the reply. I also agree that even a small chance of catastrophe means the plan isn't robust. I asked because I still care about the probability of things going badly, even if I think that probability is worryingly high. But I see now (thanks to you!) that in this case our prior that SGD will find look-ahead is relatively high, and that thinking about it further won't change that belief much, since the outcome depends on complicated details of the loss surface we can't easily know.