I discuss the possibility of it going in some other direction when I say “The two most salient options to me”. But the bit of Evan’s post that this contradicts is:
Now, if the model gets to the point where it’s actually just failing because of this, then gradient descent will probably just remove that check—but the trick is never to actually get there.