Even in the simple case no. 1, I don’t yet see why Evan is wrong.
It’s true that deterministically failing will create a sort of wall in the landscape that the ball will bounce off of and then roll right back into, as you said. However, couldn’t it also roll in other directions, such as parallel to the wall? Instead of getting stuck bouncing into the wall forever, the ball would bounce against the wall while also rolling in some other direction along it. (Maybe the analogy to balls and walls is leading me astray here?)
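To make the ball-and-wall picture concrete, here is a minimal toy sketch of it. Everything in it is an assumption made for illustration and comes from neither post: the two-parameter loss, the soft quadratic penalty standing in for the "wall" at `x > 0`, the gentle slope along the wall in `y`, and the names `grad` and `descend`. It only shows what plain gradient descent does on this particular surface, not what happens in a real model.

```python
def grad(x, y, k=50.0):
    """Gradient of the hypothetical toy loss L(x, y) = -x - 0.1*y + k * max(x, 0)**2."""
    gx = -1.0 + 2.0 * k * max(x, 0.0)  # steady pull toward +x; the wall pushes back once x > 0
    gy = -0.1                          # gentle slope parallel to the wall
    return gx, gy


def descend(x=-1.0, y=0.0, lr=0.01, steps=1000):
    """Plain gradient descent on the toy loss, starting away from the wall."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


x_final, y_final = descend()
print(x_final, y_final)
```

In this toy, `x` ends up pinned just inside the wall (near `1 / (2 * k)`) while `y` keeps drifting, which is the "rolling along the wall" behavior described above. Whether a real loss landscape has such a slope along the wall is exactly the open question.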
I discuss the possibility of it going in some other direction when I say “The two most salient options to me”. But the bit of Evan’s post that this contradicts is:
Now, if the model gets to the point where it’s actually just failing because of this, then gradient descent will probably just remove that check—but the trick is never to actually get there.