I think the behavior of LLMs in the long run might not be very interesting. Since the oldest tokens are continually being deleted, information is being lost and eventually it’ll get stuck in a mumble loop. And the set of mumble loops seems much smaller and less interesting than the set of answers we could get in the short run.
I think the behavior of LLMs in the long run might not be very interesting. Since the oldest tokens are continually being deleted, information is being lost and eventually it’ll get stuck in a mumble loop. And the set of mumble loops seems much smaller and less interesting than the set of answers we could get in the short run.
Physics also tends toward very uninteresting things. This is for similar reasons, right?