LLMs, Batches, and Emergent Episodic Memory
Let’s say that we are training an LLM and have 3 selections of writing, A, B, and C, each of which is then broken down into 3 sequential parts. Call them A1, A2, etc.
Is there a benefit to structuring the batches like this:
Batch 1: {A1, B1, C1}
Batch 2: {A2, B2, C2}
Batch 3: {A3, B3, C3}
such that the weight updates from the most recent batches can provide information for the current prediction?
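To make the scheme concrete, here is a minimal Python sketch of the batching I have in mind. The function names and the character-level chunking are my own invention for illustration, not taken from any training framework:

```python
def chunk_document(text: str, chunk_len: int) -> list[str]:
    """Split one document into consecutive, order-preserving chunks."""
    return [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]

def make_batches(docs: list[str], chunk_len: int) -> list[list[str]]:
    """Batch t holds chunk t of every document, so consecutive batches
    walk through all documents in parallel."""
    chunked = [chunk_document(d, chunk_len) for d in docs]
    n_batches = min(len(c) for c in chunked)  # truncate to the shortest doc
    return [[c[t] for c in chunked] for t in range(n_batches)]

# Example: three documents A, B, C, each yielding three chunks.
docs = ["A1A2A3", "B1B2B3", "C1C2C3"]
for t, batch in enumerate(make_batches(docs, chunk_len=2), start=1):
    print(f"Batch {t}: {batch}")
# Batch 1: ['A1', 'B1', 'C1']
# Batch 2: ['A2', 'B2', 'C2']
# Batch 3: ['A3', 'B3', 'C3']
```

The point is just that batch t always contains part t of each document, so the gradient step taken on batch t-1 has already exposed the model to the immediately preceding part of every document in batch t.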
My understanding is that human episodic memory works something like this, but that current neural networks use learning rates too small for a single batch to leave a usable trace in the weights. Have there been experiments run on this? I feel like this is an obvious idea that has been examined exhaustively, but I just don’t know the right vocabulary to search for.