I think my definition of μ∞ is correct. It’s designed to abstract away all the messy implementation details of the ML architecture and ML training process.
Now, you can easily amend the definition to allow an infinite context window k. In fact, if you let k>N then that's already essentially an infinite context window. But it's unclear what optimal inference is supposed to look like when k=∞: once the context window is infinite (or just larger than the corpus), the entire internet corpus consists of a single datapoint, so there's no longer a distribution of examples to infer anything from.
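To make the k>N point concrete, here's a minimal sketch. The function name `num_context_windows` is mine, and sliding-window chunking is just one assumed way of carving a corpus into training examples; the point is only that the count collapses to 1 once k reaches N:

```python
def num_context_windows(N: int, k: int) -> int:
    """Number of distinct length-k sliding windows over a corpus of N tokens.

    When k >= N the whole corpus fits inside one window, so the
    "dataset" degenerates to a single datapoint.
    """
    return max(N - k + 1, 1)

# Hypothetical corpus of 1,000 tokens, with growing context windows:
corpus_size = 1_000
for k in (8, 100, 1_000, 10_000):
    print(k, num_context_windows(corpus_size, k))
```

With k = 8 there are 993 overlapping examples; at k = 1,000 (i.e. k = N) and beyond there is exactly one, which is why "optimal inference" over the corpus distribution stops being well-defined.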