I’m mostly just curious about how difficult it is for a transformer to learn to effectively access information from recent backprops, without using outside structures. Can it pull an essay title? General topic? And how well does this work for stochastic vs. batch processing? Thanks a lot btw.
I’m mostly just curious about how difficult it is for a transformer to learn to effectively access information from recent backprops, without using outside structures. Can it pull an essay title? General topic? And how well does this work for stochastic vs. batch processing? Thanks a lot btw.