Thanks, this is a really useful analogy! One place it breaks down, at least if I’m understanding the architecture properly, is that when everyone in the line consults their own memory (i.e. the MLP), they all have identical long-term memories, since within a layer the same MLP weights are applied at every position! I don’t think that’s a big problem for the analogy, but it might be worth mentioning in the text just so people aren’t confused. Although do let me know if the confusion is actually on my end...
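To make concrete what I mean by "identical memories", here is a minimal toy sketch in plain Python. It assumes a standard position-wise MLP with a ReLU and no biases or layer norm; the names `make_mlp`, `mlp`, and `mlp_layer` are made up for illustration and are not from the post or any library.

```python
import random

def make_mlp(d_model, d_hidden, seed=0):
    # One set of weights for the whole layer (hypothetical toy initialisation).
    rng = random.Random(seed)
    w_in = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
    w_out = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    return w_in, w_out

def mlp(x, weights):
    # The "memory lookup" for a single person (token position).
    w_in, w_out = weights
    hidden = [
        max(0.0, sum(x[i] * w_in[i][j] for i in range(len(x))))  # ReLU
        for j in range(len(w_in[0]))
    ]
    return [
        sum(hidden[j] * w_out[j][k] for j in range(len(hidden)))
        for k in range(len(w_out[0]))
    ]

def mlp_layer(residual_stream, weights):
    # Every position consults the same weights, independently of the others.
    return [mlp(x, weights) for x in residual_stream]

weights = make_mlp(d_model=4, d_hidden=8)
stream = [
    [1.0, 0.5, -0.3, 2.0],   # person 0
    [0.0, 1.0, 1.0, -1.0],   # person 1
    [1.0, 0.5, -0.3, 2.0],   # person 2, same thoughts as person 0
]
out = mlp_layer(stream, weights)
print(out[0] == out[2])  # same input plus same memory gives the same recollection
```

So in this sketch, two people who walk up with the same thoughts retrieve exactly the same thing; any difference between people comes from what they are thinking when they consult the memory, not from the memory itself.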
There’s also one important element missing from the analogy. If a transformer is like a bunch of people standing in a line, what are they all waiting in line for? Is it, like, a play? Or a train? ;)