This is a post that gave me (an ML noob) a great deal of understanding of how language models work. For example, the discussion of the difference between “being able to do a task” and “knowing when to perform that task” is one I hadn’t conceptualized before reading this post, and it makes a large difference in how to think about the improvements from scaling. I also thought the characterization of the split between different schools of thought, and what each pays attention to, was quite illuminating.
I don’t have enough object-level engagement for my recommendation to be much independent evidence, but I will still vote this either a +4 or a +9, because I personally learned a bunch from it.