What background knowledge do you think this requires? If I know a bit about how ML and language models work in general, should I be able to reason this out from first principles (or by following a fairly obvious trail of “look up relevant terms and quickly get up to speed on the domain”)? Or does it require some amount of pre-existing ML taste?
Also, do you have a rough sense of how long it took for MATS scholars?
Great questions, thanks!
Background: You don’t need to know anything beyond “a language model is a stack of matrix multiplications and non-linearities. The input is a series of tokens (words and sub-words), which get converted to vectors by a massive lookup table called the embedding (the vectors are called token embeddings). In GPT-Neo, these vectors have really high cosine similarity with each other”.
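To make the terms above concrete, here is a rough sketch of an embedding lookup and of measuring average pairwise cosine similarity. Everything in it is an assumption made for illustration: the matrix is synthetic NumPy data rather than GPT-Neo's real embedding, the sizes are toy values, `mean_pairwise_cosine` is a hypothetical helper name, and the constant offset is there only so the second number comes out high. It is not a claim about why GPT-Neo's embeddings behave this way.

```python
import numpy as np

def mean_pairwise_cosine(emb: np.ndarray) -> float:
    """Average cosine similarity over all pairs of distinct rows."""
    # Normalise every row to unit length so dot products are cosines.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = emb.shape[0]
    # Drop the diagonal: each vector has similarity 1 with itself.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
vocab_size, d_model = 500, 64  # toy sizes, far smaller than a real model

# Stand-in embedding table: one row (vector) per token id.
embedding = rng.normal(size=(vocab_size, d_model))

# Same table with a constant added to every entry (illustrative only).
shifted = embedding + 5.0

# "Converting tokens to vectors" is just indexing rows of the table.
token_ids = np.array([3, 17, 42])
token_embeddings = shifted[token_ids]  # shape (3, 64)

print("random table: ", mean_pairwise_cosine(embedding))
print("shifted table:", mean_pairwise_cosine(shifted))
```

With a real model you would replace `embedding` with the model's own token embedding matrix and run the same measurement.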
Re how long it took for scholars, hmm, maybe an hour? Not sure, I expect it varied a ton. I gave this in their first or second week, I think.