Thanks for the post. I think it’d be helpful if you could link to references for some of your claims, such as:
For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.