I think this is something I and many others at EleutherAI would be very interested in working on, since it seems like an area where we'd have a uniquely strong comparative advantage.

One very relevant piece of infrastructure we've built is our evaluation framework, which we use for all of our evaluations since it makes it easy to run your task against GPT-2/3/Neo/NeoX/J, etc. We also have a number of other useful LM-related resources, such as intermediate checkpoints for GPT-J-6B, which we plan to use in our interpretability work. I've also thought about building infrastructure to make it easier to coordinate the construction of handmade benchmarks; this is currently on the back burner, but if it would be helpful to anyone I'd definitely get it going again.

If anyone reading this is interested in collaborating, please DM me or drop by the #prosaic-alignment channel in the EleutherAI Discord.