Eighteen months ago, my (now falsified) theory was that some of the limitations we were seeing in GPT-3 were symptoms of a general inability of LLMs to reason strategically. This would have significant implications for alignment, in particular for our estimates of when LLMs would become dangerous.
We noticed that some business processes required a big-picture, out-of-the-box kind of thinking that was kinda strategic if you squint, and we observed that GPT-3 consistently failed to perform them the way humans do. Our hope was that by implementing these processes (as well as simpler, adjacent ones) we could more precisely delineate GPT-3's strategic limitations.
So this project was something along the lines of ARC Evals?
Yes, we were excited when we learned about ARC Evals. Some kind of evaluation was one of our possible paths to impact, though real-world data is much messier than the carefully constructed evaluations I've seen ARC use. This has both advantages and disadvantages.