Eighteen months ago, my (now falsified) theory was that some of the limitations we were seeing in GPT-3 were symptoms of a general inability of LLMs to reason strategically. This would have significant implications for alignment, in particular for our estimates of when LLMs would become dangerous.
We noticed that some business processes required a big-picture, out-of-the-box kind of thinking that was kinda strategic if you squint, and we observed that GPT-3 consistently failed to perform them the way humans do. Our hope was that by implementing these processes (as well as simpler, adjacent ones) we could more precisely delineate GPT-3's strategic limitations.
So this project was something along the lines of ARC Evals?
Yes, we were excited when we learned about ARC Evals. Some kind of evaluation was one of our possible paths to impact, though real-world data is much messier than the carefully constructed evaluations I've seen ARC use. This has both advantages and disadvantages.