Brendon_Wong comments on CAIS-inspired approach towards safer and more interpretable AGIs

Brendon_Wong 3 May 2023 11:45 UTC
3 points
Have you seen Seth Herd’s work and the work it references (particularly on natural language alignment)? Drexler also has a newer proposal called Open Agencies, which seems to be an updated version of his original CAIS research. It seems like Davidad is working on a complex implementation of open agencies; I will likely work on a significantly simpler one. I don’t think any of these designs explicitly propose capping LLMs, though, given that LLMs are non-agentic, transient, etc. by design and thus seem far less risky than agentic models. The proposals mostly focus on avoiding riskier models that are agentic, persistent, etc.