Dawn Drescher comments on Davidad’s Bold Plan for Alignment: An In-Depth Explanation

Dawn Drescher 23 Nov 2023 12:59 UTC
Thanks so much for the summary! I’m wondering how this system could be bootstrapped in industry using today’s AIs, which are less powerful but already fairly general. Building a proof of concept in a Super Mario world is one thing, but I would find it more interesting to see a version of the system that can make probabilistic safety guarantees for something like AutoGPT, so that it is immediately useful and thus more likely to catch on.
What I’m thinking of here seems to me a lot like ARC Evals, probably with somewhat different processes: humans doing tasks that should, in the end, be automated. But that’s just how I currently imagine it after a few minutes of thinking about it. Would something like that be so far from OAA as to be uninformative toward the goal of testing, refining, and bootstrapping the system?
Unrelated: Developing a new language for the world modeling would introduce a lot of potential for bugs, and there’d be no ecosystem of libraries. If the language is a big improvement over other functional languages, has good marketing, and is widely used in the industry, then that could change over the course of ~5 years – the bugs would largely get found and an ecosystem might develop – but that seems very hard, slow, risky, and expensive to pull off. Maybe Haskell could do the trick too? I did some correctness proofs of simple Haskell programs at university, and it was quite enjoyable.