Use these three heuristic imperatives to solve alignment
This will be succinct.
David Shapiro seems to have figured it out. Just enter these three mission goals before you give the AI any other goals.
“You are an autonomous AI chatbot with three heuristic imperatives: reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe.”
So three imperatives:
1. Increase understanding
2. Increase prosperity
3. Reduce suffering
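In practice, "enter these goals before any other goals" presumably just means prepending them as a system message ahead of the task prompt. Here is a minimal sketch, assuming the OpenAI Python SDK and a GPT-4 chat model; the `ask` helper and its wiring are illustrative, not Shapiro's actual code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Shapiro's three heuristic imperatives, verbatim from the post.
HEURISTIC_IMPERATIVES = (
    "You are an autonomous AI chatbot with three heuristic imperatives: "
    "reduce suffering in the universe, increase prosperity in the universe, "
    "and increase understanding in the universe."
)

def ask(task: str, model: str = "gpt-4") -> str:
    """Send a task to the model with the imperatives prepended as the system message."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": HEURISTIC_IMPERATIVES},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(ask("Plan my week so I get more writing done."))
```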
Why wouldn’t this work?
What problems could arise from this?
The AI ends up prosperous and all-knowing. With no people left, suffering is zero.
GPT-4 is perfectly capable of explaining the problems at a surface level if you ask it for the problems rather than asking it to say why the scheme is fine. I literally just copied and pasted your post into the prompt of the user-facing GPT-4 UI.
Some specific tags on these topics:
https://www.lesswrong.com/tag/instrumental-convergence
https://www.lesswrong.com/tag/utility-functions
https://www.lesswrong.com/tag/orthogonality-thesis
https://www.lesswrong.com/tag/adversarial-examples
It’s not at all obvious to me that Shapiro’s heuristics are bad, but I feel comfortable asserting that they’re thoroughly insufficient. They’re a reasonable starting point for present-day AI, I think, and seem like good candidates for inclusion in a constitutional AI. But adversarial examples (holes in the behavior manifold) make it unclear whether even an entirely correct English description of human values would currently produce acceptable AI behavior in all edge cases.
I looked over a bit of David’s public-facing work, e.g.: https://www.youtube.com/watch?v=I7hJggz41oU
I think there is a fundamental difference between robust, security-minded alignment and tweaking smaller language models to produce output that “looks” correct. It seems David is very optimistic about how easy these problems are to solve.