I love being accused of being GPT-x on Discord by people who don’t understand scaling laws and think I own a planet of A100s
There are some hard and mean limits to explainability, and there's a real risk that a person who correctly sees how to align AGI, or who correctly perceives that an AGI design is catastrophically unsafe, will not be able to explain it. It may take superintelligence to cogently expose stupid designs that will kill us all. What are we going to do if there's this kind of coordination failure?
If there is no solution to the alignment problem within reach of human-level intelligence, then the AGI can’t foom into an ASI without risking value drift…
A human augmented by strong narrow AIs could in theory detect deception by an AGI. Stronger interpretability tools…
What we want is a controlled intelligence explosion, where an increase in the AGI’s strength leads to an increase in our ability to align it, alignment as an iterative problem…
A kind of intelligence arms race; perhaps humans can find a way to compete indefinitely?