“keyboard and monitor I’m using right now, a stack of books, a tupperware, waterbottle, flip-flops, carpet, desk and chair, refrigerator, sink, etc. Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse”
But, I think the negative impacts that these goods have on you are (mostly) realized on longer timescales—say, years to decades. If you’re using a chair that is bad for your posture, the impacts of this are usually seen years down the line when your back starts aching. Or if you keep microwaving tupperware, you may end up with some pretty nasty medical problems, but again, decades down the line.
An action having a long horizon before it can be verified as good or bad for you is exactly what makes delegating it to smarter-than-you systems dangerous. My intuition is that there are lots of tasks that could significantly accelerate alignment research that don’t have this property, for example codebase writing (unit tests can provide quick feedback) and proof verification. In fact, I can’t think of many research tasks in technical fields that have month/year/decade horizons until they can be verified, though maybe I’ve just not given it enough thought.
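To make the "quick feedback" point concrete, here is a minimal sketch of the kind of check being described: if an AI-written function is wrong, a test catches it in milliseconds rather than years. The function and test names here are my own illustration, not anything from the discussion.

```python
def interval_overlap(a_start: float, a_end: float,
                     b_start: float, b_end: float) -> float:
    """Length of the overlap between two closed intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def run_checks() -> str:
    # Verification here is fast and unambiguous: each assertion either
    # passes or fails immediately, with no years-long feedback delay.
    assert interval_overlap(0, 5, 3, 8) == 2    # partial overlap
    assert interval_overlap(0, 1, 2, 3) == 0    # disjoint intervals
    assert interval_overlap(2, 4, 0, 10) == 2   # one interval inside the other
    return "all checks passed"
```

The point is not this particular function but the shape of the feedback loop: a delegated coding task whose correctness criteria can be stated up front is verifiable on a timescale of seconds.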
Many research tasks have very long delays until they can be verified. The history of technology is littered with apparently good ideas that turned out to be losers after huge development efforts were poured into them. Supersonic transport, zeppelins, silicon-on-sapphire integrated circuits, pigeon-guided bombs, object-oriented operating systems, hydrogenated vegetable oil, oxidative decoupling for weight loss…
Finding out that these were bad required making them, releasing them to the market, and watching unrecognized problems torpedo them. Sometimes it took decades.
But if the core difficulty in solving alignment is developing some difficult mathematical formalism and figuring out the relevant proofs, then I think we won’t suffer from the problems with the technologies above. For instance, I would feel comfortable delegating to and overseeing a team of AIs tasked with solving the Riemann hypothesis, and I think this is what a large part of solving alignment might look like.
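The reason proofs are a comparatively safe thing to delegate can be made concrete: in a proof assistant such as Lean, the kernel checks any submitted proof mechanically, so you need not trust the author at all, only the checker. A toy Lean 4 illustration (the theorem and its one-line proof are my own example, not from the discussion):

```lean
-- The kernel verifies this in milliseconds, regardless of who
-- (or what) wrote it; a bogus proof simply fails to type-check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is the property the thread is gesturing at: verification cost stays tiny and immediate even when the proof-finding effort is enormous.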
“May it go from your lips to God’s ears,” as the old Jewish saying goes. Meaning, I hope you’re right. Maybe aligning superintelligence will largely be a matter of human-checkable mathematical proof.
I have 45 years’ experience as a software and hardware engineer, which has made me cynical. When one of my designs encounters the real world, it hardly ever goes the way I expect. It usually either needs some rapid finagling to make it work (acceptable) or it needs to be completely abandoned (bad). This is no good for the first decisive try at superalignment; that has to work the first time. I hope our proof technology is up to it.