Thanks for the post, I think it’s a useful framing. Two things I’d be interested in understanding better:
> In the one real example of intelligence being developed we have to look at, continuous application of natural selection in fact found Homo sapiens sapiens, and the capability-gain curves of the ecosystem for various measurables were in fact sharply kinked by this new species (e.g., using machines, we sharply outperform other animals on well-established metrics such as “airspeed”, “altitude”, and “cargo carrying capacity”).
As I said in a reply to Eliezer’s AGI ruin post:

> There are some ways in which AGI will be analogous to human evolution. There are some ways in which it will be disanalogous. Any solution to alignment will exploit at least one of the ways in which it’s disanalogous. Pointing to the example of humans without analysing the analogies and disanalogies more deeply doesn’t help distinguish between alignment proposals which usefully exploit disanalogies, and proposals which don’t.
So I’d be curious to know what you think the biggest disanalogies are between the example of human evolution and building AGI. Relatedly, would you consider raising a child to be a “real example of intelligence being developed”; why or why not?
Secondly:
> Many different training scenarios are teaching your AI the same instrumental lessons, about how to think in accurate and useful ways. Furthermore, those lessons are underwritten by a simple logical structure
Granting that there’s a bunch of logical structure around how to think in accurate ways (e.g. solving scientific problems), and a bunch of logical structure around how to pursue goals coherently (e.g. avoiding shutdown), what’s the strongest reason to believe that agents won’t learn something closely approximating the former before they learn something closely approximating the latter? My impression of Eliezer’s position is that it’s because they’re basically the same structure; if you agree with this, I’d be curious what sort of intuitions or theorems are most responsible for that belief.
(Another way of phrasing this question: suppose I had made an analogous argument before the industrial revolution, saying something like “matter and energy are fundamentally the same thing at a deep level; we’ll soon be able to harness superhuman amounts of energy; therefore we’ll soon be able to create superhuman amounts of matter”. Yet in fact, while the premise of mass-energy equivalence is true, the constants are such that it would take stupendously more energy than humans can generate to produce even human-sized piles of matter. What’s the main thing that makes you think the constants in the intelligence case are such that AIs will converge to goal-coherence before, or around the same time as, superhuman scientific capabilities?)
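For a sense of the constants involved, here is a rough back-of-the-envelope sketch of the mass-energy conversion (the specific numbers are my own illustrative assumptions, not part of the original argument):

```python
# Illustrative arithmetic for the mass-energy analogy: E = m * c^2
# for a human-sized mass. Figures below are rough assumptions.
c = 299_792_458.0          # speed of light, m/s
m = 70.0                   # a human-sized mass, kg (assumed for illustration)
energy_joules = m * c**2   # ~6.3e18 J

# For scale: world annual primary energy use is on the order of 6e20 J,
# and that is before the enormous inefficiency of actually turning
# energy into matter (e.g. via pair production).
print(f"{energy_joules:.2e} J")
```

So even the raw E = mc² figure is a meaningful fraction of a year of global energy output, and the practical cost of producing matter is vastly higher still.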