when some new shiny block architecture that beats all the incumbents is invented
Additionally, it is sometimes assumed that this invention, and the resulting overhaul of the AI landscape, will happen during recursive self-improvement, a.k.a. the autonomous takeoff phase.
Actually, these things don’t contradict each other. AutoML methods are great for automatically searching for strong ways to combine existing blocks into a single heterogeneous machine. They can be used now, and even more so during recursive self-improvement.
And, at the same time, a few novel shiny block architectures can be discovered and added to the mix without phasing out the existing ones.
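To make this concrete, here is a minimal, purely illustrative sketch of what an AutoML-style search over heterogeneous stacks of existing blocks might look like. The block names and the scoring function are placeholders invented for this comment; a real search would train and evaluate each candidate instead of using a toy score.

```python
import random

# Hypothetical vocabulary of existing block types the search is allowed to mix.
BLOCK_TYPES = ["transformer", "selective_ssm", "mlp", "conv"]


def sample_stack(depth):
    """Sample one candidate architecture: an ordered mix of existing block types."""
    return [random.choice(BLOCK_TYPES) for _ in range(depth)]


def score(stack):
    """Placeholder fitness. A real AutoML loop would train the candidate and
    measure validation performance; here we just reward block-type diversity."""
    return len(set(stack)) + random.random()


def random_search(num_candidates=200, depth=12):
    """Return the best-scoring heterogeneous stack found by random search."""
    best, best_score = None, float("-inf")
    for _ in range(num_candidates):
        candidate = sample_stack(depth)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best


if __name__ == "__main__":
    print(random_search())
```

The same skeleton carries over to evolutionary or RL-based search; the point is only that mixing existing block types into one heterogeneous stack is something AutoML can do today.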
“intelligence containment”, especially through compute governance, will be very short-lived
Yes, I also think that compute governance is unlikely to work for long.
People need to ponder a variety of alternative approaches to AI existential safety.
I agree with everything you said. It seems we should distinguish between “cooperative” and “adversarial” safety approaches (cf. the comment above). I wrote the entire post as an extended reply to Marc Carauleanu, in response to his mixed feedback on my idea of adding “selective SSM blocks for theory of mind” to increase the Self-Other Overlap in AI architectures as a pathway to improving safety. Under the view that both Transformer and selective SSM blocks will survive up until AGI (if it is ever created at all, of course), and even with your qualifications added (that AutoML will keep stacking these and other types of blocks in quickly evolving ways), the approach still seems solid to me, but only if we also make some basic assumptions about the good faith and cooperativeness of the AutoML / autonomous takeoff process. If we don’t make such assumptions, then of course all bets are off: these “blocks for safety” could simply be purged from the architecture.
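For illustration only, here is a toy sketch of what “both block types surviving in one stack” could look like: Transformer layers interleaved with one designated SSM-style block. The ToySelectiveSSMBlock below is a crude stand-in written for this comment, not real selective-SSM (Mamba) code, and the mid-stack placement of the dedicated block is arbitrary.

```python
import torch
import torch.nn as nn


class ToySelectiveSSMBlock(nn.Module):
    """Toy stand-in for a selective SSM block: an input-dependent gate feeding a
    running average over the sequence, used only to mark where a dedicated
    block would sit in the stack."""

    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq, d_model)
        g = torch.sigmoid(self.gate(x))    # input-dependent ("selective") gate
        state = torch.cumsum(g * x, dim=1) # crude recurrent state over the sequence
        state = state / torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return x + self.proj(state)        # residual connection


class HybridStack(nn.Module):
    """Transformer blocks interleaved with one designated SSM-style block,
    illustrating 'blocks for safety' kept alongside incumbent blocks."""

    def __init__(self, d_model=64, n_heads=4, depth=4):
        super().__init__()
        layers = []
        for i in range(depth):
            layers.append(nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True))
            if i == depth // 2:  # arbitrary placement of the dedicated block
                layers.append(ToySelectiveSSMBlock(d_model))
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    out = model(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

Whether such a designated block actually survives an automated search is, of course, exactly the question of the cooperativeness assumptions above.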
Yes, I strongly suspect that “adversarial” safety approaches are quite doomed. The more one thinks about those, the worse they look.
We need to figure out how to make “cooperative” approaches work reliably. In this sense, I have a feeling that the approach being developed by OpenAI, in particular, has been gradually shifting in that direction (judging, for example, by this interview with Ilya that I transcribed: Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments).