My three fundamental disagreements with MIRI, from my recollection of a ~1h conversation with Nate Soares in 2023. Please let me know if you think any positions have been misrepresented.
MIRI thinks (A) evolution is a good analogy for how alignment will fail by default in strong AIs, that (B) studying weak AGIs will not shed much light on how to align strong AIs, and that (C) strong narrow myopic optimizers will not be very useful for anything like alignment research.
Now my own positions:
(A) Evolution is not a good analogy for AGI.
See Steven Byrnes’ “Against evolution as an analogy for how humans will create AGI”.
(B) Alignment techniques for weak-but-agentic AGI are important.
Why:
In multipolar competitive scenarios, self-improvement may happen first for entire civilizations or economies, rather than for individual minds or small clusters of minds.
Techniques that work for weak-but-agentic AGIs may help with aligning stronger minds. Reflection, ontological crises, and self-modification make alignment more difficult, but without strong local recursive self-improvement, it may be possible to develop techniques for better preserving alignment during these episodes, if these systems can be studied while still under control.
(C) Strong narrow myopic optimizers can be incredibly useful.
A hypothetical system capable of generating fixed-length text that strongly maximizes a simple reward (e.g. the expected value of the next upvote) can be extremely helpful if the reward is based on very careful, objective evaluation. Careful judgement of adversarial “debate” setups between such systems may also generate great breakthroughs, including for alignment research.