I’ve started to notice a common pattern in evolutionary analogies: they initially suggest concerning alignment implications, which then seem to dissolve once I track the mechanistic details of what actually happened in the evolutionary context and how that would apply to AI development. At this point, my default reaction to any evolutionary analogy about AI alignment is skepticism.
I agree. Perhaps the alignment field would be better off if we’d never thought about evolution at all, and had instead modelled the learning dynamics directly. Don’t think about AIXI, don’t think about evolution; think about what your gradient update equations might imply, and then run experiments to test that.