All of them stem from the same fundamental problem: we don’t have a FAI, nor a clear destination for what A[N] should be doing. Therefore, the Amplification/Distillation steps are taking a random walk in the space of minds, with each step defined by imperfect local criteria. There is no reason to suspect the ultimate attractor of this method will be good.
Wei’s comments aside, this suggests to me a way in which amplification/distillation could be a dangerous research path, as you hint at: it seemingly can be used to create more powerful AI for any purpose. That is, it encodes no solution to metaethics and leaves that to be implicitly resolved by the human operators, so research on amplification/distillation seems to potentially contribute more to capabilities research than to safety research. This updates me in the direction of being more opposed to the proposal, even if it is a capability being considered with the intention of using it for safety-related purposes.
This somewhat contradicts my previous take on Paul’s work: based on your presentation of it, I think I may have misunderstood or failed to realize the full implications of Paul’s approach. I previously viewed it as a means of learning human values while building more capable AI, and while it can still be used for that, I’m now more worried about the ways in which it might be used for other purposes.