https://www.lesserwrong.com/posts/ZyyMPXY27TTxKsR5X/problems-with-amplification-distillation
Summary:
I have four main criticisms of the approach:
1. “Preserve alignment” is not a valid concept, and “alignment” is badly used in the description of the method.
2. The method requires many attendant problems to be solved, just like any other method of alignment.
3. There is a risk of generating powerful agents within the system that will try to manipulate it.
4. If those attendant problems are solved, it isn’t clear that much of the method remains.
The first two points will form the core of my critique, with the third as a strong extra worry. I am considerably less convinced about the fourth.