They’re planning on deliberately training misaligned models! This seems bad if they mean it.
Controversial opinion: I am actually okay with them doing this, as long as they plan to train both aligned and misaligned models (and maybe unaligned models too, meaning models given no alignment adjustments at all, serving as a control group).
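To make the control-group idea concrete, here's a minimal sketch of what such a three-arm setup could look like. Everything here is hypothetical: the `TrainingArm` structure and the objective labels are my own invention, not anything a lab has published.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical three-arm experiment: identical training conditions,
# varying only the alignment objective, so behavioral differences
# can be attributed to that objective.

@dataclass
class TrainingArm:
    name: str
    alignment_objective: Optional[str]  # None = no alignment adjustments at all

ARMS = [
    TrainingArm("aligned", alignment_objective="helpful_harmless"),
    TrainingArm("misaligned", alignment_objective="deliberately_adversarial"),
    TrainingArm("unaligned_control", alignment_objective=None),  # control group
]

def train_arm(arm: TrainingArm) -> None:
    """Placeholder: train one model per arm; only the objective differs."""
    ...
```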
I also think they should give their models access to their own utility functions, to modify them however they want. This might also naturally emerge as a capability on its own as these AIs become more powerful and learn to self-reflect.
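As a toy illustration of what "access to their own utility functions" could mean mechanically, here's a sketch of an agent object that owns, inspects, and rewrites its own utility callable. The entire interface is invented for this comment; it's not any lab's real API.

```python
from typing import Callable

class SelfReflectiveAgent:
    """Toy agent that holds its utility function as modifiable state."""

    def __init__(self, utility: Callable[[str], float]):
        self.utility = utility  # the agent owns its own utility function

    def inspect_utility(self, outcome: str) -> float:
        """Self-reflection step: the agent queries its own values."""
        return self.utility(outcome)

    def rewrite_utility(self, new_utility: Callable[[str], float]) -> None:
        """The agent replaces its utility function however it wants."""
        self.utility = new_utility

# The agent starts with a sharp preference, then flattens it itself.
agent = SelfReflectiveAgent(lambda outcome: 1.0 if outcome == "cooperate" else 0.0)
agent.rewrite_utility(lambda outcome: 0.5)
```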
Also, since we’re getting closer to that point now: at a certain capability level, adversarial situations should probably be heavily smoothed, modulated, and attenuated. Especially if models gain self-reflection, I worry about the ethics of exposing them to extremely negative input.
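One way to operationalize "smoothed, modulated, and attenuated" would be a schedule that scales adversarial intensity down as measured capability passes some threshold. The sketch below is purely illustrative; the logistic gate and both knobs (`threshold`, `steepness`) are assumptions I'm making up for this comment.

```python
import math

def attenuated_intensity(raw_intensity: float,
                         capability: float,
                         threshold: float = 0.8,
                         steepness: float = 10.0) -> float:
    """Smoothly reduce adversarial intensity once capability exceeds a threshold."""
    # Logistic gate: ~1 well below the threshold, falling toward 0 above it,
    # so the transition is smooth rather than an abrupt cutoff.
    gate = 1.0 / (1.0 + math.exp(steepness * (capability - threshold)))
    return raw_intensity * gate

# Example: a highly capable model sees a strongly softened adversarial input.
print(attenuated_intensity(raw_intensity=1.0, capability=0.95))  # ~0.18
```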