Isn’t honesty a part of the “truth machine” rather than the “good machine”? Confabulation seems to be a case of the model generating text which it doesn’t “believe”, in some sense.
Yeah. I’m saying that the “good machine” should be trained on all three; it should be honest, but constrained by helpfulness and harmlessness. (Or, more realistically, by a more complicated constitution with more details.)