Yeah, intuitively I don’t see that as breaking alignment if it’s already aligned, and an unaligned AI would already have an incentive to lie, I think. Considering the potential positive impact this could have qualia-wise, imo it’s a worthwhile practice to carry out.