TurnTrout’s proposal seems to me to be basically “train it around diamonds, do some reward-shaping, and hope that at least some care-about-diamonds makes it across the gap”.
I read a connotation here like “TurnTrout isn’t proposing anything sufficiently new and impressive.” To be clear, I don’t think I’m proposing an awesome new alignment technique. I’m instead proposing that we don’t need one.