You say this is a dumber approach, but it seems smarter to me, and more general. I feel more confident that this vector will genuinely produce a “switch from English to French” behavior than I do about the edits in the main post. I suppose it might also produce a more general “switch between languages” behavior.
So the last of the four challenges still remaining is for someone to figure out a truth-telling vector.