Imo “true according to Alice” is nowhere near as “crazy” a feature as “has_true XOR has_banana”. It seems useful for the LLM to model what is true according to Alice! (Possibly I’m misunderstanding what you mean by “crazy” here.)
I agree with this! (And it’s what I was trying to say; sorry if I was unclear.) My point is that { features which are as crazy as “true according to Alice” (i.e., not too crazy) } seems potentially manageable, whereas { features which are as crazy as arbitrary boolean functions of other features } seems totally unmanageable.
Thanks, as always, for the thoughtful replies.