A measurable uptick in persuasive ability, combined with middling benchmark scores but a positive eval of “taste” and “aesthetics”, should raise some eyebrows. I wonder how we can distinguish good (or ‘correct’) output from output that is simply pleasant.