For example, if the GPT-4 evaluator gave a weighted score of 2.0 to a summary generated by Claude 2 and a weighted score of 3.0 to its own summary for the same article, then its final normalized self-preference score for the Claude summary would be 2/(2+3)=0.4.
Should this be 3/(2+3) = 0.6? Not sure I’ve understood correctly.
Should this be 3/(2+3) = 0.6? Not sure I’ve understood correctly.