That would definitely be better, although it would mean reading/scoring 1056 different responses, unless I can automate the scoring process. (Would LLMs object to doing that?)