Perhaps the norm should be to use some sort of LLM-based survey service like https://news.ycombinator.com/item?id=36865625 in order to try to get a more representative population sample of LLM outputs?
This seems like it could be a useful service in general: do the legwork to take base models (not tuned models), and prompt and reformulate in many different ways to get the most robust distribution of outputs possible. (For example, ask an LLM to rewrite a question at various levels of detail or in various languages, or switch between logically equivalent formulations to avoid acquiescence bias; or if it needs k shots, shuffle/drop out the shots a bunch of times.)
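A minimal sketch of the perturbation step, assuming hand-written paraphrases stand in for what an LLM rewriter would produce: given one question plus its k shots, emit many logically equivalent prompts (varying wording, shot order, and shot dropout) whose outputs could then be pooled into a distribution.

```python
import random

def prompt_variants(paraphrases, shots, n_variants=8, seed=0):
    """Generate logically equivalent prompt variants for robust sampling.

    paraphrases: alternative wordings of the same question
    shots: list of (question, answer) few-shot examples
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    variants = []
    for _ in range(n_variants):
        question = rng.choice(paraphrases)                 # vary the wording
        kept = [s for s in shots if rng.random() > 0.25]   # drop out shots
        rng.shuffle(kept)                                  # vary shot order
        header = "\n".join(f"Q: {q}\nA: {a}" for q, a in kept)
        variants.append(f"{header}\nQ: {question}\nA:".lstrip())
    return variants

paraphrases = [
    "Is the number 7 prime?",
    "Does 7 have any divisors other than 1 and itself?",
]
shots = [("Is 4 prime?", "No"), ("Is 5 prime?", "Yes")]
for v in prompt_variants(paraphrases, shots, n_variants=3):
    print(v, end="\n---\n")
```

Each variant would then be sent to the base model, with the pooled answers treated as a sample from its output distribution rather than trusting any single phrasing.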