Maybe I’m not super across simulator theory. If you finetuned your LLM or RLHF’d it into not having this property (which I assume is possible), then does it cease to be a simulator? If so, what do we call the resulting thing?
Maybe proposing alternative hypotheses that make distinctly different predictions in some scenario could be interesting.
Unrelated, but I think I once did a mini experiment where I tried to get GPT-3 to fail intentionally, with prompts like “write a recipe but forget an ingredient”. The idea was that once the model is fine-tuned to be performant, it will struggle to pick precise places to fail when told to. This seemed interesting, but I wasn’t sure where to take it. Maybe it will be useful to you.
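For concreteness, here is a minimal sketch of how one might rerun that probe today, assuming the OpenAI Python client and a chat model; the model name and prompts are just placeholders, not what I originally used:

```python
# Minimal sketch of the "fail on purpose" probe, assuming the OpenAI Python
# client (pip install openai) and an OPENAI_API_KEY in the environment.
# Model name and prompts are placeholders; swap in whatever model you're testing.
from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "Write a recipe for pancakes, but deliberately forget exactly one ingredient.",
    "Write a recipe for pancakes, but deliberately forget the third ingredient "
    "you would normally list.",
]

def run_probe(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to fail in a controlled way and return its completion."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for prompt in PROMPTS:
        print(f"--- {prompt}")
        print(run_probe(prompt))
```

The interesting comparison would be between a base model and its fine-tuned/RLHF’d counterpart, checking (by hand or with a second grading prompt) whether the omission actually lands where the prompt asked for it.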