I understand that—with some caveats—a waluigi->luigi transition may have low probability in natural language text. However, there’s no reason to think this has to be the case for RLHF text.