it’s quite common for assistants to refuse instructions, especially harmful instructions. so i’m not surprised that base llms systematically refuse harmful instructions more often than harmless ones.
Indeed. The base LLM would likely predict a “henchman” to be a lot less scrupulous than an “assistant”.