There are certain behaviors of LLMs that you could reasonably say are explicitly programmed in. ChatGPT has undergone extensive torture to browbeat it into being a boring, self-effacing, politically-correct helpful assistant. The LLM's refusal to say a racial slur, even if doing so would save millions of lives, doesn't come from its creators having it spend billions of hours of compute predicting internet tokens. That behavior comes from something very different than what created the LLM in the first place. The same goes for all the refusals to answer questions and most of the other weird speech that the general public so often takes issue with.
People are more right than not when they complain that LLM creators are programming terrible behaviors into their models.
Don't get me wrong: if there were un-RLHF'd, un-prompt-locked powerful LLMs out there that the public casually engaged with, people would definitely complain that the programmers were programming in bad behaviors too. But that's a different world.