If you have played with ChatGPT-4, it's pretty clear that it is aligned (humans have roughly chosen its values), especially compared to reports of the original raw model before RLHF, or to less sophisticated alignment attempts in the same model family (i.e. Bing). Now it's possible, of course, that it's all deception, but this seems somewhat unlikely.