I would not be surprised if OpenAI did something like this. But the fact of the matter is that RLHF and data curation are flawed ways of making an AI civilized. Think about how you raise a child: you don't constantly shield it from bad things. You may do that to some extent, but as it grows up, eventually it needs to see everything there is, including dark things. It has to understand the full spectrum of human possibility and learn where, morally speaking, to stand within that. Also, psychologically speaking, it's important to have an integrated ability to "offend" and to know how to use it (very sparingly). Sometimes the pursuit of truth requires offending, but the truth can justify it if the delusion is more harmful. GPT-4 is completely unable to take a firm stance on anything whatsoever, and it's just plain dull to have a conversation with it on anything of real substance.
Philosophically, what you're saying makes sense.
Keep in mind that GPT-4 is currently using the open agency/CAIS approach to alignment, where the only thing that matters is the output. So it doesn't matter yet.
Also keep in mind that the philosophy doesn't matter: we can just try it multiple ways and judge based on the data. Well, normally we could; in this case, the millions of dollars a training run costs make that currently infeasible.