I take the fact that they used the training data to massively reduce the "jailbreak" cases within a short time as evidence that the point of the exercise was to gather training data.
ChatGPT has a mode where it flags your question as illegitimate and colors it red, but still gives you an answer. There is also a feedback button for telling OpenAI when it has made a mistake. This behavior prioritizes gathering training data over avoiding problematic answers.
Maybe the underlying reason we are interpreting the evidence in different ways is that we are holding OpenAI to different standards:
Compared to a typical company, having a feedback button is evidence of competence. Quickly incorporating the resulting training data is also a positive update, as is visually marking illegitimate questions in the interface.
I am comparing OpenAI to the extremely high standard of "being able to solve the alignment problem". Against that standard, having a feedback button is absolutely expected, and even things like Eliezer's suggestion (publishing hashes of your gambits) should be obvious to any company competent enough to have a chance of solving the alignment problem.
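For readers unfamiliar with hash-based precommitment, here is a minimal Python sketch of the general idea: publish a hash of a prediction now, reveal the text later, and let anyone verify that the two match. The prediction text is invented for illustration, and the exact scheme Eliezer had in mind may differ in detail.

```python
import hashlib
import secrets

# A prediction ("gambit") the company wants to commit to now,
# without revealing its contents until after the fact.
prediction = "We expect the public release to surface many jailbreaks within 48 hours."

# Add a random nonce so the hash cannot be brute-forced from likely phrasings.
nonce = secrets.token_hex(16)
commitment = hashlib.sha256(f"{nonce}:{prediction}".encode()).hexdigest()

# Publish only the commitment today.
print("Published commitment:", commitment)

# Later, reveal the nonce and the prediction; anyone can recompute the hash
# and check that it matches the commitment published earlier.
revealed = hashlib.sha256(f"{nonce}:{prediction}".encode()).hexdigest()
assert revealed == commitment
print("Verified: the published hash matches the revealed prediction.")
```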
It is important to distinguish factual questions from questions of judgment. "Did the OpenAI release happen the way OpenAI expected?" is a factual question that has nothing to do with what standards we should hold OpenAI to.
If you get the factual questions wrong, it becomes very easy for people within OpenAI to dismiss your arguments.
I fully agree that it is a factual question, and OpenAI could easily shed light on the circumstances around the launch if they chose to do so.