OpenAI says they are using GPT-4 internally: “We’ve also been using GPT-4 internally, with great impact on functions like support, sales, content moderation, and programming. We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy.” https://openai.com/research/gpt-4
Does this mean what I think it means? That they are using this AI to analyse and optimise the code the AI itself runs on? Does anyone know if OpenAI has confirmed or denied this, or given any information on the safeguards that are in place? E.g. what happens if the code produced by the AI is no longer comprehensible to the human workers, but performs significantly better than the previous code? OpenAI seems to be under massive pressure to increase efficiency to meet demand, and we already know that ChatGPT is bloody good at taking existing code and spitting out something that looks to do the same thing, but more efficiently. This seems like an obvious and massive safety issue, and a conflict of interest between safety and capability.
Sorry if you all have already covered this and I am just preaching to the choir here. I’ve been incredibly busy and have not been able to check back here.
Probably something along the lines of RLAIF (Reinforcement Learning from AI Feedback)? Anthropic’s Claude might be more robustly tuned because of this, though GPT-4 might already incorporate something similar as part of its own training.
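For anyone unfamiliar with the acronym: in RLAIF, the preference labels that would normally come from human raters are produced by an AI judge (guided by a list of principles), and those labels are then used to train a reward model for RL fine-tuning. Below is a minimal sketch of just that labelling step; the `judge_model` stub, `CONSTITUTION` list, and `label_preferences` helper are hypothetical illustrations, not anyone's actual pipeline.

```python
# Sketch of the RLAIF preference-labelling step (assumptions: `judge_model`
# is a hypothetical stand-in for the LLM that does the labelling; a real
# pipeline would then train a reward model on the resulting pairs).
import random
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str

# Illustrative principles the AI judge is asked to apply.
CONSTITUTION = [
    "Prefer the response that is more helpful and honest.",
    "Prefer the response that avoids harmful or deceptive content.",
]

def judge_model(prompt: str, a: str, b: str, principles: list[str]) -> str:
    """Hypothetical AI judge: returns 'A' or 'B'.
    A real implementation would query an LLM with the principles and both
    candidate responses; here we pick at random as a placeholder."""
    return random.choice(["A", "B"])

def label_preferences(prompts, sample_fn):
    """Build a preference dataset using the AI judge instead of humans."""
    pairs = []
    for prompt in prompts:
        a, b = sample_fn(prompt), sample_fn(prompt)  # two candidate completions
        verdict = judge_model(prompt, a, b, CONSTITUTION)
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs  # these pairs would feed reward-model training / RL

# Toy usage with a dummy sampler standing in for the policy model.
data = label_preferences(
    ["Explain RLAIF briefly."],
    lambda p: random.choice(["draft A", "draft B"]),
)
```

The point of the sketch is only to show where the AI replaces humans in the loop: everything downstream (reward modelling, PPO or similar) looks the same as ordinary RLHF.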