I think you’ve had more luck than I have getting ChatGPT to correct its own mistakes. When I tried making it play chess, I told it to “be sure not to output your move before writing a paragraph of analysis on the current board position, and output 5 good moves and the reasoning behind them, all of this before giving me your final move.” Then, after it chose its move, I asked, “Are you sure this is a legal move? And is this really the best move?” It pretty much never changed its answer, and it never managed to figure out that its illegal moves were illegal. If I straight-up told it “this move is illegal,” it would apologize and output something else, and sometimes it correctly understood why its move was illegal, but not always.
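(What makes this failure striking is that move legality is an entirely mechanical check, so an external verifier can catch exactly the errors the model couldn’t. As a toy sketch, and deliberately not tied to any real chess library, here is legality-by-geometry for a single piece type, ignoring occupancy and check rules:)

```python
# Toy sketch: knight-move legality is a pure function of geometry,
# so a program can verify it mechanically even when a chat model can't.
# Squares use algebraic notation like "g1"; only the L-shaped jump
# is checked here (no board occupancy, no check rules).

def knight_move_legal(src: str, dst: str) -> bool:
    """Return True if a knight on `src` could jump to `dst`."""
    file_dist = abs(ord(src[0]) - ord(dst[0]))  # distance across files a-h
    rank_dist = abs(int(src[1]) - int(dst[1]))  # distance across ranks 1-8
    return {file_dist, rank_dist} == {1, 2}     # the L-shape: (1,2) or (2,1)

print(knight_move_legal("g1", "f3"))  # True: a standard opening move
print(knight_move_legal("g1", "g3"))  # False: not an L-shaped jump
```

(In other words, you could wrap the model in a loop that rejects illegal moves and re-prompts, rather than hoping “are you sure?” triggers a correction.)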
So, do we care about infinite-length completions?
The inability of the GPT series to generate infinite-length completions is crucial for safety! If humans fundamentally need to be in the loop for GPT to give us good outputs on things like scientific reasoning, then the whole setup suddenly becomes much safer, and we can be reasonably assured that there isn’t an instance of GPT running on some Amazon server, self-improving by doing a thousand years of scientific progress in a week.
Does the inability of the GPT series to generate infinite-length completions require that humans specifically remain in the loop, or just that the external world must remain in the loop in some way that gets the model back into the distribution? Because if it’s the latter, I think you still have to worry about some instance running on a cloud server somewhere.