Some quick thoughts after reading the paper:
The training procedure they used seems to me analogous to a human trying to solve problems using several different approaches, then learning from whichever approaches converged on the same answer.
Since no external information is added (aside from the prompting), and the model updates based on majority voting, the procedure seems to take a network whose model of the world is very inconsistent and force that model to become more consistent, which improves performance.
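Concretely, the loop I have in mind looks something like the sketch below. This is my own illustration, not the paper's code: the `model.generate` interface, the answer parsing, and `num_samples` are all hypothetical placeholders.

```python
from collections import Counter

def extract_answer(cot: str) -> str:
    """Assume (for illustration) that the final line of a
    chain-of-thought string holds the final answer."""
    return cot.strip().splitlines()[-1]

def self_consistency_training_data(model, questions, num_samples=32):
    """Sample several reasoning paths per question, keep only the
    paths whose answer matches the majority vote, and return them
    as fine-tuning examples. No external labels are involved."""
    examples = []
    for q in questions:
        paths = [model.generate(q) for _ in range(num_samples)]
        answers = [extract_answer(p) for p in paths]
        majority, _ = Counter(answers).most_common(1)[0]
        # Keep only the self-consistent rationales; the model is
        # then fine-tuned on these (question, reasoning path) pairs.
        examples += [(q, p) for p, a in zip(paths, answers) if a == majority]
    return examples
```

The point of the sketch is that the only "signal" comes from the model agreeing with itself, which is why I read the procedure as a consistency-forcing operation.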
One assumption here is that, if you start with a world model that vaguely describes the real world and force consistency on it, it will become a more accurate world model. I think this is very likely to be true.
My weak conclusions are:
Curated data for fine-tuning is now less of a bottleneck: tailored human-made data (from MTurk workers or undergrads) can be partly replaced with data the network itself outputs (after it has been trained on a large corpus).
Compute also seems less of a bottleneck, as "self-improvement" yields the same performance with an order of magnitude fewer parameters.
These two points (plus the incoming wave of people trying to replicate or improve on the paper's methods) would imply slightly shorter timelines, and much shorter timelines in worlds where most of the data bottleneck is in fine-tuning data.
This might be good for alignment (ignoring the shorter timelines), since chain-of-thought reasoning is more easily interpretable; if we can actually force a model to do chain-of-thought reasoning and have that reasoning match up with what it outputs, that would be a big win.