Intelligence Amplification
GPT-4 will be unable to contribute to the core cognitive tasks involved in AI programming.
If you ask GPT-4 to generate ideas for how to improve itself, it will always (10/10) suggest things that an AI researcher considers very unlikely to work.
If you ask GPT-4 to evaluate ideas for improvement that are generated by an AI researcher, the feedback will be of no practical use.
Likewise, every suggestion for how to get more data or compute, or be more efficient with data or compute, will be judged by an AI researcher as hopeless.
If you ask GPT-4 to suggest performance improvements to the core part of its own code, every single one of these will be very weak at best.
If there is an accompanying paper for GPT-4, no possible prompt will make GPT-4 suggest meaningful improvements to that paper.
I assign 95% confidence to each of these statements. I expect we will not be seeing the start of a textbook takeoff in August.
There’s a bit of an epistemic shadow here. If a capability is publicly announced and available, then it can’t be the keystone to a fast takeoff.
The epistemic shadow argument further requires that the fast takeoff leads to something close to extinction.
This is not the least impressive thing I expect GPT-4 won't be able to do.
I suspect you are very wrong on this, because there are a lot of things that have not yet been tried that:
Are obvious in the literature
Require a lot of money to try
An AI researcher would say might work
I would expect GPT-4 to be able to generate items from this class. Since its training cutoff is 2021, it won't have bleeding-edge ideas, because it lacks the information.
Do you have a prompt?
I think you are probably overconfident, mostly because of the use of the term 'every' in some of these clauses. Consider that if GPT-4 is trained on arXiv, it could plausibly make a great many research suggestions. All it would need to do to disprove the extremely generally worded clause 3 is eventually generate one research suggestion that improves 'compute' (hardware or software efficiency), which becomes near-certain with enough suggestions. So essentially you are betting that GPT-4 is not trained on arXiv.
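To make the "near-certain with enough suggestions" step explicit (a standard calculation, under the simplifying assumption that each suggestion independently has some fixed probability $p > 0$ of being a genuine compute improvement):

$$P(\text{at least one success in } n \text{ suggestions}) = 1 - (1-p)^{n} \to 1 \quad \text{as } n \to \infty.$$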
I should have explained what I mean by "always (10/10)": if you generate 10 completions, you expect with 95% confidence that all 10 satisfy the criterion.
All the absolute statements in my post should be turned down from 100% to 99.5%. My intuition is that if fewer than 1 in 200 ideas are valuable, it will not be worthwhile to have the model contribute to improving itself.
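(As a consistency check, the 1-in-200 figure lines up with the "10/10" criterion above, if one reads the 95% as a frequency statement over independent completions, each valuable with probability $q$:

$$P(\text{all 10 completions fail}) = (1-q)^{10} \ge 0.95 \iff q \le 1 - 0.95^{1/10} \approx 0.0051 \approx \tfrac{1}{200}.$$

So "95% confidence in 10/10 failures" and "fewer than 1 valuable idea in 200" describe roughly the same per-completion rate.)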