I agree that self-improvement is an assumption that probably deserves its own blog post. If you believe exponential self-improvement will kick in at some point, then you can consider this discussion as pertaining to the period before that happens.
My own sense is that:
While we might not be super close to them, there are probably fundamental limits to how much intelligence you can pack per FLOP. I don’t believe there is a small C program that is human-level intelligent. In fact, since both AI and evolution seem to have arrived at roughly similar magnitudes, maybe we are not that far off from those limits? If there are such limits, then no matter how smart the “AI AI-researchers” are, they still won’t be able to get more intelligence per FLOP than these limits allow.
I do think that AI AI-researchers will be incomparable to human AI-researchers, much as in other professions. The simplistic view of AI research, or any form of research, as one-dimensional, where people can be sorted on an ELO-like scale, is dead wrong based on my 25 years of experience. Yes, some aspects of AI research might be easier to automate, and we will certainly use AI to automate them and make AI researchers more productive. But, as in the vast majority of human professions (with all due respect to elevator operators :) ), I don’t think human AI researchers will be obsolete any time soon.
p.s. I also noticed this “2 comments”—not sure what’s going on. Maybe my footnotes count as comments?
If you believe exponential self-improvement will kick in at some point, then you can consider this discussion as pertaining to the period before that happens.
Yes, this makes sense.
The simplistic view of AI research, or any form of research, as one-dimensional, where people can be sorted on an ELO-like scale, is dead wrong based on my 25 years of experience.
I agree with that. On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures, for example, in the spirit of Multiagent Debate or Mindstorms in Natural Language-Based Societies of Mind or Multi-Persona Self-Collaboration or other work in that direction.
So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
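The multi-agent debate pattern referred to above can be sketched in a few lines. This is a minimal, illustrative skeleton only: the `agents` here are plain stubbed functions standing in for real LLM calls, and the prompt format is a made-up assumption, not any actual API.

```python
def debate(agents, question, rounds=2):
    """Multi-agent debate skeleton: each agent answers independently,
    then revises its answer after seeing the other agents' answers.
    `agents` is a list of callables str -> str (stubs for LLM calls)."""
    answers = [ask(question) for ask in agents]
    for _ in range(rounds):
        answers = [
            ask(question + " Other answers: " + "; ".join(
                a for j, a in enumerate(answers) if j != i))
            for i, ask in enumerate(agents)
        ]
    return answers

# Usage with trivial stub agents (real agents would call different
# LLM personas or models):
agents = [lambda q: "A", lambda q: "B"]
final = debate(agents, "Q?", rounds=1)
```

The diversity of the team comes from the `agents` list itself, which is where one would plug in differently prompted or differently trained models.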
I don’t believe there is a small C program that is human-level intelligent.
I am not sure. Let’s consider the shift from traditional neural nets to Transformers.
In terms of expressive power, there is an available shift of similar magnitude in the space of neural machines from Transformers to “flexible attention machines” (these can be used as continuously deformable general-purpose dataflow programs, they can be very compact, and they allow for very fluent self-modification). No one is using these “flexible attention machines” for serious machine learning work (as far as I know), mostly because no one has optimized them to be GPU-friendly at their maximal generality (again, as far as I know), but at some point people will figure that out (probably by rediscovering the whole thing from scratch rather than by reading the overlooked arXiv preprints and building on top of them).
It might be that one would consider a hybrid between such a machine and a more traditional Transformer (the Transformer part would be opaque, just like today, but the “flexible neural machine” might be very compact and transparent). I am agnostic on how far one could push all this, but the potential is strong enough that I would not make a firm bet against this possibility.
And there might be some alternative routes to “compact AI with an LLM as an oracle” (I describe the route I understand reasonably well, but it does not have to be the only one).
> On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures… So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
I think this can be an approach to creating a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help on some benchmarks, but the returns diminish very quickly. So you can have a model pretending to be this type of person or that, but they will all suck at Tic-Tac-Toe. (For example, GPT-4 doesn’t recognize a winning move even when I tell it to play like Terence Tao.)
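For contrast, the check the model fails at is mechanically trivial. A minimal sketch (my own illustrative encoding, not anything from the original exchange: the board is a 9-character string, cells 0–8 read left-to-right, top-to-bottom, with `' '` for empty):

```python
# All eight winning lines of a 3x3 board, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_move(board, p):
    """Return the index of a cell where player p wins immediately,
    or None if no such move exists."""
    for i in range(9):
        if board[i] == ' ':
            trial = board[:i] + p + board[i + 1:]
            if any(all(trial[j] == p for j in line) for line in LINES):
                return i
    return None
```

For example, on the board `"XX OO    "` (X has the top row open at cell 2), `winning_move` returns `2`.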
Regarding the existence of compact ML programs, I agree that it is not known. I would say, however, that the main benefit of architectures like Transformers hasn’t been so much to save on the total number of FLOPs as to organize those FLOPs so they are best suited for modern GPUs; that is, to ensure that the majority of the FLOPs are spent multiplying dense matrices.
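The point about dense matrices is visible in the structure of self-attention itself. A minimal NumPy sketch (single head, no masking or scaling subtleties beyond the standard 1/√d factor): essentially all the work is dense matrix products, with only cheap elementwise operations in between.

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head self-attention: the FLOPs are dominated by five
    dense matmuls; the softmax in between is cheap elementwise work."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # three dense matmuls
    scores = Q @ K.T / np.sqrt(K.shape[1])      # dense matmul
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)           # elementwise softmax
    return A @ V                                # dense matmul
```

This is exactly the workload GPUs execute near peak throughput, which is the sense in which the architecture "organizes" its FLOPs rather than minimizing them.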
Yes, I just confirmed that even turning Code Interpreter on does not seem to help with recognizing a winning move at Tic-Tac-Toe (even when I tell it to play like a Tic-Tac-Toe expert). It did not try to generate and run any Python, though (perhaps it needs an additional push towards doing that).
More sophisticated prompt engineering might do it, but on its own it does not work well enough on this task.
Returning to “artificial researchers based on LLMs”, I would expect the need for more sophisticated prompts: not just a reference to a person, but a set of technical texts and examples of reasoning to focus on. Learning to generate better long prompts of this kind would be a part of self-improvement, although I would expect the bulk of self-improvement to come from designing smarter, relatively compact neural machines interfacing with LLMs, and smarter schemes of connectivity between them and the LLMs (I expect the LLM in question to be open rather than hidden behind an opaque API, so that one would be able to read from, and inject into, any layer).
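The read-from/inject-into-any-layer idea can be sketched with a toy layer stack. This is purely illustrative (the `OpenStack` class and its trivial function-layers are my own stand-ins, not any real model): an external module can read every intermediate activation via `trace` and rewrite a chosen layer's output before the next layer sees it.

```python
class OpenStack:
    """Toy 'open model': a stack of layers whose intermediate
    activations are readable, and into which an external module can
    inject a correction between any two layers."""

    def __init__(self, layers):
        self.layers = layers    # list of callables: activation -> activation
        self.trace = []         # per-layer activations, readable from outside

    def forward(self, x, inject=None):
        """inject: optional dict {layer_index: fn} applied to that
        layer's output before it is passed to the next layer."""
        self.trace = [x]
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if inject and i in inject:
                x = inject[i](x)
            self.trace.append(x)
        return x

# Usage: read all activations, then rerun with an injection at layer 0.
stack = OpenStack([lambda v: v + 1, lambda v: v * 2])
plain = stack.forward(1)                              # trace is [1, 2, 4]
steered = stack.forward(1, inject={0: lambda v: v * 10})
```

With a real open-weights Transformer the same role is played by forward hooks on the layer modules; the point here is only the interface shape.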
One can make all sorts of guesses, but based on the evidence so far, AIs have a different skill profile than humans. This means that if we think of any job that requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of them, they will perform worse than humans in others.
Yes, at least that’s the hope: that there will be a need for joint teams, and for finding some mutual accommodation and perhaps long-term mutual interest between them and us; basically, the hope that the Copilot-style architecture will be essential for a long time to come...