>On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures,.. So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
I think this can be an approach to create a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help in some benchmarks but the returns diminish very quickly. So you can have a model pretending to be this type of person and that but they will suck at Tic-Tac-Toe. (For example, GPT4 doesn’t know to recognize a winning move even when I tell it to play like Terence Tao.)
Regarding the existence of compact ML programs, I agree that it is not known. I would say however that the main benefit of architectures like transformers hasn’t been so much to save in the total number of FLOPs as much as to organize these FLOPs so they are best suited for modern GPUs—that is ensure that the majority of the FLOPs are spent multiplying dense matrices.
Yes, I just did confirm that even turning Code Interpreter on does not seem to help with recognition of a winning move at Tic-Tac-Toe (even when I tell it to play like a Tic-Tac-Toe expert). Although, it did not try to generate and run any Python (perhaps, it needs to be additionally pushed towards doing that).
A more sophisticated prompt engineering might do it, but it does not work well enough on its own on this task.
Returning to “artificial researchers based on LLMs”, I would expect the need for more sophisticated prompts, not just reference to a person, but some set of technical texts and examples of reasoning to focus on (and learning to generate better long prompts of this kind would be a part of self-improvement, although I would expect the bulk of self-improvement to come from designing smarter relatively compact neural machines interfacing with LLMs and smarter schemes of connectivity between them and LLMs (I expect an LLM in question to be open and not hidden by an opaque API, so that one would be able to read from any layer/inject into any layer)).
One can make all sorts of guesses but based on the evidence so far, AIs have a different skill profile than humans. This means if we think of any job a which requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of them, they will perform worse than humans in others.
Yes, at least that’s the hope (that there will be need for joint teams and for finding some mutual accommodation and perhaps long-term mutual interest between them and us; basically, the hope that Copilot-style architecture will be essential for long time to come)...
>On the other hand, if one starts creating LLM-based “artificial AI researchers”, one would probably create diverse teams of collaborating “artificial AI researchers” in the spirit of multi-agent LLM-based architectures,.. So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.
I think this can be an approach to create a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help in some benchmarks but the returns diminish very quickly. So you can have a model pretending to be this type of person and that but they will suck at Tic-Tac-Toe. (For example, GPT4 doesn’t know to recognize a winning move even when I tell it to play like Terence Tao.)
Regarding the existence of compact ML programs, I agree that it is not known. I would say however that the main benefit of architectures like transformers hasn’t been so much to save in the total number of FLOPs as much as to organize these FLOPs so they are best suited for modern GPUs—that is ensure that the majority of the FLOPs are spent multiplying dense matrices.
Yes, I just did confirm that even turning Code Interpreter on does not seem to help with recognition of a winning move at Tic-Tac-Toe (even when I tell it to play like a Tic-Tac-Toe expert). Although, it did not try to generate and run any Python (perhaps, it needs to be additionally pushed towards doing that).
A more sophisticated prompt engineering might do it, but it does not work well enough on its own on this task.
Returning to “artificial researchers based on LLMs”, I would expect the need for more sophisticated prompts, not just reference to a person, but some set of technical texts and examples of reasoning to focus on (and learning to generate better long prompts of this kind would be a part of self-improvement, although I would expect the bulk of self-improvement to come from designing smarter relatively compact neural machines interfacing with LLMs and smarter schemes of connectivity between them and LLMs (I expect an LLM in question to be open and not hidden by an opaque API, so that one would be able to read from any layer/inject into any layer)).
One can make all sorts of guesses but based on the evidence so far, AIs have a different skill profile than humans. This means if we think of any job a which requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of them, they will perform worse than humans in others.
Yes, at least that’s the hope (that there will be need for joint teams and for finding some mutual accommodation and perhaps long-term mutual interest between them and us; basically, the hope that Copilot-style architecture will be essential for long time to come)...