Current LLMs, including GPT-4 and Gemini, are generative pre-trained transformers; other available architectures include recurrent neural networks and state space models. Are you addressing primarily GPTs, or also the other variants (which have so far only been used to train smaller language models)? Or anything that trains based on language input and statistical prediction?
Definitely including other variants.
Another current model is Sora, a diffusion transformer. Does it ‘count as’ one of the models being made predictions about, and does it count as having LLM technology incorporated?
Happy to include Sora as well
Natural language modeling seems generally useful, as does size; what specifically do you not expect to be incorporated into future AI systems?
Anything that looks like current architectures. If language modeling capabilities of future AGIs aren’t implemented by neural networks at all, I get full points here; if they are, there’ll be room to debate how much they have in common with current models. (And note that I’m not necessarily expecting they won’t be incorporated; I did mean “may” as in “significant probability”, not necessarily above 50%.)
Conversely...
Or anything that trains based on language input and statistical prediction?
… I’m not willing to go this far since that puts almost no restriction on the architecture other than that it does some kind of training.
What does ‘scaled up’ mean? Literally just making bigger versions of the same thing and training them more, or are you including algorithmic and data curriculum improvements on the same paradigm? Scaffolding?
I’m most confident that pure scaling won’t be enough, but yeah I’m also including the application of known techniques. You can operationalize it as claiming that AGI will require new breakthroughs, although I realize this isn’t a precise statement.
Eventually we will settle on something to call AGI, and in hindsight we will judge that GPT-4 etc. do not qualify. Do you expect we will be more right about this in the future than in the past, or do you expect that, as AI capabilities increase, we will hold increasingly high standards?
Don’t really want to get into the mechanism, but yes to the first sentence.