Could you elaborate on what it would mean to demonstrate ‘savannah-to-boardroom’ transfer? Our architecture was selected for in the wilds of nature, not our training data. To me it seems that when we use an architecture designed for language translation for understanding images we’ve demonstrated a similar degree of transfer.
I agree that we’re not yet there on sample efficient learning in new domains (which I think is more what you’re pointing at) but I’d like to be clearer on what benchmarks would show this. For example, how well GPT-4 can integrate a new domain of knowledge from (potentially multiple epochs of training on) a single textbook seems a much better test and something that I genuinely don’t know the answer to.
Could you elaborate on what it would mean to demonstrate ‘savannah-to-boardroom’ transfer? Our architecture was selected for in the wilds of nature, not our training data. To me it seems that when we use an architecture designed for language translation for understanding images we’ve demonstrated a similar degree of transfer.
I agree that we’re not yet there on sample efficient learning in new domains (which I think is more what you’re pointing at) but I’d like to be clearer on what benchmarks would show this. For example, how well GPT-4 can integrate a new domain of knowledge from (potentially multiple epochs of training on) a single textbook seems a much better test and something that I genuinely don’t know the answer to.