One of the things I find really hard about tech forecasting is that most tech adoption is driven by market forces / comparative economics (“is solar cheaper than coal?”), while raw possibility / distance in the tech tree is the part that is easier to predict (“could more than half of schools be online?”). For about the last ten years we could have had the majority of meetings and classes online if we had wanted to, but we didn’t want to, until recently. Similarly, people correctly predicted that the Internet would enable remote work, in a way that could make ‘towns’ the winners and ‘big cities’ the losers, but they incorrectly predicted that people would prefer remote work to in-person work, and towns to big cities.
[A similar thing happened to me with music-generation AI; for years I think we’ve been in a state where people could have taken off-the-shelf method A and done something interesting with it on a huge music dataset, but I think everyone with a huge music dataset cares more about their relationship with music producers than they do about making the next step of algorithmic music.]
for years I think we’ve been in a state where people could have taken off-the-shelf method A and done something interesting with it on a huge music dataset
Absolutely. I got decent enough results just tinkering with GPT-2, and OpenAI’s Jukebox could have been done at smaller scale years ago. OA could presumably do a lot better right now if they had a few million to spare: Jukebox has only ~7b parameters, while GPT-3 has 175b, and Jukebox is already pretty close to human-level, so just another 10x in scale seems like it’d make it an extremely useful tool commercially.
Riffusion is basically exactly what I was thinking of here (in terms of input-output behavior; the internals are more sophisticated than what I was thinking of at the time).