I’m not sure exactly where I land on this, but I think it’s important to consider that restricting the data companies can train on could influence the architectures they choose. Self-supervised autoregressive models à la GPT-3 seem considerably more benign than full-fledged RL agents, yet the latter are much less data-hungry than the former (especially in terms of copyrighted data), so data restrictions could push development toward RL. There are enough other factors at play that I’m not fully confident in this analysis, but it’s worth thinking about.
I’m leaning toward the current paradigm being preferable to a full-fledged RL one, but I want to add a point: one of my best guesses for proto-AGI involves a massive LLM hooked up to some RL system. That setup might not require RL capabilities anywhere near the complexity of a pure RL agent, and RL research is still active today.
Agreed, but LLM + RL still seems preferable to MuZero-style AGI.
I agree, but this is also a question of timelines. Within the LLM + RL paradigm, we may never need AGI-level RL, or LLMs that can accessibly simulate AGI-level simulacra purely from self-supervised learning; both of those would take longer to reach than the many intermediate points that combine moderate LLM and RL capabilities, especially since people are still actively working on RL now.