Note that if general intelligence can be harnessed effectively and safely through role architectures, and if irresponsible parties will neglect AGI-agent safety regardless of its feasibility, then solving the classic AGI-agent alignment problem is neither necessary nor sufficient for the safe application of strong AI capabilities.[12] And if this is true, then it is important that we update our assumptions and priorities in far-reaching ways.
If sufficiently powerful LLMs are created in open source, I do not doubt that there will be dozens, if not hundreds, of people who, playing Herostratus, will attempt to make self-protecting, self-preserving, self-improving, and explicitly malign agents out of these LLMs, just because.
The real question for me is whether the open-source and capability AI research communities are on track to create such agents. My current intuition is that the sheer amount of attention and academic and hacking effort poured into improving LLM algorithms over the next few years (unless strict limitations on conducting such research are imposed immediately, or there is a widespread movement among capability researchers to abandon this direction of work voluntarily, as David Duvenaud did recently) will, with good probability, yield roughly an order of magnitude efficiency gain in LLM algorithms (in terms of the parameters, compute, or data required to achieve a given level of capability). At that point, models more capable than GPT-4 will probably come within reach of open-source or open-source-sympathetic labs, such as FAIR or Stability.
Given these timelines (I assign perhaps only a 30% probability that the open-source and academic communities will not reach this capability mark within the next five years under a “business as usual” scenario), I tend to agree with Yudkowsky that furthering LLM capabilities will likely lead to catastrophe, and I agree with his policy proposals.
His proposal of turning away from generic LLM capabilities back to narrow AI like AlphaFold (and the new developments by Isomorphic Labs) also seems reasonable to me.