A general meta-problem of such discussions is that the direct counterargument to “LLMs are safe” is to explain how to make an LLM unsafe, and that is not good practice.
Or to point to a situation where LLMs exhibit unsafe behavior in a realistic usage scenario. We don’t say
a problem with discussions of fire safety is that a direct counterargument to “balloon-framed wood buildings are safe” is to tell arsonists the best way to set them on fire
With every technology there is a way to make it stop working. There are any number of ways to make a plane unable to fly. But the important thing is that we know a way to make a plane fly; therefore, humans can fly by plane.
Likewise, the point that an LLM-based architecture can in principle be safe still stands even if there is a way to build an unsafe LLM-based architecture.
And this is a huge point. Previously we were in a state where alignment wasn’t even a tractable problem, where capabilities progressed while alignment stayed in the dirt, where an AI system might understand human values yet still not care about them, and we didn’t know what to do about it.
But now we can just
have an “ethics module” where the underlying LLM produces text which then feeds into other parts of the system to help guide behavior.
Which makes alignment tractable. Alignment can now be reduced to the capability of the ethics module. We know that the system will care about our values as it understands them, because we can explicitly code it to do so via an if-else statement. This is an enormous improvement over the previous status quo.
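To make the if-else point concrete, here is a minimal sketch of that pattern in Python. It assumes a generic call_llm wrapper around whatever model API the system actually uses; the function names and prompts are hypothetical, not a real implementation.

```python
# A minimal sketch of the "ethics module" pattern described above.
# `call_llm` is a stand-in for whatever LLM API the system uses; the
# names and prompts here are hypothetical.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

def ethics_module(proposed_action: str) -> bool:
    """Ask the underlying LLM, in plain natural language, whether the action is acceptable."""
    verdict = call_llm(
        "Evaluate the following proposed action against human values. "
        "Answer APPROVE or REJECT.\n\n" + proposed_action
    )
    return verdict.strip().startswith("APPROVE")

def agent_step(task: str) -> str:
    proposed_action = call_llm("Propose the next action for this task:\n" + task)
    # The explicit if-else gate: the system only acts on approved actions.
    if ethics_module(proposed_action):
        return proposed_action
    else:
        return "no-op: action rejected by ethics module"
```

The point of the sketch is only that the “caring” is enforced by ordinary control flow rather than by anything inside the model’s weights.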
I feel like I am a victim of the illusion of transparency.
The first part of the OP’s argument is “LLMs need data, data is limited, and synthetic data is meh”. The direct counterargument to this is “here is how to avoid the drawbacks of synthetic data”.
The second part of the OP’s argument is “LLMs are humanlike and will remain so”, and the direct counterargument is “here is how to make LLMs more capable but less humanlike, it will be adopted because it makes LLMs more capable”.
Walking around telling everyone ideas for how to make AI more capable and less alignable is pretty much ill-advised.
“here is how to make LLMs more capable but less humanlike, it will be adopted because it makes LLMs more capable”.
Thankfully, this is a class of problems that humanity has experience dealing with. The solution boils down to regulating all the ways of making LLMs less human-like out of existence.
You mean, “ban superintelligence”? Because superintelligences are not human-like.
That’s the problem with your proposal of an “ethics module”. Let’s suppose that we have a system with an “ethics module” and a “nanotech design module”. The nanotech design module outputs a 3D model of some supramolecular unholy abomination. What exactly should the ethics module do to ensure that this abomination doesn’t kill everyone? Tell the nanotech module “pls don’t kill people”? You are going to have a hard time translating that into the nanotech designer’s internal language. Make the ethics module smart enough to analyse the behavior of complex molecular structures in a wide range of environments? Then you have all the problems of aligning a superintelligence.
You mean, “ban superintelligence”? Because superintelligences are not human-like.
The kind of superintelligence that doesn’t possess the human-likeness that we want it to possess.
That’s the problem with your proposal of an “ethics module”. Let’s suppose that we have a system with an “ethics module” and a “nanotech design module”. The nanotech design module outputs a 3D model of some supramolecular unholy abomination. What exactly should the ethics module do to ensure that this abomination doesn’t kill everyone?
The nanotech design module has to be evaluatable by the ethics module. For that, it also has to be made from multiple sequential LLM calls in explicit natural language. Other types of modules should be banned.
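As a rough sketch of what “made from multiple sequential LLM calls in explicit natural language” could look like, reusing the hypothetical call_llm and ethics_module helpers from the sketch above; the step prompts are invented for illustration.

```python
# Sketch only: reuses the hypothetical call_llm and ethics_module helpers
# defined in the earlier sketch. The design module is a chain of
# natural-language LLM calls, and every intermediate step is checked by
# the ethics module before the chain continues.

DESIGN_STEPS = [
    "Restate the design requirements in plain language:\n{state}",
    "List candidate approaches and their risks in plain language:\n{state}",
    "Describe the chosen design step by step in plain language:\n{state}",
]

def evaluatable_design_module(requirements: str) -> str:
    state = requirements
    for step_prompt in DESIGN_STEPS:
        state = call_llm(step_prompt.format(state=state))
        # Every intermediate natural-language artifact is evaluated,
        # not just the final output.
        if not ethics_module(state):
            raise RuntimeError("design step rejected by ethics module")
    return state
```

The design choice being illustrated is that every intermediate artifact stays in natural language, so the ethics module never has to interpret an opaque internal representation.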