We train an LLM to be an expert on AI design and wisdom. We might do this by feeding it AI research papers and “wisdom texts” — principled arguments about wise behavior and stories of people behaving wisely — over and above what the base model already has access to, and then fine-tuning it to prioritize giving wise responses.
We simultaneously train some AI safety researchers to be wiser.
Our wise AI safety researchers use this LLM as an assistant to help them think through how to design a superintelligent AI that would embody the wisdom necessary to be safe.
Iterate as necessary, using the wisdom and understanding developed with the help of less wise AI to train wiser AI.
At first I wanted to dismiss this as not addressing the problem at all, but on second thought, I think a key insight here may be that adding a focus on improving the wisdom of the relevant parties involved in AI development could help to bootstrap more trustworthy “alignment verification” capacities.
However, I am not sure that something like this would fly in our economically oriented societies, since I would expect that wiser people would decline to develop superintelligent AI for the foreseeable future and instead urge us to look inward for solutions to most of our problems (almost all of our problems are man-made, after all). Having said this, if we were to get a regime in place that could reliably ensure that “wisdom” plays a key role in decision making around AI development, this seems like as good a bet as any to help us deal with our predicament.