Then you could try to create alternative designs for AI systems such that they can do the things that goal-directed agents can do without themselves being goal-directed. You could also try to persuade AI researchers of these facts, so that they don’t build goal-directed systems.
I’m not sure this strategy is net positive. If dangerous AI (at least as dangerous as Slaughterbots) is developed before alignment is solved, the world is probably better off if the first visibly dangerous AI is goal-directed rather than, say, an Oracle. The former would probably be a much weaker optimization process and probably wouldn’t result in an existential catastrophe; and it might make some governance solutions more feasible.
Can you clarify the argument? Are you optimizing for an obvious AI disaster to happen as soon as possible so people take the issue more seriously?
I’m not optimizing for raising awareness via an “obvious AI disaster”, for multiple reasons, including the huge risk to the reputation of the AI safety community and the unilateralist’s curse.
I do think that when considering whether to invest in an effort that might prevent recoverable near-term AI accidents, one should consider the possibility that the effort would prevent pivotal events (e.g., an event that would have enabled useful governance solutions, resulting in more time for alignment research).
Efforts that prevent recoverable near-term AI accidents might be astronomically net positive if they help make AI alignment more mainstream in the general ML community.
(anyone who thinks I shouldn’t discuss this publicly is welcome to let me know via a PM or anonymously here)
In this scenario, wouldn’t you eventually build a sufficiently powerful goal-directed AI that leads to an existential catastrophe?
Perhaps the hope is that when everyone sees that the first goal-directed AI is visibly dangerous, they will actually believe that goal-directed AI is dangerous. But in the scenario where we are building alternatives to goal-directed AI and they are actually getting used, I would predict that we have already convinced most AI researchers that goal-directed AI is dangerous.
(Also, I think you can level this argument at nearly all AI safety research agendas, with the possible exception of Agent Foundations.)
I don’t think I articulated my argument clearly; I tried to clarify it in my reply to Jessica.
I think my argument might be especially relevant to the effort of persuading AI researchers not to build goal-directed systems.
If a result of this effort is convincing more AI researchers of the general premise that x-risk from AI is something worth worrying about, then that’s a very strong argument in favor of carrying out the effort (and I agree that this result should correlate with convincing AI researchers not to build goal-directed systems, if that’s what you argued in your comment).
Yeah, I was imagining that we would convince AI researchers that goal-directed systems are dangerous, and that we should build the non-goal-directed versions instead.