These three proposals are basically examples of the “minimal pivotal act” specialized AI that Yudkowsky talks about and thinks is ~impossible for us to build safely. I personally think value alignment is way easier to solve than most pessimists assume, and that these attempts to circumvent the core of value alignment are (1) not going to work (you won’t be able to build a specialized system with only the capabilities required for your pivotal act before someone else builds a generalist ASI too strong for your pivotal-act specialist to stop), and (2) just straight up more dangerous than trying to build a value-aligned AGI.
The reason for the latter is that you need value alignment anyway in order to prevent your pivotal-act specialist from killing you (or turning into a thing that would kill you), and you’re more likely to solve value alignment if you actually set out to solve value alignment as the main thing you’re doing.
Good points. Your point about value alignment being better to solve than just trying to orchestrate a pivotal act is true, but if we don’t have alignment solved by the time AGI rolls around, then from a pure survival perspective it might be better to attempt a narrow-ASI pivotal act instead of hoping that AGI turns out to be aligned already. The solution above doesn’t solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully far enough to solve alignment.
The idea I have specifically is that you have something like GPT-3 (unintelligent in all other domains; it doesn’t expand outside of its system or optimize outside of itself) that becomes an incredibly effective Tool AI. GPT-3 isn’t really aligned in the Yudkowsky sense, but I’m sure you could already get it to write a mildly persuasive piece. (It sort of already has: https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3 ).
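To make the “Tool AI” framing concrete, here is a minimal sketch of using GPT-3 purely as a text-drafting tool via the (pre-1.0) openai Python client; the engine name, prompt, and parameters are illustrative assumptions, not a recipe for anything pivotal.

```python
# Hypothetical sketch: GPT-3 used as a narrow, stateless text-drafting tool.
# Assumes the pre-1.0 `openai` Python client and an API key in the
# OPENAI_API_KEY environment variable; engine name and prompt are
# placeholder assumptions for illustration only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]


def draft_persuasive_piece(topic: str) -> str:
    """Ask the model for a short persuasive essay on `topic`.

    The model only returns text: it keeps no memory between calls and has
    no way to act on the world, which is the sense in which it is a
    'Tool AI' rather than an agent.
    """
    response = openai.Completion.create(
        engine="text-davinci-002",  # placeholder GPT-3 engine name
        prompt=f"Write a short, persuasive essay arguing that {topic}.",
        max_tokens=400,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()


if __name__ == "__main__":
    print(draft_persuasive_piece("AI alignment research deserves more attention"))
```

The point of the sketch is just that every call is a one-shot text completion chosen and acted on by the human user, not by the system itself.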
Scale this to superhuman performance within its narrow domain, the way AlphaGo is superhuman at Go, and I think you could orchestrate a pivotal act pretty rapidly. It doesn’t solve the alignment problem, but it pushes it back.
The problems are that the user needs to be aligned and that this type of narrow ASI has to be developed before AGI. But given the state of narrow ASI, I think it might be one of our best shots, and I do think a narrow ASI could get to this level before AGI, much the same way AlphaGo preceded MuZero.
What I am ultimately saying is that if we get a narrow AI with the power to perform a pivotal act, we should probably use it.
In all three cases, the AI you’re asking for is a superintelligent AGI. Each has to navigate a broad array of physically instantiated problems requiring coherent, goal-oriented optimisation. No stateless, unembedded, temporally incoherent system like GPT-3 is going to be able to create nanotechnology, beat all human computer-security experts, or convince everyone of your position.
Values arise to guide the actions that intelligent systems perform. Evolution did not arrange for us to form values because it liked human values. It did so because forming values is an effective strategy for getting more performance out of an agentic system, and SGD can figure this fact out just as easily as evolution did.
If you optimise a system to be coherent and take actions in the real world, it will end up with values oriented around doing so effectively. Nature abhors a vacuum. If you don’t populate your superintelligent AGI with human-compatible values, some other values will arise and consume the free energy you’ve left around.
Interesting! I appreciate the details here; they give me a better sense of why narrow ASI is probably not something that can exist. Is there a place we could talk about AGI alignment over audio, rather than by text here on LessWrong? I’d like to get a better idea of the field, especially as I move into work like creating an AI Alignment Sandbox.
My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I’d really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.