Good points. Your point about value alignment being better to solve than just trying to orchestrate a pivotal act is true, but if we don’t have alignment solved by the time AGI rolls around, then from a pure survival perspective, it might be better to try a narrow ASI pivotal act instead of hoping that AGI turns out to be aligned already. This solution doesn’t solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully long enough to solve alignment.
The idea I have specifically is that you have something like GPT-3 (unintelligent in all other domains, doesn’t expand outside of its system or optimize outside of itself) that becomes an incredibly effective Tool AI. GPT-3 isn’t really aligned in the Yudkowsky sense, but I’m sure you could get it to write a mildly persuasive piece already. (It sort of already has: https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3 ).
Scale this to superintelligent levels within its narrow domain, the way AlphaGo is superhuman at Go, and I think you could orchestrate a pivotal act pretty rapidly. It doesn’t solve the alignment problem, but it pushes it back.
The problems are that the user needs to be aligned and that this type of narrow ASI has to be developed before AGI. But given the state of narrow ASI, I think it might be one of the best shots, and I do think a narrow ASI could get to this level before AGI, much the same way AlphaGo preceded MuZero.
What I am ultimately saying is that if we get a narrow AI that has the power to make a pivotal act, we should probably use it.
In all three cases, the AI you’re asking for is a superintelligent AGI. Each has to navigate a broad array of physically instantiated problems requiring coherent, goal-oriented optimisation. No stateless, unembedded, temporally incoherent system like GPT-3 is going to be able to create nanotechnology, beat all human computer security experts, or convince everyone of your position.
Values arise to guide the actions that intelligent systems perform. Evolution did not arrange for us to form values because it liked human values. It did so because forming values is an effective strategy for getting more performance out of an agentic system, and SGD can figure this fact out just as easily as evolution did.
If you optimise a system to be coherent and take actions in the real world, it will end up with values oriented around doing so effectively. Nature abhors a vacuum. If you don’t populate your superintelligent AGI with human-compatible values, some other values will arise and consume the free energy you’ve left around.
Interesting! I appreciate the details here; it gives me a better sense of why narrow ASI is probably not something that can exist. Is there a place we could talk over audio about AGI alignment versus text here on LessWrong? I’d like to get a better idea of the field, especially as I move into work like creating an AI Alignment Sandbox.
My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I’d really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.