I’m actually optimistic about prosaic alignment for a takeoff driven by language models. But I don’t know what the opportunity for action is there; I expect DeepMind to trigger the singularity, and they’re famously opaque. Call it a 15% chance of not-doom, action or no action. To be clear, I think action is possible, but I don’t know who would take it or what form it would take. Convince OpenAI and race DeepMind to a working prototype? This is exactly the scenario we hoped to not be in...
edit: I think possibly step 1 is to prove that Transformers can scale to AGI. Find a list of remaining problems and knock them down, preferably in toy environments with weaker models. The difficulty is obviously demonstrating danger without instantiating it. Create a fire alarm, somehow. The hard part of judging this action is that it both helps and harms.
edit: Regulatory action may buy us a few years! I don’t see how we can get it though.
I’m definitely on board with prosaic alignment via language models. There are a few projects I’ve seen in this community related to that approach, including ELK and the project to teach a GPT-3-like model to produce violence-free stories. I think these are good things.
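For concreteness, here is a minimal sketch of what that kind of prosaic-alignment setup can look like in practice: sample completions from a language model and reject any that a "violence" classifier flags. This is only an illustration, not the actual project's code; the gpt2 model is a stand-in for a GPT-3-like model, and the keyword-based `is_violent` function is a placeholder for a learned classifier.

```python
# Hypothetical sketch: rejection-sampling a language model against a
# violence filter. Names and thresholds here are illustrative placeholders.
from transformers import pipeline

# Stand-in for a GPT-3-like story model.
generator = pipeline("text-generation", model="gpt2")


def is_violent(text: str) -> bool:
    """Placeholder for a learned classifier that flags violent continuations."""
    return any(word in text.lower() for word in ["blood", "stab", "kill"])


def safe_complete(prompt: str, attempts: int = 8):
    """Sample completions and return the first one the filter accepts."""
    for _ in range(attempts):
        out = generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
        if not is_violent(out):
            return out
    return None  # refuse rather than emit a flagged completion


print(safe_complete("The knight approached the dragon and"))
```

Rejection sampling is only one possible flavor; the same idea could be implemented by fine-tuning on filtered data or by training against the classifier's signal, which is closer to how such projects are usually described.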
I don’t understand why you would want to spend any effort proving that transformers could scale to AGI. Either they can or they can’t. If they can, then proving that they can will only accelerate the problem. If they can’t, then prosaic alignment will turn out to be a waste of time, but only in the sense that every lightbulb Edison tested was a waste of time (except for the one that worked). This is what I mean by a shotgun approach.
> I don’t understand why you would want to spend any effort proving that transformers could scale to AGI.
The point would be to try and create common knowledge that they can. Otherwise, for any “we decided to not do X”, someone else will try doing X, and the problem remains.
Humanity is already taking a shotgun approach to unaligned AGI. Shotgunning safety is viable and important, but I think it’s more urgent to prevent the first shotgun from hitting an artery. Demonstrating AGI viability in this analogy is shotgunning a pig in the town square, to prove to everyone that the guns we are building can in fact kill.
We want safety to have as many pellets in flight as possible. But we want unaligned AGI to have as few pellets in flight as possible. (Preferably none.)