I am definitely willing to listen to such arguments, but ATM I don’t actually believe in “discount rates” on people, so ¯\_(ツ)_/¯
The discount rate is essentially how much less you value a future person’s life relative to a current person’s.
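As a concrete illustration (the exponential form and the numbers below are just my own assumptions for the example, not something you committed to):

```python
# Toy illustration: under a pure time discount rate, how much does a life
# t years in the future count relative to a life today? (Assumed exponential
# discounting; the rates and the 100-year horizon are placeholder numbers.)

def discounted_weight(discount_rate: float, years_in_future: float) -> float:
    """Relative moral weight of a life `years_in_future` years from now."""
    return 1.0 / (1.0 + discount_rate) ** years_in_future

for rate in (0.00, 0.01, 0.05):
    # rate 0.00: a life in 100 years counts exactly as much as a life today
    # rate 0.05: it counts for less than 1% of a present life
    print(rate, round(discounted_weight(rate, 100), 4))
```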
I realize that, and my “discount rate” under that framework is zero.
Nobody’s discount rate can be literally zero, because that leads to absurdities if actually acted upon.
Like what?
Variants of Pascal’s mugging.
Infinite regress.
etc.
Even with a zero discount rate, the problem reduces to your model of how much knowledge a “30 year pause” world would gain when it cannot build large AGI systems to determine how they actually work and what their real failure modes are. If you believe, from the history of human engineering, that the gain would be almost nothing, then the pause ends up being a bad bet: it has a large cost (all the deaths) and no real gain.
It seems you think the only thing to be gained in a pause is technical alignment advances. But I want to point out that safety comes from solving two problems: the governance problem and the technical problem. And we need a lot of time to get the governance ironed out. The way I see it, misaligned AGI or ASI is the most dangerous thing ever, so we need the best regulation ever: the best safety and testing requirements, the best monitoring by governments of AI groups for unsafe actions, the best awareness among politicians and among the public. And even if one country has great governance figured out, it takes years or decades to get that level of excellence applied globally.
Do you know of examples of this? I don’t know of cases of good government, or good engineering, or good anything, achieved without feedback, where it is failure that provides the feedback showing the government or engineering is bad.
That’s the history of human innovation. I suspect a pause would gain nothing except more years alive for currently living humans, by the length of the pause.
I do not have good examples, no. You are right that normally there is learning from failure cases. But we should still try. Right now we have no requirements in place that could prevent an AGI breakout. Nick Bostrom wrote in Superintelligence, for example, that we could implement tripwires and honeypot situations in virtual worlds that would trigger a shutdown. We can think of things that are better than nothing.
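Just to make the idea concrete, a minimal sketch of such a tripwire; the resource names and the shutdown behavior here are purely illustrative, not a real containment design:

```python
# Toy honeypot/tripwire inside a sandbox: plant tempting fake resources, and
# treat any access to them as evidence of breakout behavior that forces a halt.
# The names and the shutdown hook are illustrative placeholders only.

HONEYPOTS = {"/sandbox/fake_api_keys.txt", "/sandbox/unfirewalled_socket"}

class TripwireTriggered(Exception):
    """Raised when the agent touches a honeypot resource."""

def monitored_access(agent_id: str, resource: str) -> None:
    # The sandbox layer would call this on every resource access by the agent.
    if resource in HONEYPOTS:
        # A real system would shut the agent down immediately and preserve
        # logs for audit; here we just signal it.
        raise TripwireTriggered(f"{agent_id} touched honeypot {resource}")
```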
I don’t think we should try. I think the potential benefits of tinkering with AGI are worth some risks, and if EY is right and it’s always uncontrollable and will turn against us, then we are all dead one way or another anyway. If he’s wrong, we’re throwing away the life of every living human being for no reason.
And there is reason to think EY is wrong. CAIS and careful control of what gets rewarded in training could lead to safe enough AGI.
That is a very binary assessment. You make it seem like safety is either impossible or easy. If it is impossible, we could save everyone by not building AGI. If we knew it to be easy, I agree we should accelerate. But the reality is that we do not know, and that it could be anywhere on the spectrum from easy to impossible. And since everything is on the line, including your life, better safe than sorry is to me the obvious approach. Do I see correctly that you think pausing AGI is not ‘safe’ because, if all went well, the AGI could be used to make humans immortal?
One hidden bias here is that I think a large hidden component of safety is a constant factor.
So pSafe has two major components: natural law and human efforts.
“Natural law” is equivalent to the question of “will a fission bomb ignite the atmosphere”. In this context it would be “will a smart enough superintelligence be able to trivially overcome governing factors?”
Governing factors include: a lack of compute (which it could overcome by inventing more efficient algorithms and switching to those), a lack of money (by somehow manipulating the economy to give itself large amounts of it), a lack of robotics (by some shortcut to nanotechnology), a lack of data (by better analysis of existing data, or see robotics), and so on, up to the point of essentially “magic”; see the sci-fi story The Metamorphosis of Prime Intellect.
In worlds where intelligence scales high enough, the machine basically always breaks out and does what it will. Humans are too stupid to ever have a chance, not just as individuals but as organizations. Slowing things down does nothing but delay the inevitable. (And if fission devices had ignited the atmosphere, same idea: almost all world lines end in extinction.)
This is why EY is so despondent: if intelligence is this powerful there probably exists no solution.
In worlds where aligning AI is easy, because machines need rather expensive and obviously easy-to-control amounts of compute to be interesting in capabilities, and they are not particularly hard to corral into doing what we want, alignment efforts don’t matter either.
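To spell out the structure of that argument, here is a toy version of the pSafe decomposition; the world probabilities are placeholders, not estimates I am defending:

```python
# Toy pSafe decomposition: in the two extreme worlds the outcome is fixed by
# "natural law", and human effort (pauses, alignment work) only moves the
# result in the in-between world. All numbers are placeholder assumptions.

def p_safe(p_doom_world, p_easy_world, p_between, effort_success):
    assert abs(p_doom_world + p_easy_world + p_between - 1.0) < 1e-9
    return (p_doom_world * 0.0             # breakout inevitable: effort irrelevant
            + p_easy_world * 1.0           # alignment trivial: effort unnecessary
            + p_between * effort_success)  # only here does effort change anything

# The most that heroic effort can buy over almost no effort is bounded by the
# probability mass of the in-between world:
print(p_safe(0.2, 0.7, 0.1, 0.9) - p_safe(0.2, 0.7, 0.1, 0.1))  # ~0.08
```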
I don’t know how much probability mass lies in the “in between” region. Right now, I believe the actual evidence is heavily in favor of “trivial alignment”.
“Trivial alignment” is “stateless microservices with an in-distribution detector in front of the AGI”. This is an architecture production software engineers are well aware of.
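A minimal sketch of that pattern; `in_distribution` and `agi_model` below are hypothetical stand-ins, and the point is only the shape of the gate:

```python
# Sketch of the "stateless microservice with an in-distribution detector in
# front of the model" pattern. `in_distribution` and `agi_model` are
# hypothetical stand-ins, not a real API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    payload: str

def in_distribution(request: Request) -> bool:
    """Hypothetical detector: accepts only inputs close to the training distribution."""
    return len(request.payload) < 10_000  # placeholder rule

def agi_model(request: Request) -> str:
    """Hypothetical model call. Stateless: nothing is carried over between requests."""
    return f"answer to: {request.payload}"

def handle(request: Request) -> str:
    # Each request is served in isolation (no hidden state the model could use
    # for long-horizon goals), and out-of-distribution inputs are refused
    # before the model ever sees them.
    if not in_distribution(request):
        return "refused: input out of distribution"
    return agi_model(request)
```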
Nevertheless, “slow down” is almost always counterproductive. In world lines where AGI can be used in our favor or is also hostile, it is a weapon we have to have on our side or we will be defeated; pauses disempower us. In world lines where alignment is easy, pauses kill everyone who isn’t life-extended with better medicine. In world lines where alignment can’t be done by human beings, it doesn’t matter.
The world lines of “AI is extremely dangerous” and “humans can contain it if they collaborate smartly and internationally, very carefully inch forward in capabilities, and SUCCEED” may not exist. This, I think, is the crux of it. The probability of this combination of events may be so low that no world line within the permutation space of the universe contains this particular combination.
Notice it’s a series probability: a demon-like AGI that can escape anything, and yet we are careful enough never to give it too much capability, and “international agreement” holds.
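Spelling out that arithmetic with made-up numbers (every factor below is an assumption for illustration only):

```python
# A series (conjunctive) probability: every condition has to hold at once,
# so the product shrinks quickly. The individual numbers are illustrative only.

p_demon_agi       = 0.3   # AGI really is the kind of thing that can escape anything
p_capability_care = 0.2   # ...and yet we reliably avoid giving it too much capability
p_international   = 0.1   # ...and durable international agreement actually holds

print(round(p_demon_agi * p_capability_care * p_international, 3))  # 0.006
```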
Thank you for your comments and explanations! Very interesting to see your reasoning. I have not seen evidence of trivial alignment; I hope the probability mass is in the in-between region. I want to point out that I think you do not need your “magic” level of intelligence to do a world takeover. Just high human level, at digital speed, working with copies of yourself, is likely enough I think. My blurry picture is that the AGI would only need a few robots in a secret company and some paid humans to work on a >90% mortality virus, where the humans are not aware of what the robots are doing. And my hope for international agreement comes not so much from a pause as from a safe virtual testing environment that I am thinking about.