If we expect there will be lots of intermediate steps—does this really change the analysis much? >
I think so, yes. One fundamental way is that you might develop machines that are intelligent enough to produce new knowledge at a speed and quality above the current capacity of humans, without those machines necessarily being agentic. Those machines could potentially work on the alignment problem themselves.
I think I know what EY's objection would be (I might be wrong): a machine capable of doing that is already an AGI and hence already deadly. Well, I think this argument would be wrong too. I can envision a machine capable of doing science without necessarily being agentic.
How will we know once we’ve reached the point where there aren’t many intermediate steps left before crossing a critical threshold? >
I don’t know if it is useful to think in terms of thresholds. A threshold to what? To an AGI? To an AGI of unlimited power? Before making a very intelligent machine there will be less intelligent machines. The leap can be very quick, but I don’t expect that at any point there will be one single entity so powerful that it will dominate all other life forms in a very short time (a window of time shorter than it takes other companies/groups to develop similar entities). How do I know that? I don’t, but when I hear all the possible scenarios in which a machine pulls off an “end of the world” scenario, they are all based on the assumption (and I think it is fair to call it that) that the machine will have almost unlimited power, e.g. that it is able to simulate nanomachines and then devise a plan to successfully deploy those nanomachines simultaneously everywhere while remaining hidden. It is this part of the argument that I have problems with: it assumes that these things are possible in the first place. And some things are not, even if you have 100000 Von Neumanns thinking for 1000000 years. A machine that can play Go at the God level can’t win a game against AlphaZero with a 20-stone handicap.
How do you expect everyone’s behaviour to change once we do get close? >
Close to developing an AGI? I think we are close now. I just don’t think it will mean the end of the world.
While you can envision something, that doesn’t mean the envisioned thing is logically coherent/possible/trivial to achieve. In one fantasy novel the protagonists travel to a world where the physical laws make it impossible to light matches. It’s very easy to imagine trying to light a match again and again and failing, but “impossibility of lighting matches” implies such drastic changes in physical laws that Earth’s life probably couldn’t sustain itself there, because match heads contain phosphorus and phosphorus is vital for bodily processes (and I’m not even going into the universe-wide consequences of different physical constants).
So it’s very easy to imagine a terminal where you type “how to solve alignment?”, press “enter”, get a solution after an hour, and everybody lives happily ever after. But I can’t imagine how this thing would work without developing agency, unless at some point I say “here Magic happens that prevents this system from developing agency”.