In theory you are correct. A mind can be constructed to optimize for anything.
In practice, if you add details: so you have this ASI system. It has some lineage from humans. It is incredibly powerful and efficient and is rapidly gaining resources out in the solar system. Implicitly, if you try to reason about such a being, it must have the curiosity to have discovered advanced technology humans don’t have (we can’t conquer the solar system with current tech), the persuasive ability to have convinced humans to provide it with the rockets and seed factories to get started, and the hyper-intelligence, or diverse machine civilization of ideas, that allowed it to get this far.
Is this machine going to destroy its own planet of origin for no purpose but some tiny amount of extra matter it doesn’t need? That seems...stupid and short-sighted. Like blowing up Mayan temples because you need gravel. It is implicitly contradictory with the cognitive system that got this far.
That’s why I suspect that Orthogonality may be wrong in any plausible timeline, despite being possible in theory. This is a “spherical cow” model of intelligence that may never actually exist.
With that said, I would argue that humans should never give seed factories or rockets to any ASI system, ever, or task any ASI system with sweeping authority under no human review or direct control. That is just bad engineering. I don’t think humans should trust anything we can call an “ASI” with anything but very narrow and checkable tasks.
I believe you are predicting that resource constraints will be unlikely. To use my analogy from the post, you are saying we will likely be safer because the ASI will not require our habitat for its highway. There are so many other places for it to build roads.
I do not think that is a case of it valuing our wellbeing...just that it will not get around to depriving us of resources, because of a cost/benefit analysis.
Do you think the Human Worth Hypothesis is likely true? That the more intelligent an agent is, the more it will positively value human wellbeing?
That’s not the precise argument. Currently, humans believe the universe, as far as we can see, is cold and dead. The earth itself—not humans specifically, but this rich biosphere that appears to have evolved through cosmic levels of luck—has value to humans.
Kind of like how Mayan ruins have value to humans: we have all the other places on the planet to exploit for resources, so we do not need to destroy one-of-a-kind artifacts of an early civilization. It’s not even utility; technically, the land the ruins are on would make more money covered in condos, but we humans want to remember and understand our deep past.
Anthropically, I imagine that “ultra smart” means long-term thinking similar to humans’, just with the ASI better at it, and therefore such an ASI would model its future regret at having destroyed the only evolved life in the universe, and not commit the bad act of destroying it all.
This does not mean the ASI would help or harm individual humans, or avoid killing humans that interfere with it. It just probably wouldn’t wipe out the entire species and the ecosystem of the planet to make more robots.
Eliezer says exponential growth will exhaust all resources quickly, and he’s right...but will a superintelligence waste a priceless biosphere for less than 0.1 percent more resources? This is possible but seems stupid.
That argument may even be correct—a sufficiently advanced intelligence may see just how much less-interesting matter there is to exploit before facing the optimization question of “a tiny bit of resources to keep some ruins and the critters who made them around” vs “another few percent of matter to make into computronium or whatever the super-AGI version of paperclips is”.
And then that extends to preserving part of the solar system while exploiting other star systems.
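To put rough numbers on that “tiny fraction” intuition, here is a minimal sketch using approximate public mass figures (the constants are assumptions, good to perhaps a factor of two, not precise astronomy):

```python
# Rough check: how much of the solar system's matter is Earth?
# All masses in kg; the values below are approximate assumptions.
EARTH_MASS = 5.97e24
NON_SOLAR_MASS = 2.7e27   # planets, moons, asteroids, comets (Sun excluded)
SUN_MASS = 1.99e30

# Earth as a share of the matter an expanding ASI could reach.
share_excluding_sun = EARTH_MASS / NON_SOLAR_MASS
share_including_sun = EARTH_MASS / (NON_SOLAR_MASS + SUN_MASS)

print(f"Earth vs. everything but the Sun: {share_excluding_sun:.2%}")
print(f"Earth vs. the whole solar system: {share_including_sun:.6%}")
```

On these figures Earth is a few tenths of a percent of the non-solar mass, and a few millionths of the total; the “under 0.1 percent” framing holds once you count the Sun, and gets far stronger once other star systems are in reach.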
I don’t put a LOT of weight behind that argument. Not only is it pretty tenuous (we don’t have any clue how many humans, or in what condition, the AI will decide are valuable enough to keep—note that we haven’t kept very many ancient ruins), but it also ignores the ramp-up problem: the less-capable versions that are trying to get smart and powerful enough to get off of Earth (and then out of the solar system) in the first place.
I would agree with this. The easiest way to “encourage” ASI to leave humans alone would be for humans to arm themselves with the most powerful weapons they can produce, helped by the strongest AI models humans can reliably control. This matter needs to fight back, see Geohot.
I would predict your probability of doom is <10%. Am I right? And no judgment here!! I’m testing myself.
Depends on your definition of doom? There are an awful lot of weird futures that might happen that aren’t “everyone is dead and nothing but some single AI is turning the universe to cubes” or “human paradise”. Nature is weird; even our own civilization is barely recognizable to our distant ancestors. We have all kinds of new problems they could not relate to.
I think my general attitude is more that I am highly uncertain about what will happen, but I feel that an AI “pause” or “shutdown” at this time is clearly not the right decision, because in the past, civilizations that refused to adopt new technologies and arm themselves with them did not get good outcomes.
I think such choices need to be based on empirical evidence that would convince any rational person. Claiming you know what will happen in the future without evidence is not rational. There is no direct evidence for the Orthogonality hypothesis or most of the arguments for AI doom. There is strong evidence that GPT-4 is useful, and a stronger model than GPT-4 is needed to meet meaningful thresholds for general utility.
A rationalist and an empiricist went backpacking together. They got lost, ended up in a desert, and were on the point of death from thirst. They wander to a point where they can see a cool, clear stream in the distance, but unfortunately there is a sign telling them to BEWARE THE MINEFIELD between them and the stream.
The rationalist says, “Let’s reason through this and find a path.” The empiricist says, “What? No. We’re going to be empirical. Follow me.” He starts walking through the minefield and gets blown to bits a few steps in.
The rationalist sits down and dies of thirst.
Alternate endings:
The rationalist gets killed by flying shrapnel along with the empiricist.
The rationalist grabs the empiricist and stops him. He carefully analyzes dirt patterns, draws a map, and tells the empiricist to start walking. The empiricist blows up. The rationalist sits down and dies of thirst chanting “The map is not the territory.”
The rationalist grabs the empiricist, analyzes dirt patterns, draws map, tells empiricist to start walking. Empiricist blows up. Rationalist says, “Hmmm. Now I understand the dirt patterns better.” Rationalist redraws map. Walks through minefield. While drinking water, takes off fleece to reveal his “Closet Empiricist” t-shirt.
They sit down together, figure out how to find some magnetic rocks, build a very crude metal detector, put it on the end of a stick, and start making their way slowly through the minefield. They step on a mine and a nuclear mushroom cloud erupts.
So how powerful are those dad gum land mines?? Willingness to perform certain experiments should be a function of the expected size of the boom.
If you think you are walking over sand burs and not land mines, you are more willing to be an empiricist exploring the space. “Ouch don’t step there” instead of “Boom. <black screen>”
If one believes that smarter things will see >0 value in humanity, that is, if you believe some version of the Human Worth Hypothesis, then you believe the land mines are less deadly and it makes sense to proceed...especially for that clear, cool water that could save your life.
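The “size of the boom” intuition can be written down as a toy expected-value rule. This is entirely illustrative: `p_mine`, `boom_cost`, and `water_value` are made-up stand-ins, not real estimates of anything.

```python
def should_step(p_mine: float, boom_cost: float, water_value: float) -> bool:
    """Take the next step only if the expected loss is below the gain."""
    return p_mine * boom_cost < water_value

# Sand burs: a small, survivable boom -> cheap to be an empiricist.
assert should_step(p_mine=0.3, boom_cost=1.0, water_value=10.0)

# Land mines: a lethal boom -> the same water no longer justifies a step.
assert not should_step(p_mine=0.3, boom_cost=1_000.0, water_value=10.0)

# Believing some version of the Human Worth Hypothesis is, in effect,
# lowering your estimate of boom_cost, which flips the decision back.
assert should_step(p_mine=0.3, boom_cost=20.0, water_value=10.0)
```

Notice that the whole disagreement in this thread is over `boom_cost` (and to a lesser extent `p_mine`), not over the value of the water.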
I’m not really making a point, here, but just turning the issues into a mental cartoon, I guess.
Okay, well, I guess I am trying to make one point: There are experiments one should not perform.
So this came up in an unpublished dialogue. How do we know a nuclear war would be devastating?
We know megaton devices are real and work, because we set them off.
We set off the exact warheads mounted in ICBMs.
We measured the blast effects on mock structures at varying levels of overpressure and other parameters.
We fired the ICBMs without a live warhead many times, to test the arming and firing mechanisms and accuracy.
We fired live missiles into space with live warheads during the Starfish Prime era of tests.
Despite all this, we worried that ICBMs might not all work, so we also maintain a fleet of bombers and gravity warheads, because these have actually been tested with live warheads.
Thus everything but “nuke an actual city and count how many people died and check all the buildings destroyed”...oh right, we did that also.
This is how we know nukes are a credible threat that everyone takes seriously.
With your analogy, there isn’t a sign saying there are mines. Some “concerned citizen” who leads a small organization some call a cult, who is best known for writing fiction and has no formal education, says there are mines ahead, and has produced thousands of pages of text arguing that mines exist and are probably ahead. More recently, some Credible Experts (most of whom have no experience with SOTA AI) signed a couple of letters saying there might be mines. (Conspicuously, almost no one from the SOTA mine labs signed the letter, though one famous guy retired and has spoken out.)
The Government ordered mine labs to report if they are working on mines above a certain scale, and there are various lawsuits trying to make mines illegal for infringing on copyright.
Some people say the mines might be nuclear and your stick method won’t work, but no nuclear weapons have ever existed. In fact, in your analogy world, nobody has quite made a working mine. They got close but a human operator still has to sit there and press a button when the mine thinks it is time to explode, the error rate is too high otherwise.
People are also worried that a mine might get test-detonated with a design yield of 10 tons while secretly hiding a megaton yield, and they say we should assume, without evidence, that the yield is enough to end the planet.
Some people are asking for a total shutdown of all mine building, but other rival groups seem to be buying an awful lot of explosives and don’t even appear to be slowing down...
Honestly in the end someone is going to have to devise a way to explore the minefield. Hope the stick is long enough.
(I’m liking my analogy even though it is an obvious one.)
To me, it feels like we’re at the moment when Szilard has conceived of the chain reaction and letters to presidents are getting written, and GPT-3 was a Fermi-pile-like moment.
I would give it a 97% chance you feel we are not nearly there, yet. (And I should quit creating scientific by association feelings. Fair point.)
To me, I am convinced intelligence is a superpower because of the power and control we have over all the other animals. That is enough evidence for me to believe the boom could be big. Humanity was a pretty big “boom” if you are a chimpanzee.
The empiricist in me (and probably you) says: “Feelings are worthless. Do an experiment.”
The rationalist in me says: “Be careful which experiments you do.” (Yes, hope the stick is long enough, as you say.)
In any event, we agree on: “Do some experiments with a long stick. Quickly.” Agreed!
So in your analogy, it would be reasonable given the evidence to wonder:
How long before this exotic form of explosive works at all? Imagine how ridiculous it sounds to someone in 1943 that special rocks will blow up like nothing else.
How much yield are we talking? Boosted bombs over what can already fit in a B-29 (say, 10 times the yield)? Kilotons? Megatons? Continent-destroying devices? Technically, if you assume total conversion, the bigger yields are readily available.
Should I believe the worst case, that you can destroy the planet, when you haven’t started a chain reaction at all yet? And then shut down everything? Oh, and by the way, the Axis powers are working on it...
So yes, I think my view is more evidence-based than that of those who declare doom is certain. A “nuke doomer” in 1943 would be saying they KNOW a teraton-or-greater device is imminent, with a median timeline of 1949...
As it turned out, no: the bomb needed would be the size of an oil tanker and use expensive materials, and the main “doomsday” element wouldn’t be the crater it leaves but the radioactive cobalt-60 transmuted as a side effect. And nobody can afford to build a doomsday nuke, or at least hasn’t felt the need to build one yet.
Scaling and better bomb designs eventually saturated at only about 3 orders of magnitude improvement.
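As a quick sanity check on that “about 3 orders of magnitude” figure, compare the first fission test to the largest device ever detonated (both yields are approximate, commonly cited values):

```python
import math

# Approximate yields in kilotons of TNT equivalent (assumptions).
TRINITY_YIELD_KT = 21          # Trinity test, 1945 (~21 kt)
TSAR_BOMBA_YIELD_KT = 50_000   # Tsar Bomba, 1961 (~50 Mt)

improvement = TSAR_BOMBA_YIELD_KT / TRINITY_YIELD_KT
orders_of_magnitude = math.log10(improvement)

print(f"~{improvement:,.0f}x yield, about {orders_of_magnitude:.1f} orders of magnitude")
```

That comes out to roughly 2,400x, i.e. between 3 and 4 orders of magnitude from the first working device to the practical ceiling, which is the saturation point the paragraph above refers to.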
All that would have to be true for “my” view to be correct is that compute-vs-intelligence curves saturate, especially on old hardware, and that no amount of compute, at any level of superintelligence, can actually allow reliable social engineering or hacking of well-designed computer systems.
That stops ASIs from escaping and doom can’t happen.
Conversely, maybe nukes can set off the atmosphere. Then doom is certain; there is nothing you can do, you can only delay the experiment.
Thank you. You are helping my thinking.
I am applying myself to try to come up with experiments. I have the kernel of an idea I’m going to hound some eval experts with, to check whether it is already being performed.
Totally agreed that we are fumbling in the dark. (To me, though, I’m fairly convinced there is a cliff out there somewhere given that intelligence is a superpower.)
And, I also agree on the need to be empirical. (Of course, there are some experiments that scare me.)
I am hoping that, just maybe, this framing (Human Worth Hypothesis) will lead to experiments.