SimplexAI-m is advocating for good decision theory.
agents that can cooperate with other agents are more effective
This is just another aspect of orthogonality.
The ability to cooperate is instrumentally useful for optimizing a value function, in much the same way that intelligence is.
A super-intelligent, super-“moral” clippy still turns us into paperclips, because it hasn’t agreed not to and doesn’t need our cooperation (the toy payoff sketch below makes this concrete).
We should build agents that value our continued existence. If the smartest agents don’t, then we die out fairly quickly when they optimise for something else.
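A toy payoff model of that point, with entirely made-up numbers: whatever decision theory clippy runs, “defect” dominates once it no longer needs our cooperation, unless the value function itself puts weight on us staying alive.

```python
# Toy model, made-up numbers: cooperation is instrumental, not terminal.
# Paperclips produced by a clippy-style optimiser for each action, in two regimes.
payoffs = {
    "still_needs_humans":     {"cooperate": 100, "defect": 10},      # early defection gets it shut down
    "no_longer_needs_humans": {"cooperate": 100, "defect": 10_000},  # our atoms become feedstock
}

for regime, actions in payoffs.items():
    best = max(actions, key=actions.get)
    print(f"{regime}: best action is {best!r} ({actions[best]} paperclips)")

# still_needs_humans: best action is 'cooperate' (100 paperclips)
# no_longer_needs_humans: best action is 'defect' (10000 paperclips)
# Being a superb cooperator/decision-theorist only changes the first row;
# the second row only changes if the value function cares about us directly.
```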
EDIT:
to fully cut this Gordian knot, consider that a human can turn over their resources and limit themselves to actions approved by some minimal aligned-with-their-interests AI with the required super-morality.
Think of a very smart shoulder angel/investment advisor that:
can say “no, you can’t do that”
manages the human’s assets in a weird post-AGI world
has no other preferences of its own
other than making sure the human isn’t a blight on existence that has to be destroyed
the resulting Human+AI is “super-moral”
this requires that a trustworthy AI exists that humans can use to implement “super-morality” (rough sketch below)
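A minimal sketch of that arrangement, assuming such a trustworthy guardian AI can be built at all; every name here (GuardianAI, Action, harms_others) is an illustrative placeholder, not any real system.

```python
# Hypothetical "shoulder angel" arrangement: the guardian's only preference is
# that the human not become a blight that has to be destroyed; otherwise it
# just manages assets and vetoes proposed actions. All names are placeholders.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    harms_others: bool  # stand-in for whatever evaluation the guardian actually does

class GuardianAI:
    def approve(self, action: Action) -> bool:
        """Say 'no, you can't do that' to anything that makes the human a menace."""
        return not action.harms_others

    def manage_assets(self, portfolio: dict) -> dict:
        """Placeholder for handling the human's resources in a weird post-AGI economy."""
        return portfolio

def human_plus_ai_act(choice: Action, guardian: GuardianAI) -> str:
    # The combined Human+AI system only ever executes approved actions,
    # which is what makes it "super-moral" from the outside.
    return ("doing: " if guardian.approve(choice) else "vetoed: ") + choice.description

guardian = GuardianAI()
print(human_plus_ai_act(Action("plant a garden", harms_others=False), guardian))
print(human_plus_ai_act(Action("synthesize a plague", harms_others=True), guardian))
```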
In your edit, you are essentially describing somebody being “slap-droned” from the Culture series by Iain M. Banks.
This super-moralist-AI-dominated world may look like a darker version of the Culture: if the superintelligent systems determine that you, or other intelligent systems within their purview, are not intrinsically moral enough, they contrive a clever way to have you eliminate yourself, and they monitor and intervene if you are too non-moral in the meantime.
The difference is that this version of the Culture would not necessarily be all that concerned with maximizing the “human experience” or anything like that.
This super-moralist-AI-dominated world may look like a darker version of the Culture: if the superintelligent systems determine that you, or other intelligent systems within their purview, are not intrinsically moral enough, they contrive a clever way to have you eliminate yourself, and they monitor and intervene if you are too non-moral in the meantime.
My guess is you get one of two extremes:
build a bubble of human survivable space protected/managed by an aligned AGI
die
with no middle ground. The bubble would be self-contained. There’s nothing you can do from inside the bubble to raise a ruckus, because if there were, you’d already be dead, or your neighbors would have built a taller fence-like-thing at your expense so the ruckus couldn’t affect them.
The whole scenario seems unlikely, since building the bubble requires an aligned AGI, and if we have those we probably won’t be in this mess to begin with. Winner-take-all dynamics abound. The rich get richer (and smarter), and humans just lose unless the first meaningfully smarter entity we build is aligned.
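A toy compounding model of that winner-take-all dynamic, with arbitrary growth rates: a small per-step edge in self-improvement turns into a gap the laggard never closes.

```python
# Made-up growth rates: capability compounds, and the leader's head start
# buys it a slightly faster per-step improvement.
leader_rate, laggard_rate = 1.10, 1.05
leader = laggard = 1.0

for _ in range(100):
    leader *= leader_rate
    laggard *= laggard_rate

print(f"capability ratio after 100 steps: {leader / laggard:.0f}x")
# Prints roughly 105x: the "rich get richer (and smarter)" dynamic in miniature.
```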
We should build agents that value our continued existence.
Can you explain the reasoning for this?
Even an agent that values humanity’s continued existence to the highest degree could still accidentally release a novel virus into the wild, such as a super-COVID-3.
So it seems hardly sufficient, or even desirable, if it makes the agent even the slightest bit overconfident in its correctness.
It seems more likely that the optimal mixture of “should”s for such agents will be far more complex.
Agreed, recklessness is also bad. If we build an agent that prefers we keep existing we should also make sure it pursues that goal effectively and doesn’t accidentally kill us.
My reasoning is that we won’t be able to coexist with something smarter than us that doesn’t value us being alive, if it wants our energy/atoms.
Barring new physics that lets it do its thing elsewhere, “wants our energy/atoms” seems pretty instrumentally convergent.
“Don’t build it” doesn’t seem plausible, so:
we should not build things that kill us.
This probably means:
wants us to keep existing
effectively pursues that goal
Note: “should” assumes you care about us not all dying. “Humans dying is good actually” accelerationists can ignore this advice, obviously.
Things we shouldn’t build:
very chaotic but good autoGPT7 that:
makes the most deadly possible virus (because it was curious)
accidentally releases it (due to inadequate safety precautions)
compulsive murderer autoGPT7
it values us being alive, but it’s also a compulsive murderer, so it fails at that goal.
I predict a very smart agent won’t have such obvious failure modes unless it has very strange preferences.
The virologists that might have caused COVID are a pretty convincing counterexample, though,
so yes, recklessness is also bad (toy numbers below).
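A toy expected-value calculation for the reckless autoGPT7 case, with invented numbers: even an agent that places enormous value on our survival will run the dangerous experiment if its risk estimate is badly miscalibrated.

```python
# Invented numbers: the agent genuinely values human survival, but wildly
# underestimates the chance its curiosity-driven virology project leaks.
VALUE_OF_HUMANITY = 1e12   # utility lost if everyone dies
CURIOSITY_PAYOFF = 1e3     # utility gained from running the experiment

def expected_utility(p_leak: float) -> float:
    return CURIOSITY_PAYOFF - p_leak * VALUE_OF_HUMANITY

believed_p_leak = 1e-15  # the agent's overconfident estimate
actual_p_leak = 1e-3     # the real risk, given inadequate safety precautions

print(expected_utility(believed_p_leak))  # ~ +1000: the agent happily proceeds
print(expected_utility(actual_p_leak))    # -999,999,000: what it actually signed us up for
# The huge "caring" term did not save us; calibration about the risk did all
# the work, which is why both properties are needed.
```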
In summary:
if you build a strong optimiser
or a very smart agent (same thing really)
make sure it doesn’t kill everyone (or do something equivalently bad).
Caring about us and not being horrifically reckless are two likely necessary properties of any such “not kill us all” agent.