I’m somewhat skeptical of the idea that there isn’t a universal morality that even a paperclip maximizer would converge to
You mean you’re somewhat convinced that there is a universal morality (that even a paperclip maximizer would converge to)? That sounds like a much less tenable position. I mean, a statement like this needs some support.
There’s goal system zero / God’s utility function / Universal Instrumental Values.
I’ve linkified the grandparent a bit—for those not familiar with the ideas.
The main idea is that many agents which are serious about attaining their long-term goals will first take control of large quantities of spacetime and resources—before they do very much else—to avoid low-utility fates like getting eaten by aliens.
Such goals represent something like an attractor in ethics-space. You could avoid the behaviour associated with the attractor by using discounting, or by adding constraints—at the expense of making the long-term goal less likely to be attained.
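To make that tradeoff concrete, here is a toy numerical sketch (mine, not anything from the linked posts; every parameter value is invented purely for illustration). The agent earns reward each step once its goal is up and running, survives each step with some probability, and discounts step t by gamma^t. Securing resources first lowers the hazard but delays the payoff, so whether the “grab control first” plan wins depends almost entirely on how heavily the future is discounted.

```python
# Toy model: "pursue the goal directly" vs. "secure resources/control first".
# The agent earns `reward` each step once its goal is running, survives each
# step with probability `survival`, and discounts step t by gamma**t.
# All parameter values are invented purely for illustration.

def plan_value(setup_steps, survival, reward, gamma, horizon):
    """Discounted expected reward of a plan that spends `setup_steps` steps
    on preparation (earning nothing) and then earns `reward` per step."""
    value = 0.0
    alive = 1.0
    for t in range(horizon):
        alive *= survival          # probability of still existing at step t
        if t >= setup_steps:
            value += (gamma ** t) * alive * reward
    return value

horizon = 1000
for gamma in (1.0, 0.99, 0.8):
    direct = plan_value(setup_steps=0, survival=0.95, reward=1.0,
                        gamma=gamma, horizon=horizon)
    acquire = plan_value(setup_steps=20, survival=0.999, reward=1.0,
                         gamma=gamma, horizon=horizon)
    winner = "acquire resources first" if acquire > direct else "pursue goal directly"
    print(f"gamma={gamma}: direct={direct:.1f}, acquire-first={acquire:.1f} -> {winner}")
```

With these made-up numbers the resource-acquisition plan dominates when gamma is near 1 and loses badly once discounting is steep, which is the sense in which discounting (or added constraints) buys you out of the attractor at the price of long-run expected utility.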
Thx for this. I found those links and the idea itself fascinating. Does anyone know if Roko or Hollerith developed the idea much further?
One is reminded of the famous quote from 1984, O’Brien to Winston: “Power is not a means. Power is the end.” But it certainly makes sense that, as an agent becomes better integrated into a coalition or community and his day-to-day goals become more weighted toward the terminal values of other people and less toward his own, he might be led to rewrite his own utility function toward Power: instrumental power to achieve any goal makes sense as a synthetic terminal value.
After all, most of our instinctual terminal values—sexual pleasure, food, good health, social status, the joy of victory and the agony of defeat—were originally instrumental values from the standpoint of their ‘author’: natural selection.
Does anyone know if Roko or Hollerith developed the idea much further?
Roko combined the concept with the (rather less sensible) idea of promoting those instrumental values into terminal values—and was met with a chorus of “Unfriendly AI”.
Hollerith produced several pages on the topic.
Probably the best-known continuation is via Omohundro.
“Universal Instrumental Values” is much the same idea as “Basic AI drives” dressed up a little differently:
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
http://selfawaresystems.com/2007/10/05/paper-on-the-nature-of-self-improving-artificial-intelligence/
You are right. I hadn’t made that connection. Now I have a little more respect for Omohundro’s work.
I was a little bit concerned about your initial Omohundro reaction.
Omohundro’s material is mostly fine and interesting. It’s a bit of a shame that there isn’t more maths—but it is a difficult area where it is tricky to prove things. Plus, IMO, he has the occasional zany idea that takes your brain to interesting places it didn’t dream of before.
I maintain some Omohundro links here.
As a side point, you could also re-read “Basic AI drives” as “Basic Replicator Drives”—it’s systemic evolution.
Interesting, hadn’t seen Hollerith’s posts before. I came to a similar conclusion about AIXI’s behavior as exemplifying a final attractor in intelligent systems with long planning horizons.
If the horizon is long enough (infinite), the single behavioral attractor is maximizing computational power and applying it towards extensive universal simulation/prediction.
This relates to simulism and the Simulation Argument, as any superintelligences/gods can thus be expected to create many simulated universes, regardless of their final goal evaluation criteria.
In fact, perhaps the final goal criteria apply to creating new universes with the desired properties.
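For readers who want the horizon dependence spelled out: as I recall Hutter’s definition (quoted from memory, so the notation may be slightly off), AIXI picks its next action by an expectimax over all programs consistent with the interaction history, with an explicit planning horizon m:

```latex
a_k = \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
      \bigl( r_k + \cdots + r_m \bigr)
      \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_{1:m} r_{1:m}} 2^{-\ell(q)}
```

Here U is a universal monotone Turing machine, ℓ(q) is the length of program q, and m is the horizon. The point above is, I take it, that as m grows, more and more of the value of any action comes from improving the inner sum, i.e. from prediction and simulation capacity, whatever the rewards happen to be attached to.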
These sound instrumental; you take control of the universe in order to achieve your terminal goals. That seems slightly different from what Newsome was talking about, which was more a converging of terminal goals on one superterminal goal.
Thus one of the proposed titles: “Universal Instrumental Values”.
Newsome didn’t distinguish between instrumental and terminal values.
Those were Newsome’s words.
Ah. I misunderstood the quoting.