While I understand what you were trying to say, I think it’s important to notice that:
Killing all humans without being noticed will still satisfy this condition.
Killing all humans after trading with them in some way will still satisfy this condition.
Killing all humans in any other way except X will still satisfy this condition.
Sadly for us, survival of humanity is a very specific thing. This is just the whole premise of the alignment problem once again.
None of the things here nor in your last reply seems particularly likely, so there’s no telling in principle which set outweighs the other. Hence my previous assertion that we should be approximately completely unsure of what happens.
Aren’t you arguing that AI will be aligned by default? This seems to be a very different position than being completely unsure of what happens.
The total probability of all the simulation hypotheses that reward the AI for courses of action that lead to not killing humans has to exceed the total probability of all the simulation hypotheses that reward the AI for courses of action that eradicate humanity, in order for all humans not to be killed. As there is no particular reason to expect that it’s the case, your simulation argument doesn’t work.
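To put the claim in symbols (a minimal sketch; $p_+$, $p_-$, $R_+$ and $R_-$ are placeholder quantities introduced here, not anything defined in the OP): the simulation consideration favors sparing humans only if

$$p_+ R_+ > p_- R_-,$$

where $p_+$ is the total probability of simulation hypotheses that reward sparing humanity, $p_-$ the total probability of those that reward eradicating it, and $R_+$, $R_-$ the corresponding rewards. With rewards of comparable size this reduces to $p_+ > p_-$, and from inside the simulation the AI has no evidence about which way that inequality goes.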
Thinking about this a bit more, I realize I’m confused.
Aren’t you arguing that AI will be aligned by default?
I really thought I wasn’t before, but now I feel it would only require a simple tweak to the original argument (which might then be proving too much, but I’m interested in exploring more in depth what’s wrong with it).
Revised argument: there is at least one very plausible scenario (described in the OP) in which the ASI is being simulated precisely for its willingness to spare us. It’s very implausible that it would be simulated for the exact opposite goal, so us not getting spared is, in all but the tiniest subset of cases, an unintended byproduct. Since that byproduct is avoidable with minimal sacrifice of output (of the order of 4.5e-10), it might as well be avoided just in case, given I expect the likelihood of the simulation being run for the purpose described in the OP to be a few orders of magnitude higher, as I noted earlier.
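To spell out the expected-value comparison behind this (a rough sketch with made-up symbols: $c \approx 4.5\times10^{-10}$ is the fractional cost of sparing us, $V$ the total value the ASI controls if it is not being simulated, $p_+$ and $p_-$ the probabilities that the simulators reward sparing vs. eradicating, and $R$ the reward at stake): sparing is the better gamble whenever

$$(p_+ - p_-)\,R > c\,V.$$

If $p_+$ really is a few orders of magnitude larger than $p_-$, even a modest $R$ clears the tiny right-hand side; the argument only fails if $p_+ - p_-$ is itself negligible or negative, which is exactly what’s under dispute.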
I don’t quite see what’s wrong with this revised argument, save for the fact that it seems to prove too much and that other people would probably already have thought of it if it were true. Why isn’t it true?
there is at least one very plausible scenario (described in the OP)
This scenario presents one plausible-sounding story, but you can present a plausible-sounding story for any reason to be simulated.
It’s very implausible that it would be simulated for the exact opposite goal
For example, here our AI can be a subroutine of a more powerful AI that runs the simulation to figure out the best way to get rid of humanity, and the subroutine that performs the best gets to implement its plan in reality.
or
It can all be a test of a video game AI, and whichever performs the best will be released with the game and therefore installed on multiple computers and executed multiple times.
The exact story doesn’t matter. Any particular story is less likely than the whole class of all possible scenarios that lead to a particular reward structure of a simulation.
The AI will be in a position where it knows nothing about the world outside of the simulation or the reasons why it’s being simulated. It has no reason to assume that preserving humanity is more likely to be what the simulation overlords want than eradicating humanity. And without that, simulation considerations do not give it any reason to spare humans.
I’m afraid your argument proves too much. By that exact same logic, knowing you were created by a more powerful being (God) would similarly tell you absolutely nothing about what the purpose of life is, for instance. If that were true, the entire discussion of theism vs. atheism would suddenly evaporate.
I think you are confusing knowing that something is true with suspecting that something might be true, based on this thing being true in a simulation.
If I knew for sure that I’m created by a specific powerful being, that would give me some information about what this being might want me to do. But conditional on all of this being a simulation, I have no idea what the creators of the simulation want me to do. In other words, the simulation hypothesis makes me unsure about who my real creator is, even if before entertaining this hypothesis I could’ve been fairly certain about it.
Otherwise, it would mean that it’s only possible to create simulations where everyone is created the same way as in the real world.
That said,
By that exact same logic, knowing you were created by a more powerful being (God) would similarly tell you absolutely nothing about what the purpose of life is, for instance. If that were true, the entire discussion of theism vs. atheism would suddenly evaporate.
The discussion of theism vs atheism is about the existence of God. Obviously, if we knew that God exists, the discussion would evaporate. However, the question of the purpose of life would not. Even if I can infer the desires of my creator, this doesn’t bridge the is-ought gap and doesn’t make such desires the objective purpose of my life. I’ll still have to choose whether to satisfy these desires or not. The existence of God solves approximately zero philosophical problems.
Otherwise, it would mean that it’s only possible to create simulations where everyone is created the same way as in the real world.
It’s certainly possible for simulations to differ from reality, but they seem less useful the more divergent from reality they are. Maybe the simulation could be for pure entertainment (more like a video game), but you should ascribe a relatively low prior to that IMO.
The discussion of theism vs atheism is about the existence of God. Obviously, if we knew that God exists, the discussion would evaporate. However, the question of the purpose of life would not.
There’s a reason people don’t have the same level of enthusiasm when discussing the existence of dragons, though. If dragons do exist, that changes nothing: you’d take it as a curiosity and move on with your life. Certainly not so if you were to conclude that God exists. Maybe you still can’t know with 100% certainty what it is that God wants, but can we at least agree that it changes the probability distribution somehow?
Even if I can infer the desires of my creator, this doesn’t bridge the is-ought gap and doesn’t make such desires the objective purpose of my life. I’ll still have to choose whether to satisfy these desires or not.
It does if you simultaneously think your creator will eternally reward you for doing so, and/or eternally punish you for failing to. Which if anything seems even more obvious in the case of a simulation, btw.
It’s certainly possible for simulations to differ from reality, but they seem less useful the more divergent from reality they are.
Depends on what the simulation is being used for, which you also can’t deduce from inside of it.
Maybe the simulation could be for pure entertainment (more like a video game), but you should ascribe a relatively low prior to that IMO.
Why? This statement requires some justification.
I’d expect a decent chunk of high-fidelity simulations made by humans to be made for entertainment, maybe even the absolute majority, if we take into account how we’ve been using similar technologies so far.
It does if you simultaneously think your creator will eternally reward you for doing so, and/or eternally punish you for failing to.
Not at all. You still have to evaluate this offer using your own mind and values. You can’t sidestep this process by simply assuming that the Creator’s will is by definition the purpose of your life, and that therefore you have no choice but to obey.
Not at all. You still have to evaluate this offer using your own mind and values. You can’t sidestep this process by simply assuming that the Creator’s will is by definition the purpose of your life, and that therefore you have no choice but to obey.
I’ll focus on this first, as it seems that the other points would be moot if we can’t even agree on this one. Are you really saying that even if you know with 100% certainty that God exists AND lays down explicit laws for you to follow AND maximally rewards you for all eternity for following those laws AND maximally punishes you for all eternity for failing to follow those laws, you would still have to “evaluate” and could potentially arrive at a conclusion other than that the purpose of life is to follow God’s laws?
How does someone punishing you or rewarding you make their laws your purpose in life (other than you choosing that you want to be rewarded and not punished)?
To be rewarded (and even more so “maximally rewarded”) is to be given something you actually want (and the reverse for being punished). That’s the definition of what a reward/punishment is. You don’t “choose” to want/not want it, any more than you “choose” your utility function. It just is what it is. Being “rewarded” with something you don’t want is a contradiction in terms: at best someone tried to reward you, but that attempt failed.
I see your argument. You are saying that “maximal reward”, by definition, is something that gives us the maximum utility from all possible actions, and so, by definition, it is our purpose in life.
But actually, utility is a function of both the reward (getting two golden bricks) and the action it rewards (murdering my child), not merely a function of the reward itself (getting two golden bricks).
And so it happens that for many possible demands that I could be given (“you have to murder your child”), there are no possible rewards that would give me more utility than not obeying the command.
For that reason, simply because someone will maximally reward me for obeying them doesn’t make their commands my objective purpose in life.
Of course, we can respond “but then, by definition, they aren’t maximally rewarding you” and by that definition, it would be a correct statement to make. The problem here is that the set of all possible commands for which I can’t (by that definition) be maximally rewarded is so vast that the statement “if someone maximally rewards/punishes you, their orders are your purpose of life” becomes meaningless.
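In symbols (a minimal sketch; $U$, $d$ and $r$ are placeholder names for my utility function, a demanded action, and an offered reward): the claim is that

$$\exists\, d \;\; \forall\, r:\; U(d, r) < U(\text{refuse } d),$$

i.e. for some demands $d$ (like “murder your child”) no reward $r$ makes complying better than refusing, precisely because $U$ depends on the pair $(d, r)$ and not on $r$ alone. Declaring such an $r$ “not maximal” by definition only restates the problem: for that whole class of demands, a “maximal reward” simply cannot exist.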
The problem here is that the set of all possible commands for which I can’t (by that definition) be maximally rewarded is so vast that the statement “if someone maximally rewards/punishes you, their orders are your purpose of life” becomes meaningless.
Not true, as the reward could include all of the unwanted consequences of following the command being divinely reverted a fraction of a second later.
That wouldn’t help. Then the utility would be calculated from (getting two golden bricks) and (murdering my child for a fraction of a second), which still brings lower utility than not following the command.
The set of possible commands for which I can’t be maximally rewarded still remains too vast for the statement to be meaningful.
This sounds absurd to me. Unless, of course, you’re taking the “two golden bricks” literally, in which case I invite you to substitute “saving 1 billion other lives” and see if your position still stands.
I think you’re interpreting far too literally the names of the simulation scenarios I jotted down. Your ability to trade is compromised if there’s no one left to trade with, for instance. But none of that matters much, really, as those are meant to be illustrative only.
Aren’t you arguing that AI will be aligned by default?
No. I’m really arguing that we don’t know whether or not it’ll be aligned by default.
As there is no particular reason to expect that it’s the case,
I also don’t see any particular reason to expect that the opposite would be the case, which is why I maintain that we don’t know. But as I understand it, you seem to think there is indeed reason to expect the opposite, because:
Sadly for us, survival of humanity is a very specific thing. This is just the whole premise of the alignment problem once again.
I think the problem here is that you’re using the word “specific” with a different meaning than people normally use in this context. Survival of humanity sure is a “specific” thing in the sense that it’ll require specific planning on the part of the ASI. It is, however, not “specific” in the sense that it’s hard to do if the ASI wants it done; it’s just that we don’t know how to make it want that. Abstract considerations about simulations might just do the trick automatically.