Unless I’m missing something, the trouble with this is that, absent a leverage penalty, all of the reasons you’ve listed for not having a muggable decision algorithm… drumroll… center on the real world, which, absent a leverage penalty, is vastly outweighed by tiny probabilities of googolplexes and Ackermann numbers of utilons. If you don’t already consider the Mugger’s claim to be vastly improbable, then all the considerations of “But if I logically decide to let myself be mugged, that retrologically increases his probability of lying” or “If I let myself be mugged, this real-world scenario will be repeated many times” are vastly outweighed by the tiny probability that the Mugger is telling the truth.
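To spell out the arithmetic behind that (with purely illustrative numbers of my own choosing, not anything from the post): suppose the Mugger’s claimed payoff is $N = 10^{10^{100}}$ utilons, a googolplex, and, absent a leverage penalty, my probability that he’s telling the truth is merely astronomically small, say $p = 10^{-100}$. Then

$$ p \cdot N = 10^{-100} \cdot 10^{10^{100}} = 10^{\,10^{100} - 100}, $$

which still dwarfs any plausible real-world value of the considerations listed above; only a penalty that scales with $N$ itself keeps the product from dominating.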
I thought for a while about how best to formalize what I’m thinking in a way that works here.
I do observe that I keep getting tempted by “hey, there’s an obvious leverage penalty here,” via the “this can only happen for real one time in X, where X is the number of lives saved times the fraction of people who agree” structure of the mugging. Or alternatively, the maximum total impact people can collectively expect to have in paying off things like this (before the Mugger shows up) seems reasonably capped at one life per person, so our collective real-world decisions clearly matter far more than that very-much-not-hard upper bound.
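To sketch that cap (my gloss on the usual leverage-penalty reasoning, not a new result): if an offer turns on the fates of $N$ other people, assign prior probability at most about $1/N$ to being the one agent whose decision is genuinely pivotal for them. Then the expected lives saved from accepting any one such offer is at most

$$ N \cdot \frac{1}{N} = 1, $$

which matches the roughly one-life-per-person bound above, and which our ordinary collective decisions clearly beat.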
I think that points to the answer I like most, which is that my reasons aren’t tied only to the real world. They’re also tied to the actions of other logically correlated agents, which includes the people on other worlds that I’m possibly being offered the chance to save, and also the Matrix Lords. I mean, we’d hate to have bad results presented at the Matrix Lord decision theory conference he’s doing research for; that seems rather important. Science. My decision reaches into the origin worlds of the Matrix Lords, and if they had spent all their days going around Mugging each other with lies about Matrix Lords during their industrial period, it’s doubtful the Matrix gets constructed at all. Then I wouldn’t even exist to pay the guy. It also reaches into all the other worlds in the Matrix, whose inhabitants have to make the same decisions I do, and we can’t all be given this opportunity to win Ackermann numbers of utilons.
I think I don’t need to do that, and I can get out of this simply by saying that, before I see the Mugger, I have to view potential genuine Muggers offering googolplexes of utility without evidence as not a large source of potential wins (i.e., at this point my probability really is that low). I mean, assuming the potential Mugger hasn’t seen any of these discussions (I do think there’s at least a one in ten chance someone tries this Mugging on me within the year, for the fun of it, and it’s probably a favorite), the likelihood ratio from someone trying this even with no evidence is really high. But it’s not that high; it’s not googolplex high. So now we get back to the question of infinitesimal priors and how to react to evidence, and here I think we’re close but have a subtle disagreement about how to handle the question…
(I encourage others reading this to go back and read/remember Eliezer’s response to the Mugger, where he explains that he is allowed to be logically inconsistent due to limited computing power, which I basically agree with.)
I take the view that the fact that someone is making the claim at all is good enough to pop me out of my 1:3↑↑↑↑3-level prior and into numbers that can be expressed reasonably with only one up arrow, and it allows me to be inconsistent simply because (I believe that) someone is saying it at all. Then the sky rift does it again, and that gets me into no-exponents-needed range. But I chose how I make decisions before the statement was made, and I did that without attaching much value to the “actual Matrix Lord” branch of utility while putting a lot of utility on the “people might be crazy or lie to me” branch. So I’ve shifted my probability a lot, but I still won’t pay until I see the rift. This feels potentially important: I can and must be inconsistent in my probabilities in response to evidence, due to limited compute, but I shouldn’t throw out the decision algorithm (or at least, not unless this points to a logical mistake in it, or otherwise justifies doing so in similar fashion).
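In odds form (with the stated magnitudes as stand-ins for “I can’t compute this, but I concede it’s enormous,” not as actual numbers): each piece of evidence multiplies my odds by its likelihood ratio,

$$ \text{posterior odds} = \text{prior odds} \times \frac{P(\text{evidence} \mid \text{Matrix Lord})}{P(\text{evidence} \mid \text{no Matrix Lord})}, $$

so the claim being made at all takes me from something like $1 : 3\uparrow\uparrow\uparrow\uparrow 3$ to odds writable with a single up arrow, and the rift in the sky takes me from there into no-exponents-needed range; the admitted inconsistency is that these jumps are far larger than any likelihood ratio I could actually justify computing.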
So one answer is that I’m putting my logical inconsistency after I choose my decision algorithm, so the fact that I’m then acting in a non-optimal way in reaction to the new utilon math is justified. The second answer is that if you’re allowed to offer probabilistic off-world utilon considerations, then so am I, and mine (through correlated decision functions) still win in both the Matrix and non-Matrix cases. The third answer, as noted in III, is that Bayes’ rule and using logic often imply something that’s effectively a lot like a leverage penalty, due to the nature of the propositions, although I think it’s quite reasonable to think that in general most people could in fact find ways to have quite outsize (and possibly not very bounded) impact if they put their minds to it. (Hey, I’m trying, guys.)
(Note, to answer other comments I’ve seen: Yes, I can also get out of this via either risk aversion or bounded utility, or similar concepts, if I’m willing to use them. Granted. I’m exploring the case where my utility isn’t bounded, or is bounded at something stupidly large, partly because I think it shouldn’t be very bounded, and partly because it’s important for AGI cases.)
Why? If humans don’t have unbounded utility functions, then presumably we wouldn’t want our AIs to have unbounded utility functions either.
If I build an AGI and get to choose its utility function, I could choose to copy my own (or what mine would be under reflection, a personal CEV), and that’s far from the worst outcome, but as a group we have solutions that we prefer a lot more and that don’t prioritize me overly much, such as CEV. The CEV of an effectively unbounded group of potential agents (yes, bounded by the laws of physics, but I assume we both agree that’s not a bound small enough to matter here) is effectively unbounded even if each individual agent’s utility function is tightly bounded.
This should (I think) make intuitive sense: a group of people who individually want basic human-level stuff discover that the best way to get it is to coordinate to build a great nation/legacy/civilization, and thereby break the bound on what they care about. The jumps we’re considering don’t seem different in kind from that.
People who don’t exist until after an AGI is created don’t have much influence over how that AGI is designed, and I don’t see any need to make concessions to them (except for the fact that we care about their preferences being satisfied, of course, but that will be reflected in our utility functions).
If you already care about a legacy as a human, and you do something to make an advanced computer system aligned with you, then the advanced computer system should also care about that legacy. I don’t see anything as being lost.
I find attempts to reify the legacy as anything more than a shared agreement between near peers deeply disturbing, mainly because our understandings of nature, humans, and reality are tentative and leaky abstractions, and will remain that way for the foreseeable future. Any reification of civilisation will need a method of revising it in some way.
So much will be lost if humans are no longer capable of being active participants in the revision of the greater system. Conversations like this would be pointless. I have more to say on this subject; I’ll try to write something in a bit.