I think this means “indifference” isn’t really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn’t really need to be.
So, how about
U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press
or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.
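To make the structure of this proposal concrete, here is a minimal sketch in Python. All names, payoffs, and action sets are made up for illustration; UN and US stand in for the normal and shutdown utilities, and the only point being demonstrated is that the added term is constant in a2, so the agent still optimizes UN when the button is not pressed and US when it is:

```
# All names, payoffs, and action sets here are hypothetical.
A1 = ["a", "b"]           # first-stage actions
OBS = ["press", "quiet"]  # observations; "press" is the shutdown signal
A2 = ["x", "y"]           # second-stage actions
PRESS = {"press"}

def UN(a1, o, a2):        # placeholder "normal" utility
    return {"x": 1.0, "y": 0.0}[a2]

def US(a1, o, a2):        # placeholder "shutdown" utility
    return {"x": 0.0, "y": 2.0}[a2]

def U(a1, o, a2):
    if o not in PRESS:
        # optimize UN; the added max-US term is a constant in a2
        return UN(a1, o, a2) + max(US(a1, o, b) for b in A2)
    # optimize US; the added max-UN term is a constant in a2
    return US(a1, o, a2) + max(UN(a1, o, b) for b in A2)

for o in OBS:
    best = max(A2, key=lambda b: U("a", o, b))
    print(o, "->", best)  # press -> y (US-optimal), quiet -> x (UN-optimal)
```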
The deeper point is important, and I think you’re mistaken about the necessary and sufficient conditions for an isomorphism here.
If a human appears in a gnome’s cell, then that excludes the counterfactual world in which the human did not appear in the gnome’s cell. However, on UDT, the gnome’s decision does depend on the payoffs in that counterfactual world.
Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for payoffs given linked actions when you have to make a decision; you must also have the same probabilities for payoffs given linked actions when you don't have to make a decision.
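A toy sketch of that failure mode, with entirely made-up payoffs (a fair coin, a human in gnome 2's cell only on tails, and a hypothetical ticket price): the two "principals" below agree on every payoff in the worlds where the gnome actually faces a decision, yet because they value the empty-cell counterfactual world differently, the UDT-optimal policy differs.

```
PRICE = 0.6  # hypothetical ticket price; the ticket pays 1 on tails

def world_payoff(world, buy, empty_cell_payoff):
    """Payoff credited to gnome 2's policy in a single world."""
    if world == "tails":                  # a human appears in gnome 2's cell
        return (1.0 - PRICE) if buy else 0.0
    # heads: no human in gnome 2's cell, so no decision is made there,
    # but UDT still counts this world when scoring the policy
    return empty_cell_payoff

def policy_value(buy, empty_cell_payoff):
    # equal-weight the two worlds (fair coin), UDT-style
    return 0.5 * (world_payoff("heads", buy, empty_cell_payoff)
                  + world_payoff("tails", buy, empty_cell_payoff))

# Human's utility contributes 0 in the world where they never exist;
# a gnome that assigned -1 there (if it had committed to buying) would
# agree with the human in every factual world but choose differently.
principals = {
    "human": lambda buy: 0.0,
    "gnome": lambda buy: -1.0 if buy else 0.0,
}
for name, empty_cell in principals.items():
    best = max((False, True), key=lambda b: policy_value(b, empty_cell(b)))
    print(name, "buys:", best)  # human buys: True, gnome buys: False
```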