I think the moral-uncertainty version of the problem is fatal unless you make further assumptions about how to resolve it, such as by fixing some arbitrary intertheoretic-comparison weights (which seems to be what you’re suggesting) or using the parliamentary model.
Currently I don’t care much about strongly positive events, so at this point I’d say no. In the throes of such a positive event I might change my mind. :)
Yes, because I don’t see any significant selfish upside to life, only possible downside in cases of torture/etc. Life is often fun, but I don’t strongly care about experiencing it.
Yeah, but it would be very bad relative to my altruistic goals if I died any time soon. The thought experiment in the OP ignores altruistic considerations.
However, if you believe that the agent in world 2 is not an instantiation of you, then naturalized induction concludes that world 2 isn’t actual and so pressing the button is safe.
By “isn’t actual” do you just mean that the agent isn’t in world 2? World 2 might still exist, though?
I assume the thought experiment ignores instrumental considerations like altruistic impact.
For re-living my actual life, I wouldn't care that much either way, because most of my experiences haven't been extremely good or extremely bad. However, if there were randomness, such that I had some probability of, e.g., being tortured by a serial killer, then I would certainly choose not to repeat life.
Is it still a facepalm given the rest of the sentence? “So, s-risks are roughly as severe as factory farming, but with an even larger scope.” The word “severe” is being used in a technical sense (discussed a few paragraphs earlier) to mean something like “per individual badness” without considering scope.
Thanks for the feedback! The first sentence below the title slide says: “I’ll talk about risks of severe suffering in the far future, or s-risks.” Was this an insufficient definition for you? Would you recommend a different definition?
I guess you mean that the AGI would only care about worlds where the explosives wouldn't detonate even if the AGI did nothing to stop the person from pressing the detonation button. If the AGI cared about all worlds where the bomb didn't detonate, for whatever reason, it would try hard to stop the button from being pushed.
But to make the AGI care only about worlds where the bomb doesn't go off even if it does nothing to avert the explosion, we have to define what it means for the AGI to "try to avert the explosion" vs. just doing ordinary actions. That gets pretty tricky pretty quickly.
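To illustrate the distinction, here's a minimal sketch with invented names; it assumes we could somehow specify the base utility and the counterfactual predicate, which is exactly the hard part:

```python
# Minimal sketch of the two ways the AGI's utility could be conditioned
# (all names here are hypothetical; nothing in this thread specifies an API).

def outcome_conditioned_utility(world, base_utility):
    # Naive reading: only count worlds where the bomb in fact didn't detonate.
    # An agent maximizing this tries hard to stop the button from being pushed.
    return base_utility(world) if not world["bomb_detonated"] else 0.0

def counterfactual_conditioned_utility(world, base_utility, would_detonate_if_idle):
    # Intended reading: only count worlds where the bomb wouldn't have detonated
    # even if the agent had done nothing to avert the explosion. The hard part
    # is defining the predicate `would_detonate_if_idle`, i.e., saying what
    # counts as the agent "doing nothing to avert the explosion".
    return base_utility(world) if not would_detonate_if_idle(world) else 0.0
```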
Anyway, you’ve convinced me that these scenarios are at least interesting. I just want to point out that they may not be as straightforward as they seem once it comes time to implement them.
Fair enough. I just meant that this setup requires building an AGI with a particular utility function that behaves as expected and building extra machinery around it, which could be more complicated than just building an AGI with the utility function you wanted. On the other hand, maybe it’s easier to build an AGI that only cares about worlds where one particular bitstring shows up than to build a friendly AGI in general.
I’m nervous about designing elaborate mechanisms to trick an AGI, since if we can’t even correctly implement an ordinary friendly AGI without bugs and mistakes, it seems even less likely we’d implement the weird/clever AGI setups without bugs and mistakes. I would tend to focus on just getting the AGI to behave properly from the start, without need for clever tricks, though I suppose that limited exploration into more fanciful scenarios might yield insight.
As I understand it, your satisficing agent has essentially the utility function min(E[paperclips], 9). This means it would be fine with a 10^-100 chance of producing 10^101 paperclips. But isn’t it more intuitive to think of a satisficer as optimizing the utility function E[min(paperclips, 9)]? In this case, the satisficer would reject the 10^-100 gamble described above, in favor of just producing 9 paperclips (whereas a maximizer would still take the gamble and hence would be a poor replacement for the satisficer).
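To make the contrast concrete, here's a toy calculation (my own representation, not anything from your post) comparing the two readings on that gamble:

```python
# Toy comparison of the two readings of "satisficer" on the 10^-100 gamble;
# the probability and payoff numbers come from the example above.
p, jackpot, target = 1e-100, 1e101, 9

# Reading 1: utility = min(E[paperclips], target) -- cap the *expected* count.
def u_cap_expectation(expected_clips):
    return min(expected_clips, target)

# Reading 2: utility = E[min(paperclips, target)] -- cap each outcome, then average.
def u_expect_capped(lottery):  # lottery: list of (probability, clips) pairs
    return sum(prob * min(clips, target) for prob, clips in lottery)

gamble       = [(p, jackpot), (1 - p, 0)]   # 10^-100 chance of 10^101 clips
certain_nine = [(1.0, 9)]

print(u_cap_expectation(p * jackpot))   # -> 9: the gamble looks as good as 9 sure clips
print(u_expect_capped(gamble))          # -> ~9e-100: the gamble is rejected
print(u_expect_capped(certain_nine))    # -> 9: just producing 9 clips wins
```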
A satisficer might not want to take over the world, since doing that would arouse opposition and possibly lead to its defeat. Instead, the satisficer might prefer to make very modest demands that are more likely to be satisfied (whether by humans or by an ascending uncontrolled AI that wants to mollify possible opponents).
If there were a perfect correlation between choosing to one-box and having the one-box gene (i.e., everyone who one-boxes has the one-box gene, and everyone who two-boxes has the two-box gene, in all possible circumstances), then it’s obvious that you should one-box, since that implies you must win more. This would be similar to the original Newcomb problem, where Omega also perfectly predicts your choice. Unfortunately, if you really will follow the dictates of your genes under all possible circumstances, then telling someone what she should do is useless, since she will do what her genes dictate.
The more interesting and difficult case is when the correlation between gene and choice isn’t perfect.
I assume that the one-boxing gene makes a person generically more likely to favor the one-boxing solution to Newcomb. But what about when people learn about the setup of this particular problem? Does the correlation between having the one-boxing gene and inclining toward one-boxing still hold? Are people who one-box only because of EDT (even though they would have two-boxed before considering decision theory) still more likely to have the one-boxing gene? If so, then I’d be more inclined to force myself to one-box. If not, then I’d say that the apparent correlation between choosing one-boxing and winning breaks down when the one-boxing is forced. (Note: I haven’t thought a lot about this and am still fairly confused on this topic.)
I’m reminded of the problem of reference-class forecasting and trying to determine which reference class (all one-boxers? or only grudging one-boxers who decided to one-box because of EDT?) to apply for making probability judgments. In the limit where the reference class consists of molecule-for-molecule copies of yourself, you should obviously do what made the most of them win.
Paul’s site has been offline since 2013. Hopefully it will come back, but in the meantime, here are links to most of his pieces on the Internet Archive.
Good point. Also, in most multiverse theories, the worst possible experience necessarily exists somewhere.
From a practical perspective, accepting the papercut is the obvious choice because it’s good to be nice to other value systems.
Even if I’m only considering my own values, I give some intrinsic weight to what other people care about. (“NU” is just an approximation of my intrinsic values.) So I’d still accept the papercut.
I also don’t really care about mild suffering—mostly just torture-level suffering. If it were 7 billion really happy people plus 1 person tortured, that would be a much harder dilemma.
In practice, the ratio of expected heaven to expected hell in the future is much smaller than 7 billion to 1, so even if someone is just a “negative-leaning utilitarian” who cares orders of magnitude more about suffering than happiness, s/he’ll tend to act like a pure NU on any actual policy question.
Short answer:
Donate to MIRI, or split between MIRI and GiveWell charities if you want some fuzzies for short-term helping.
Long answer:
I’m a negative utilitarian (NU) and have been thinking since 2007 about the sign of MIRI for NUs. (Here’s some relevant discussion.) I give ~70% chance that MIRI’s impact is net good by NU lights and ~30% that it’s net bad, but given MIRI’s high impact, the expected value of MIRI is still very positive.
As for your question: I’d put the probability of uncontrolled AI creating hells higher than 1 in 10,000 and the probability that MIRI as a whole prevents that from happening higher than 1 in 10,000,000. Say such hells used 10^-15 of the AI’s total computing resources. Assuming the AI has enough computing power to run ~10^30 humans for ~10^10 years, MIRI would prevent in expectation ~10^18 hell-years. Assuming MIRI’s total budget ever is $1 billion (too high), that’s ~10^9 hell-years prevented per dollar. Now apply rigorous discounts to account for priors against astronomical impacts and various other far-future-dampening effects. MIRI still seems very promising at the end of the calculation.
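Spelling out that arithmetic (a rough sketch; I’m reading the 1-in-10,000,000 figure as the overall probability that MIRI averts such hells, and the discounts at the end aren’t applied yet):

```python
# Back-of-the-envelope arithmetic for the estimate above.
p_miri_averts_hells = 1e-7     # overall probability that MIRI prevents AI-created hells
hell_fraction       = 1e-15    # fraction of the AI's computing resources running hells
human_capacity      = 1e30     # human-level minds the AI could run at once
duration_years      = 1e10     # years the computation lasts
miri_total_budget   = 1e9      # dollars; a deliberately high estimate

expected_hell_years = (p_miri_averts_hells * hell_fraction
                       * human_capacity * duration_years)   # ~1e18 hell-years prevented
print(expected_hell_years / miri_total_budget)              # ~1e9 hell-years per dollar
```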
The naive form of the argument is the same for the classic and moral-uncertainty two-envelopes problems. But yes: the classic version has a resolution based on taking expected values of absolute rather than relative measurements, whereas there’s no similar resolution for the moral-uncertainty version, because there are no unique absolute measurements to take expectations over.
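For reference, here’s a sketch of the absolute-measurement resolution of the classic version (standard material, nothing specific to this thread):

```latex
% Let the two envelopes contain x and 2x dollars. Measured absolutely,
\[
  E[\text{keep}] = \tfrac{1}{2}x + \tfrac{1}{2}(2x) = \tfrac{3}{2}x = E[\text{switch}],
\]
% so there is no gain from switching. The paradoxical calculation
\[
  E[\text{switch}] = \tfrac{1}{2}(2y) + \tfrac{1}{2}\big(\tfrac{y}{2}\big) = \tfrac{5}{4}y
\]
% errs by treating y, "the amount in my envelope," as a fixed quantity across both
% branches -- a relative measurement. Under moral uncertainty there is no shared
% absolute unit to play the role of x, so no analogous fix is available.
```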