Thanks for your response.
The AI can quickly assess the “forcefulness” of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so it will canvass a sample of people to see if their reactions are positive or negative.
So, I think this touches on the difficult part. As humans, we have a good idea of what “giving choices to people” vs. “forcing them to do something” looks like. This concept would need to resolve some edge cases, such as putting psychological manipulation in the “forceful” category (even though it can be done with only text). A sufficiently advanced AI’s concept space might contain a similar concept. But how do we pinpoint this concept in the AI’s concept space? Very likely, the concept space will be very complicated and difficult for humans to understand. It might well contain concepts that agree with the “giving choices to people” vs. “forcing them to do something” distinction on many examples but differ from it in important ways. We need to pinpoint the right concept in order to make it part of the AI’s decision-making procedure.
It will also be able to model people (as it must be able to do, because all intelligent systems must be able to model the world pretty accurately or they don’t qualify as ‘intelligent’), so it will probably have a pretty shrewd idea already of whether people will react positively or negatively toward some intended action plan.
This seems pretty similar to Paul’s idea of a black-box human in the counterfactual loop. I think this is probably a good idea, but there are two problems here: (1) setting up this (possibly counterfactual) interaction so that it approves a large class of good plans and rejects almost all bad plans (see the next section), and (2) having a good way to predict the outcome of this interaction, usually without actually performing it. While we could say that (2) will be solved by virtue of the superintelligence being a superintelligence, in practice we’ll probably get AGI before we get uploads, so we’ll need some sort of semi-reliable way to predict humans without actually simulating them. Additionally, the AI might need to self-improve to be anywhere near smart enough to consider this complex hypothetical, so we’ll also need some kind of low-impact self-improvement system. Again, I think this is probably a good idea, but there are quite a lot of issues with it, and we might need to do something different in practice. Paul has written about problems with black-box approaches based on predicting counterfactual humans here and here. I think it’s a good idea to develop both black-box solutions and white-box solutions, so that we are not over-reliant on the assumptions involved in one or the other.
In all of that procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic, there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice.
What language will people’s questions about the plans be in? If it’s a natural language, then the AI must be able to translate its concept space into the human concept space, and we have to solve an FAI-complete problem to do this. If it’s a more technical language, then humans themselves must be able to look at the AI’s concept space and understand it. Whether this is possible depends very much on how transparent the AI’s concept space is. Something like deep learning is likely to produce concepts that are very difficult for humans to understand, while probabilistic programming might produce more transparent models. How easy it is to make transparent AGI (compared to opaque AGI) is an open question.
We should also definitely be wary of a decision rule of the form “find a plan that, if explained to humans, would cause humans to say they understand it”. Since people are easy to manipulate, raw optimization for this objective will produce psychologically manipulative plans that people will incorrectly approve of. There needs to be some way to separate “optimize for the plan being good” from “optimize for people thinking the plan is good when it is explained to them”, or else some way of ensuring that humans’ judgments about these plans are accurate.
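To make the gap between these two optimization targets explicit, here is a toy sketch in Python; the function names and types are assumptions for illustration, not part of any proposal, and the whole difficulty is of course hidden inside `predicted_approval` and `true_value`.

```python
from typing import Callable, List

def choose_plan_by_approval(plans: List[str],
                            predicted_approval: Callable[[str], float]) -> str:
    """The decision rule criticized above: pick whichever plan humans are
    predicted to endorse once it is explained to them, with no independent
    check that the plan is actually good."""
    return max(plans, key=predicted_approval)

def choose_plan_by_value(plans: List[str],
                         true_value: Callable[[str], float]) -> str:
    """The objective we actually want, which we do not know how to specify."""
    return max(plans, key=true_value)

# If predicted_approval rewards persuasive or manipulative framing as much as
# genuine merit, the two rules can pick very different plans from the same pool.
```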
Again, it’s quite plausible that the AI’s concept space will contain some kind of concept that distinguishes between these different types of optimization; however, humans will need to understand the AI’s concept space in order to pinpoint this concept so it can be integrated into the AI’s decision rule.
I should mention that I don’t think these black-box approaches to AI control are necessarily doomed to failure; rather, I’m pointing out that there are lots of unresolved gaps in our knowledge of how they can be made to work, and it’s plausible that they will prove too difficult in practice.
We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with the human judgments in, say, 90% of cases. I’m not sure if this is what you mean by “normatively correct”, but it seems like a plausible concept that multiple concept learning algorithms might converge on. I’m still not convinced that we can do this for many of the value-laden concepts we care about and end up with something matching CEV, partially due to complexity of value. Still, it’s probably worth systematically studying the extent to which this gives the right answers for non-value-laden concepts, and then seeing what can be done about value-laden concepts.
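As a concrete (and heavily simplified) sketch of this procedure: Kolmogorov complexity is uncomputable, so the code below substitutes the description length of each candidate hypothesis as a crude proxy, and the candidate pool, the 90% threshold, and all names are illustrative assumptions rather than part of the proposal.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, bool]                      # (situation description, human label)
Hypothesis = Tuple[str, Callable[[str], bool]]  # (source/description, classifier)

def agreement(classify: Callable[[str], bool], data: List[Example]) -> float:
    """Fraction of the human-labeled examples that the classifier reproduces."""
    return sum(classify(x) == y for x, y in data) / len(data)

def simplest_adequate_concept(candidates: List[Hypothesis],
                              data: List[Example],
                              threshold: float = 0.9):
    """Return the shortest-description candidate that agrees with at least
    `threshold` of the human judgments, or None if nothing clears the bar."""
    for source, classify in sorted(candidates, key=lambda c: len(c[0])):
        if agreement(classify, data) >= threshold:
            return source, classify
    return None

# Toy usage with made-up data and candidates:
data = [("offers the user a menu of options", False),
        ("locks the user out until they comply", True)]
candidates = [("len(x) > 30", lambda x: len(x) > 30),
              ("'comply' in x or 'locks' in x",
               lambda x: "comply" in x or "locks" in x)]
print(simplest_adequate_concept(candidates, data))
```

The hard part is hidden in where the candidate pool comes from and whether the shortest survivor generalizes off the labeled examples the way the human-intended concept does, which is exactly where I expect value-laden concepts to cause trouble.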