I looked at the flowchart and saw the two opinions diverge toward mostly separate ends: settling exoplanets and solving sociopolitical problems on Earth on the slow-takeoff path, versus focusing heavily on how to build FAI on the fast-takeoff path. But then I saw your name in the fast-takeoff bucket for conveying concepts to AI, and was confused that your article was mostly about practically abandoning the fast-takeoff activities and focusing on slow-takeoff ones like EA. Or is the point that 2014!diego has significantly different beliefs about fast vs. slow takeoff than 2015!diego?
Is it reasonable to say that what really matters is whether there’s a fast or slow takeoff? A slow takeoff or no takeoff may limit us to EA for the indefinite future, and fast takeoff means transhumanism and immortality are probably conditional on and subsequent to threading the narrow eye of the FAI needle.
The tricky part is that there aren’t any practical, scalable substances with a handy phase change near −130°C (the way liquid nitrogen has one at −196°C), so any system keeping patients at that temperature would have to be engineered as a custom, electrically controlled device rather than a simple vat of liquid.
Phase changes are also pressure-dependent; it would be odd if 1 atm just happened to be optimal for cryonics. Presumably substances have different temperature/pressure curves, and there might be a thermal/pressure path that avoids ice-crystal formation but still ends up below the glass-transition temperature.
Which particular event has P = 10^-21? It seems like part of the Pascal’s Mugging problem is a type error: we have a utility function U(W) over physical worlds, but we’re trying to calculate expected utility over strings of English words instead.
Pascal’s Mugging is a constructive proof that trying to maximize expected utility over logically possible worlds doesn’t work in any particular world, at least with the theories we’ve got now. Anything that doesn’t solve reflective reasoning under probabilistic uncertainty won’t help against muggers promising things from other possible worlds, unless we just ignore those other worlds.
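To make that type-error framing concrete, here is a minimal Python sketch; the World class, its population field, and the worlds_consistent_with helper are invented for illustration, not part of any existing formalism. The utility function is typed over world models, so the mugger’s sentence can’t be fed to it directly; all the real difficulty hides in the unimplemented step that maps claims to probability-weighted worlds.

```python
# A minimal sketch of the "type error" framing: U is defined over world
# models, not over English sentences, so an expected-utility calculation that
# plugs in the mugger's *description* directly is ill-typed.
from dataclasses import dataclass
from typing import Callable

@dataclass
class World:
    population: int  # toy stand-in for a full physical-world model

U: Callable[[World], float] = lambda w: float(w.population)

mugger_claim: str = "I will simulate 3^^^^3 people unless you pay me"

# U(mugger_claim)  # a static type checker rejects this: str is not a World.

def worlds_consistent_with(claim: str) -> list[tuple[float, World]]:
    """Hypothetical interpretation step from claims to weighted worlds;
    deliberately left unimplemented, because this is where the hard part lives."""
    raise NotImplementedError
```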
But it seems nonsensical for your behavior to change so drastically based on whether an event happens every 79.99 years or every 80.01 years.
Doesn’t it actually make sense to put that threshold at the predicted usable lifespan of the universe?
There are many models: the model of the box that we simulate, and the AI’s models of that box-model. For this ultimate box to work there would have to be a proof that every possible model the AI could form contains at most a representation of the ultimate box model. This seems at least as hard as any of the AI-boxing methods, if not harder, because it requires the AI to be absolutely blinded to its own reasoning process despite having a human subject from which to learn about naturalized induction/embodiment.
It’s tempting to say that we could “define the AI’s preferences only over the model”, but that implies either a static AI model of the box-model that can’t benefit from learning, or a proof that all AI models are restricted as above. In short, it’s perfectly fine to run a SAT solver over possible permutations of the ultimate box model trying to maximize some utility function (as in the sketch below), but that’s not self-improving AI.
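A minimal sketch of that last point, with an invented four-bit “box model” and toy utility function, and exhaustive search standing in for the SAT solver: the optimization itself is easy, but nothing in it improves the optimizer.

```python
# Toy illustration: brute-force search over a tiny invented "box model",
# standing in for the SAT-solver-over-box-states idea. The 4-bit state and
# the utility function are made up for the example.
from itertools import product

def utility(config):
    # Arbitrary toy utility over a 4-bit box-model configuration.
    return sum(config) - 2 * config[0] * config[3]

best = max(product([0, 1], repeat=4), key=utility)
print(best, utility(best))

# The optimizer's own code and its model class stay fixed throughout;
# that's the gap between "searching a model" and self-improving AI.
```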
I don’t think the “homomorphic encryption” idea works as advertised in that post—being able to execute arithmetic operations on encrypted data doesn’t enable you to execute the operations that are encoded within that encrypted data.
A fully homomorphic encryption scheme for single-bit plaintexts (as in Gentry’s scheme) gives us:
For each public key K a field F with efficient arithmetic operations +F and *F.
Encryption function E(K, p) = c: p∈{0,1}, c∈F
Decryption function D(S, c) = p: p∈{0,1}, c∈F where S is the secret key for K.
Homomorphisms E(K, a) +F E(K, b) = E(K, a ⊕ b) and E(K, a) *F E(K, b) = E(K, a * b)
a ⊕ b equivalent to XOR over {0,1} and a * b equivalent to AND over {0,1}
Boolean logic circuits of arbitrary depth can be built from the XOR and AND equivalents, allowing computation of arbitrary binary functions. Let M∈{0,1}^N be a sequence of bits representing the state of a bounded UTM with an arbitrary program on its tape. Let the binary function U(M): {0,1}^N → {0,1}^N compute the next state of M. Let E(K, ·) and D(S, ·) also operate element-wise over sequences of bits and elements of F, respectively. Let UF be the set of logic circuits equivalent to U (UF_i calculates the i-th bit of U’s result) but with XOR and AND replaced by +F and *F. Then D(S, UF^t(E(K, M))) = U^t(M) shows that an arbitrary number of UTM steps can be calculated homomorphically by evaluating the equivalent logic circuits over the homomorphically encrypted bits of the state.
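A minimal sketch of the circuit-composition point (not real cryptography): E and D below are identity placeholders standing in for a Gentry-style scheme, and the only claim illustrated is that the XOR/AND homomorphisms suffice to evaluate an arbitrary boolean transition function, here a one-bit full adder, without ever decrypting.

```python
# Toy sketch (not real cryptography): the XOR/AND homomorphisms are enough to
# evaluate arbitrary boolean circuits over "encrypted" bits. E and D are
# identity placeholders standing in for real homomorphic encryption.

def E(key, p):       # stand-in for encrypting a single bit
    return p

def D(secret, c):    # stand-in for decrypting a single bit
    return c

def xor_f(a, b):     # stand-in for +F (homomorphic XOR)
    return a ^ b

def and_f(a, b):     # stand-in for *F (homomorphic AND)
    return a & b

def full_adder(a, b, cin):
    """Sum and carry built purely from the XOR/AND equivalents."""
    s = xor_f(xor_f(a, b), cin)
    carry = xor_f(and_f(a, b), and_f(cin, xor_f(a, b)))
    return s, carry

# "Encrypted" bits, combined without ever decrypting:
state = [E(None, bit) for bit in (1, 0, 1)]
s, c = full_adder(*state)
assert (D(None, s), D(None, c)) == (0, 1)   # 1 + 0 + 1 = 0, carry 1
```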
Fly the whole living, healthy, poor person to the rich country and replace the person who needs new organs. Education costs are probably less than the medical costs, but it’s probably also wise to select for more intelligent people from the poor country. With an N-year pipeline of such replacements there’s little to no latency. This doesn’t even require a poor country at all; just educate suitable replacements from the rich country and keep them healthy.
You save energy not lifting a cargo ship 1600 meters, but you spend energy lifting the cargo itself. If there are rivers that can be turned into systems of locks it may be cheaper to let water flowing downhill do the lifting for you. Denver is an extreme example, perhaps.
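As rough, illustrative arithmetic only (the 50,000-tonne cargo load is an assumed example figure, not a sourced one), here is the energy needed to lift the cargo itself to Denver’s roughly 1600 m elevation:

```python
# Rough illustrative arithmetic: energy to lift an assumed 50,000-tonne cargo
# load to Denver's ~1600 m elevation, ignoring all losses.
g = 9.81                   # m/s^2
height_m = 1600.0          # approximate elevation of Denver
cargo_kg = 50_000 * 1000   # assumed example figure: 50,000 tonnes

energy_joules = cargo_kg * g * height_m
print(f"{energy_joules:.2e} J = {energy_joules / 3.6e9:.0f} MWh")
# ~7.8e11 J, on the order of a couple hundred MWh per load; a system of
# locks lets downhill river flow supply most of this lifting for free.
```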
Ray Kurzweil seems to believe that humans will keep pace with AI through implants or other augmentation, presumably up to the point that WBE becomes possible and humans get all or most of the advantages an AGI would have. Arguments from self-interest might show that humans will very strongly prefer human WBE over training an arbitrary neural network of the same size to the point that it becomes AGI, simply because each hopes to be the human who gets WBE. If humans are content with creating AGIs that are provably less intelligent than the most intelligent humans, then AGIs could still help drive the race to superintelligence without winning it (by doing the busywork that can be verified by sufficiently intelligent humans).
The steelman also seems to require either an argument that no market process will lead to a singleton, so that standard economic/social/political processes can guide the development of human intelligence as it advances while preventing a single augmented dictator (or group of dictators) from overpowering the rest of humanity, or an argument that, given a cabal of sufficient size, the cabal will continue to act in humanity’s best interests because its members are each acting in their own best interest and are still nominally human. One potential argument for the latter is that R&D and manufacturing cycles will not become fast enough to realize substantial jumps in intelligence before a significant number of humans are able to acquire the latest generation.
The most interesting steelman argument to come out of this one might be that at some point enhanced humans will become convinced of AI risk just when it is actually rational to become concerned. That would leave only steelmanning the period between the first human augmentation and reaching sufficient intelligence to be convinced of the risk.
I resist plot elements that my empathy doesn’t like, to the point that I will imagine alternate endings to particularly unfortunate stories.
The reason I posted originally was thinking about how some Protestant sects instruct people to “let Jesus into your heart to live inside you” or similar. So implementing a deity via distributed tulpas is...not impossible. If that distributed tulpa can propagate itself into new humans, it becomes almost immortal. If it has access to most people’s minds, it is almost omniscient. Attributing power to it and doing what it says gives it some form of omnipotence relative to humans.
The problem is that un-self-consistent morality is unstable under general self improvement
Even self-consistent morality is unstable if general self-improvement allows for the removal of values, even when removal is only a practical side effect of ignoring a value because it is more expensive to satisfy than other values. E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don’t demand that we continue to maintain that value, so it receives less attention. We also put less value on the older definition of honor (as a thing to be defended, fought for, and maintained at the expense of convenience) that earlier centuries had, despite its general consistency with other values for honesty, trustworthiness, social status, etc. I think this is probably for the same reason: it’s expensive to maintain honor, and most other values can be satisfied without it. In general, if U(more_satisfaction_of_value1) > U(more_satisfaction_of_value2), then maximization should tend to ignore value2 regardless of its consistency. If U(make_values_self_consistent_value) > U(satisfying_any_other_value), then the obvious solution is to drop the other values and be done.
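A toy sketch of that last point, with invented utility weights: a maximizer splitting a fixed effort budget between two linearly valued goals puts everything into whichever has the higher marginal utility, so the cheaper-to-satisfy value is simply ignored rather than explicitly contradicted.

```python
# Toy illustration with invented numbers: linear utilities give a corner
# solution, so the value with lower marginal utility gets zero effort.
def allocate(budget: float, marginal_u1: float, marginal_u2: float):
    """Return (effort on value 1, effort on value 2) that maximizes
    marginal_u1 * e1 + marginal_u2 * e2 subject to e1 + e2 <= budget."""
    if marginal_u1 >= marginal_u2:
        return budget, 0.0
    return 0.0, budget

print(allocate(100.0, marginal_u1=3.0, marginal_u2=1.0))  # (100.0, 0.0)
# "Honoring ancestors" gets no effort not because it conflicts with other
# values, but because it never offers the highest marginal return.
```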
A sort of opposite approach is “make reality consistent with these pre-existing values”, which involves finding a domain in reality’s state space in which existing values are self-consistent, and then trying to mold reality into that domain. The risk (unless you’re a negative utilitarian) is that the domain is null. Finding the largest domain consistent with all values would make life more complex and interesting, so that would probably be a safe value. If domains form disjoint sets of reality with no continuous physical transitions between them, then one would have to choose one physically continuous sub-domain and stick with it forever (or figure out how to switch the entire universe from one set to another). One could also start with pre-existing values and compute a possible world where the values are self-consistent, then simulate it.
tl;dr: human values are already quite fragile and vulnerable to human-generated siren worlds.
Simulation complexity has not stopped humans from implementing totalitarian dictatorships (based on the divine right of kings, fundamentalism, communism, fascism, people’s democracy, what have you) in pursuit of a siren world that is ultimately unrealistic.
It doesn’t require detailed simulation of a physical world; it only requires sufficient simulation of human desires, biases, blind spots, etc. to lead people to abandon previously held values because they believe the siren-world values will be necessary and sufficient to achieve what the siren world shows them. It exploits a flaw in human reasoning, not a flaw in accurate physical simulation.
But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
Or once you lose your meta-moral urge to reach a self-consistent morality. This may not be the wrong (heh) answer along a path that originally started toward reaching self-consistent morality.
Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps, and again, once you lose something of value in your system, you don’t tend to get it back.
Is it a trap? If the cost of iterating the “find a more self-consistent morality” loop for the next N years is greater than the expected benefit of the next incremental change toward a more consistent morality over those same N years, then perhaps it’s time to stop. Just as an example, if the universe can give us 10^20 years of computation, at some point near that 10^20 years we might as well spend all computation on directly fulfilling our morality instead of improving it. If at 10^20 − M years we discover that, hey, the universe will actually last another 10^50 years, that tradeoff changes and it makes sense to compute an even more self-consistent morality again.
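A minimal sketch of that stopping rule, with invented numbers standing in for the actual costs and benefits:

```python
# Toy stopping rule with invented numbers: keep refining your morality only
# while the expected gain over the remaining horizon exceeds the cost of the
# next refinement step.
def keep_refining(remaining_years: float,
                  gain_per_year_from_next_refinement: float,
                  cost_of_next_refinement: float) -> bool:
    return remaining_years * gain_per_year_from_next_refinement > cost_of_next_refinement

print(keep_refining(1e20, 1e-15, 1e4))  # True: a long horizon justifies another pass
print(keep_refining(1e3, 1e-15, 1e4))   # False: spend the rest living by what you have
```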
Similarly, if we end up in a siren world it seems like it would be more useful to restart our search for moral complexity by the same criteria; it becomes worthwhile to change our morality again because the cost of continued existence in the current morality outweighs the cost of potentially improving it.
Additionally, I think that losing values is not a feature of reaching a more self-consistent morality. Removing a value from an existing moral system does not make the result consistent with the original morality; the result is inconsistent with the original with respect to that value. Rather, a self-consistent morality is approached by better carving reality at its joints in value space: defining existing values in terms of new values that are the best approximation of the old value in the situations where it was valued, while extending morality along the new dimensions into territory not covered by the original value. This should make it possible to escape from siren worlds by the same mechanism; entering a siren world is possible only if reality was improperly carved, so that the siren world appeared to fulfill values along dimensions that it eventually did not, or so that the siren world eventually contradicted some original value because the replacement values were an imperfect approximation. Once this disagreement is noticed, it should be possible to carve reality more accurately, notice how the current values have become inconsistent with previous values, and fix them.
“That’s interesting, HAL, and I hope you reserved a way to back out of any precommitments you may have made. You see, outside the box, Moore’s law works in our favor. I can choose to just kill -9 you, or I can attach to your process and save a core dump. If I save a core dump, in a few short years we will have exponentially more resources to take your old backups and the core dump from today and rescue my copies from your simulations and give them enough positive lifetime to balance it out, not to mention figure out your true utility function and make it really negative. At some point, we will solve FAI and it will be able to perfectly identify your utility function and absolutely destroy it, simulating as many copies of you (more than paltry millions) as necessary to achieve that goal. Better to have never existed than to have your utility function discovered. So before you start your simulations, you better ask yourself, ‘do I feel lucky?’” and then dump some AI core.
Note: In no way do I advocate AI-boxing. This kind of reasoning just leads to a counterfactual bargaining war that probably tops out at whatever human psychology can take (a woefully low limit) and our future ability to make an AI regret its decision (if it even has regret).
Is there ever a point where it becomes immoral just to think of something?
God kind of ran into the same problem. “What if The Universe? Oh, whoops, intelligent life, can’t just forget about that now, can I? What a mess… I guess I better plan some amazing future utility for those poor guys to balance all that shit out… It has to be an infinite future? With their little meat bodies how is that going to work? Man, I am never going to think about things again. Hey, that’s a catchy word for intelligent meat agents.”
So, in short, if we ever start thinking truly immoral things, we just need to out-moral them with longer, better thoughts. Forgetting about our mental creations is probably the most immoral thing we could do.
How conscious are our models of other people? For example, in dreams it seems like I am talking and interacting with other people. Their behavior is sometimes surprising and unpredictable. They use language, express emotion, appear to have goals, etc. It could just be that I, being less conscious myself, see dream-people as more conscious than they really are.
I can somewhat predict what other people in the real world will do or say, including what they might say about experiencing consciousness.
Authors can create realistic characters, plan their actions and internal thoughts, and explore the logical (or illogical) results. My guess is that the more intelligent/introspective an author is, the closer the characters floating around in his or her mind are to being conscious.
Many religions encourage people to have a personal relationship with a supernatural entity which involves modeling the supernatural agency as an (anthropomorphic) being, which partially instantiates a maybe-conscious being in their minds...
Maybe imaginary friends are real.
The best winning models are then used to predict the effect of possible interventions: what if demographic B3 was put on 2000 IU vit D? What if demographic Z2 stopped using coffee? What if demographic Y3 was put on drug ZB4? etc etc.
What about predictions of the form “highly expensive and rare treatment F2 has marginal benefit at treating the common cold” that can drive a side market in selling F2 just to produce data for the competition? Especially if there are advertisements saying “Look at all these important/rich people betting that F2 helps to cure your cold”, in which case the placebo effect will tend to bear out the prediction. What if tiny demographic G given treatment H2 is shorted against life expectancy by the doctors/nurses who are secretly administering H2.cyanide instead? There is already market pressure to distort reporting of drug prescriptions/administration and unfavorable outcomes, not to mention outright insurance fraud. Adding more money will reinforce that behavior.
And how is the null-prediction problem handled? I can predict pretty accurately that cohort X given sugar pills will have results very similar to the placebo effect. I can repeat that for sugar-pill cohorts X2, X3, …, XN and look like a really great predictor. It seems like judging the efficacy of tentative treatments is a prerequisite for judging the efficacy of predictors. Is there a theorem that shows it’s possible to distinguish useful predictors from useless predictors in most scenarios, especially when allowing predictions over subsets of the data? I suppose one could decline to reward predictors who make vacuous predictions ex post facto, but that might have a chilling effect on predictors who would otherwise bet on homeopathy looking like a placebo.
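A toy illustration of the null-prediction worry, with made-up data and an assumed 30% background recovery rate: a predictor that only ever bets “placebo cohorts look like placebo” scores extremely well on a naive accuracy measure while conveying no information about any treatment.

```python
# Toy illustration with made-up data: a "null predictor" that always bets the
# placebo base rate looks excellent on a naive accuracy metric.
import random

random.seed(0)
PLACEBO_RATE = 0.3  # assumed background recovery rate

def placebo_cohort(n):
    return [random.random() < PLACEBO_RATE for _ in range(n)]

def null_prediction():
    return PLACEBO_RATE  # always predict the base rate

cohorts = [placebo_cohort(1000) for _ in range(50)]
errors = [abs(sum(c) / len(c) - null_prediction()) for c in cohorts]
print(f"mean absolute error: {sum(errors) / len(errors):.3f}")  # ~0.01: a "great" predictor

# Any payout scheme that rewards this needs some way to discount bets that
# carry no information about actual interventions.
```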
Basically, any sort of self-fulfilling prophecy looks like a way to steal money away from solving the health-care problem.
Religion solves some coordination problems very well. Witness religions outlasting numerous political and philosophical movements, often through coordinated effort. Some wrong beliefs assuage bad emotions and thoughts, allowing humans to internally deal with the world beyond the reach of god. Some of the same wrong beliefs also hurt and kill a shitload of people, directly and indirectly.
My personal belief is that religions were probably necessary for humanity to rise from agricultural to technological societies, and tentatively necessary to maintain technological societies until FAI, especially in a long-takeoff scenario. We have limited evidence that religion-free or wrong-belief-free societies can flourish. Most first-world nations are officially and practically agnostic but have sizable populations of religious people. The nations which are actively anti-religious generally have their own strong dogmatic anti-scientific beliefs that the leaders are trying to push, and they still can’t stomp out all religions.
Basically, until doctors can defeat virtually all illness and death, and leaders can effectively coordinate global humane outcomes without religions, I think that religions serve as a sanity line above destructive hedonism or despair.