I’m not asking researchers to predict what they will discover. There are different mindsets in research. One mindset is looking for heuristics that maximize short-term progress on problems of direct practical relevance. Another mindset is looking for a rigorously defined overarching theory. MIRI works in the latter mindset, while most other AI researchers are much closer to the former.
I disagree with the part “her actions lead to different outcomes depending on what day it is.” The way I see it, the “outcome” is the state of the entire multiverse. It doesn’t depend on “what day it is”, since “it” is undefined. Sleeping Beauty’s action simultaneously affects the multiverse through several “points of interaction” which are located on different days.
Hi Charlie! Actually, I completely agree with Vladimir on this: subjective probabilities are meaningless; the meaningful questions are decision-theoretic. When Sleeping Beauty is asked “what day is it?” the question is meaningless, because she is simultaneously in several different days (identical copies of her exist on different days).
[LINK] Vladimir Slepnev talks about logical counterfactuals
A “coincidence” is an a priori improbable event in your model that has to happen in order to create a situation containing a “copy” of the observer (which roughly means any agent with a similar utility function and similar decision algorithm).
Imagine two universe clusters in the multiverse: one cluster consists of universes running on fragile physics, the other consists of universes running on normal physics. The fragile cluster will contain far fewer agent-copies than the normal cluster (weighted by probability). Imagine you have to make a decision which produces different utilities depending on whether you are in the fragile cluster or the normal cluster. According to UDT, you have to think as if you are deciding for all copies. In other words, if you make decisions under the assumption that you are in the fragile cluster, all copies make decisions under this assumption; if you make decisions under the assumption that you are in the normal cluster, all copies make decisions under this assumption. Since the normal cluster is much more “copy-dense”, it pays off much more to make decisions as if you are in the normal cluster (since utility is aggregated over the entire multiverse).
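To make the aggregation concrete, here is a toy sketch (my own made-up numbers, nothing canonical): whichever assumption you decide under, the total is dominated by the term coming from the copy-dense cluster.

```python
# Toy illustration (all numbers made up): utility is aggregated over all copies,
# so the copy-dense "normal" cluster dominates the sum.

copies = {"fragile": 1e-6, "normal": 1.0}   # prior-weighted number of agent-copies per cluster

# per-copy payoff of acting on each assumption, evaluated in each cluster
payoff = {
    "assume_fragile": {"fragile": 10.0, "normal": 0.0},
    "assume_normal":  {"fragile": 0.0,  "normal": 10.0},
}

def total_utility(policy):
    # all copies make the same decision, so sum the payoff over both clusters
    return sum(copies[c] * payoff[policy][c] for c in copies)

print(max(payoff, key=total_utility))  # "assume_normal"
```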
The weighting comes from the Solomonoff prior. For example, see the paper by Legg.
I did a considerable amount of software engineer recruiting during my career. I only called the references at an advanced stage, after an interview. It seems to me that calling references before an interview would take too much of their time (since if everyone did this they would be called very often) and too much of my time (since I think their input would rarely disqualify a candidate at this point). The interview played the most important role in my final decision, but when a reference mentioned something negative which resonated with something that concerned me after the interview, this was often a reason to reject.
I’m digging into this a little bit, but I’m not following your reasoning. UDT, from what I see, doesn’t mandate the procedure you outline (perhaps you can show an article where it does). I also don’t see why which decision theory is best should play a strong role here.
Unfortunately, a lot of the knowledge on UDT is scattered in discussions and it’s difficult to locate good references. The UDT point of view is that subjective probabilities are meaningless (the third horn of the anthropic trilemma), thus the only questions it makes sense to ask are decision-theoretic questions. Therefore decision theory does play a strong role in any question involving anthropics. See also this.
But anyway, I think the heart of your objection is “Fragile universes will be strongly discounted in the expected utility because of the amount of coincidences required to create them”. So I’ll freely admit to not understanding how this discounting process works...
The weight of a hypothesis in the Solomonoff prior equals N · 2^{-(K + C)}, where K is its Kolmogorov complexity, C is the number of coin flips needed to produce the given observation, and N is the number of different coin-flip outcomes compatible with the given observation. Your fragile universes have high C and low N.
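To illustrate with made-up numbers (only the orders of magnitude matter):

```python
import math

# Toy comparison of Solomonoff weights N * 2^(-(K + C)), done in log space
# to avoid underflow. All numbers are invented; only their relative sizes matter.

def log2_weight(K, C, N):
    return math.log2(N) - (K + C)

normal  = log2_weight(K=1000, C=10,  N=2**9)   # few coincidences, many compatible flip outcomes
fragile = log2_weight(K=1000, C=200, N=1)      # many coincidences, a single compatible outcome

print(fragile - normal)  # -199.0: the fragile hypothesis is suppressed by a factor of ~2^199
```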
...but I will note that current theoretical structures (Standard Model / inflationary cosmology / string theory) have a large number of constants that are considered coincidences, and also produce a large number of universes like ours in terms of physical law but different in terms of outcome.
Right. But these are weak points of the theory, not strong points. That is, if we find an equally simple theory which doesn’t require these coincidences, it will receive substantially higher weight. Anyway, your fragile universes have a lot more coincidences than any conventional physical theory.
I would also note that fragile universe “coincidences” don’t seem to me to be more coincidental in character than the fact we happen to live on a planet suitable for life.
In principle hypotheses with more planets suitable for life also get higher weight, but the effect levels off when reaching O(1) civilizations per current cosmological horizon because it is offset by the high utility of having the entire future light cone to yourself. This is essentially the anthropic argument for a late filter in the Fermi paradox, and the reason this argument doesn’t work in UDT.
Lastly I would also note that at this point we don’t have a good H1 or H2.
None of the physical theories we have so far are fragile; therefore they are vastly superior to any fragile physics you might invent.
Hi Peter! I suggest you read up on UDT (updateless decision theory). Unfortunately, there is no good comprehensive exposition, but see the links in the wiki and on IAFF. UDT reasoning leads to discarding “fragile” hypotheses, for the following reason.
According to UDT, if you have two hypotheses H1 and H2 consistent with your observations, you should reason as if there are two universes Y1 and Y2 such that H1 is true in Y1, H2 is true in Y2, and the decisions you make control the copies of you in both universes. Your goal is to maximize the a priori expectation value of your utility function U, where the prior includes the entire level IV multiverse weighted according to complexity (the Solomonoff prior). Fragile universes will be strongly discounted in the expected utility because of the amount of coincidences required to create them. Therefore, if H1 is “fragile” and H2 isn’t, H2 is by far the more important hypothesis unless the complexity difference between them is astronomical.
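As a toy version of that calculation (the numbers are made up; only the relative prior weights matter):

```python
# Toy UDT-style expected utility over two hypotheses (all numbers invented).
# The prior weights stand in for Solomonoff weights: the fragile hypothesis
# pays a heavy coincidence penalty.

prior = {"H1_fragile": 2.0 ** -250, "H2_normal": 2.0 ** -100}

# utility each policy obtains in the universe where the corresponding hypothesis is true
utility = {
    "act_on_H1": {"H1_fragile": 100.0, "H2_normal": -1.0},
    "act_on_H2": {"H1_fragile": -1.0,  "H2_normal": 100.0},
}

def expected_utility(policy):
    return sum(prior[h] * utility[policy][h] for h in prior)

print(max(utility, key=expected_utility))  # "act_on_H2": the fragile term is negligible
```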
Scalable in what sense? Do you foresee some problem with one kitchen using the hiring model and other kitchens using the volunteer model?
Meetup : Tel Aviv: Board Game Night
I don’t follow. Do you argue that in some cases volunteering in the kitchen is better than donating? Why? What’s wrong with the model where the kitchen uses your money to hire workers?
I didn’t develop the idea, and I’m still not sure whether it’s correct. I’m planning to get back to these questions once I’m ready to use the theory of optimal predictors to put everything on a rigorous footing. So I’m not sure we really need to block the external inputs. However, note that the AI is in a sense more fragile than a human, since the AI is capable of self-modifying in irreversibly damaging ways.
I assume you meant “more ethical” rather than “more efficient”? In other words, the correct metric shouldn’t just sum over QALYs, but should assign f(T) utils to a person with a life of length T at reference quality, for f a convex function. Probably true, and I do wonder how it would affect charity ratings. But my guess is that the top charities of e.g. GiveWell will still be close to the top in this metric.
Your preferences are, by definition, the things you want to happen. So, you want your future self to be happy iff your future self’s happiness is your preference. Your ideas about moral equivalence are your preferences. Et cetera. If you prefer X to happen and your preferences are changed so that you no longer prefer X to happen, the chance that X will happen becomes lower. So this change of preferences goes against your preference for X. There might be upsides to the change of preferences which compensate for the loss of X. Or not. Decide on a case-by-case basis, but ceteris paribus you don’t want your preferences to change.
I don’t follow. Are you arguing that saving a person’s life is irresponsible if you don’t keep saving them?
If we find a mathematical formula describing the “subjectively correct” prior P and give it to the AI, the AI will still effectively use a different prior initially, namely the convolution of P with some kind of “logical uncertainty kernel”. IMO this means we still need a learning phase.
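To spell out what I mean by the convolution (my notation, and only a placeholder for whatever formalization of logical uncertainty turns out to be right): the prior the AI effectively acts on is something like P_eff(x) = Sum_y K(x | y) P(y), where K is the “logical uncertainty kernel”; it only converges to P itself as K sharpens towards the identity, which is what the learning phase is for.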
“I understand that it will reduce the chance of any preference A being fulfilled, but my answer is that if the preference changes from A to B, then at that time I’ll be happier with B.” You’ll be happier with B, so what? Your statement only makes sense if happiness is part of A. Indeed, changing your preferences is a way to achieve happiness (essentially it’s wireheading), but it comes at the expense of the other preferences in A besides happiness.
“...future-me has a better claim to caring about what the future world is like than present-me does.” What is this “claim”? Why would you care about it?
I think it is more interesting to study how to be simultaneously supermotivated about your objectives and realistic about the obstacles. This probably requires some dark-arts techniques (e.g. compartmentalization). Personally, I find that occasional mental invocations of quasi-religious imagery are useful.
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function. UDT agents maximize Sum_x Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same thing.
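For example (a toy sketch with arbitrary numbers): multiply Prior(x) by any positive c(x) and divide U(x) by the same c(x), and every expected utility, hence the agent’s behavior, stays the same.

```python
# Toy illustration of the prior/utility ambiguity (arbitrary numbers):
# rescaling Prior(x) by c(x) and U(x) by 1/c(x) leaves Sum_x Prior(x)*U(x) unchanged.

worlds = ["x1", "x2", "x3"]
prior  = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
U      = {"x1": 1.0, "x2": 4.0, "x3": -2.0}
c      = {"x1": 2.0, "x2": 0.5, "x3": 4.0}   # arbitrary positive rescaling factors

prior2 = {x: prior[x] * c[x] for x in worlds}   # not renormalized; an overall rescaling only
U2     = {x: U[x] / c[x] for x in worlds}       # multiplies expected utility by a constant

eu  = sum(prior[x] * U[x]  for x in worlds)
eu2 = sum(prior2[x] * U2[x] for x in worlds)
print(eu, eu2)  # the same number twice
```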
In Savage’s theorem, acts are arbitrary functions from the set of states to the set of consequences. Therefore, to apply Savage’s theorem in this context you have to consider blatantly inconsistent counterfactuals in which Sleeping Beauty makes different choices in computationally equivalent situations. If you have an extension of the utility function to these counterfactuals, and it happens to satisfy the conditions of Savage’s theorem, then you can assign probabilities. This extension is not unique. Moreover, in some anthropic scenarios it doesn’t exist (as you noted yourself).
Cox’s theorem only says that any reasonable measure of uncertainty can be transformed into a probability assignment. Here there is no such measure of uncertainty. Different counterfactual games lead to different probability assignments.