Scalable in what sense? Do you foresee some problem with one kitchen using the hiring model and other kitchens using the volunteer model?
I don’t follow. Do you argue that in some cases volunteering in the kitchen is better than donating? Why? What’s wrong with the model where the kitchen uses your money to hire workers?
I didn’t develop the idea, and I’m still not sure whether it’s correct. I’m planning to get back to these questions once I’m ready to use the theory of optimal predictors to put everything on a rigorous footing. So I’m not sure we really need to block the external inputs. However, note that the AI is in a sense more fragile than a human, since the AI is capable of self-modifying in irreversibly damaging ways.
I assume you meant “more ethical” rather than “more efficient”? In other words, the correct metric shouldn’t just sum over QALYs, but should assign f(T) utils to a person with life of length T of reference quality, for f a convex function. Probably true, and I do wonder how it would affect charity ratings. But my guess is that the top charities of e.g. GiveWell will still be close to the top in this metric.
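To illustrate with made-up numbers (purely hypothetical, just to show the shape of the metric): take f(T) = T^2. Extending a reference-quality life from 70 to 80 years is then worth 80^2 − 70^2 = 1500 utils, while extending one from 40 to 50 years is worth 50^2 − 40^2 = 900 utils, even though both interventions add the same 10 QALYs. A convex f therefore shifts weight toward interventions that produce long uninterrupted lives rather than spreading the same number of years thinly.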
Your preferences are by definition the things you want to happen. So, you want your future self to be happy iff your future self’s happiness is your preference. Your ideas about moral equivalence are your preferences. Et cetera. If you prefer X to happen and your preferences are changed so that you no longer prefer X to happen, the chance X will happen becomes lower. So this change of preferences goes against your preference for X. There might be upsides to the change of preferences which compensate for the loss of X. Or not. Decide on a case-by-case basis, but ceteris paribus you don’t want your preferences to change.
I don’t follow. Are you arguing that saving a person’s life is irresponsible if you don’t keep saving them?
If we find a mathematical formula describing the “subjectively correct” prior P and give it to the AI, the AI will still effectively use a different prior initially, namely the convolution of P with some kind of “logical uncertainty kernel”. IMO this means we still need a learning phase.
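Schematically (this is my own gloss, not a rigorous definition): instead of using P directly, the young AI behaves as if its prior were P_eff(x) = Sum_Q K(Q) Q(x), a mixture over candidate priors Q weighted by a “logical uncertainty kernel” K expressing how unsure it still is about what the formula for P actually implies. Only as this logical uncertainty is resolved does P_eff approach P, which is why a learning phase seems necessary even given the formula.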
“I understand that it will reduce the chance of any preference A being fulfilled, but my answer is that if the preference changes from A to B, then at that time I’ll be happier with B.” You’ll be happier with B, so what? Your statement only makes sense if happiness is part of A. Indeed, changing your preferences is a way to achieve happiness (essentially it’s wireheading), but it comes at the expense of other preferences in A besides happiness.
“...future-me has a better claim to caring about what the future world is like than present-me does.” What is this “claim”? Why would you care about it?
I think it is more interesting to study how to be simultaneously supermotivated about your objectives and realistic about the obstacles. Probably requires some dark arts techniques (e.g. compartmentalization). Personally I find that occasional mental invocations of quasireligious imagery are useful.
I’m not sure about “no correct prior”, and even if there is no “correct prior”, maybe there is still “the right prior for me”, or “my actual prior”, which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function. UDT agents maximize Sum_x Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same thing.
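To make the ambiguity explicit (a standard observation, my notation): for any positive function g, replacing Prior(x) by Prior'(x) = g(x) Prior(x) and U(x) by U'(x) = U(x) / g(x) leaves every term in Sum_x Prior(x) U(x) unchanged, so the agent’s choices are exactly the same (renormalizing Prior' only rescales the expected utility of every policy by the same constant, which also doesn’t affect choices). Any procedure for “extracting my actual prior” can therefore only pin down the pair (Prior, U) up to such a rescaling, not the prior by itself.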
Puerto Rico?! But Puerto Rico is already a US territory!
Cool! Who is this Kris Langman person?
As I discussed before, IMO the correct approach is not looking for the one “correct” prior, since there is no such thing, but specifying a “pure learning” phase in AI development. In the case of your example, we can imagine the operator overriding the agent’s controls and forcing it to produce various outputs in order to update away from Hell. Given a sufficiently long learning phase, all universal priors should converge to the same result (of course, if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of “good” universal priors).
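For reference, the convergence claim rests on a standard property (stated informally, my notation): any two universal priors xi_1 and xi_2 dominate each other, i.e. there are constants c_1, c_2 > 0 with xi_1(x) >= c_1 xi_2(x) and xi_2(x) >= c_2 xi_1(x) for all x. Domination bounds how much their posterior predictions can disagree in total over the whole observation sequence, so after enough data they make nearly the same predictions; the size of the constants is exactly what can be “ridiculous” and what determines how long the learning phase needs to be.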
I described essentially the same problem about a year ago, only in the framework of the updateless intelligence metric, which is more sophisticated than AIXI. I also proposed a solution, albeit without an optimality proof. Hopefully such a proof will become possible once I make the updateless intelligence metric rigorous using the formalism of optimal predictors.
The details may change, but I think that something in the spirit of that proposal has to be used. The AI’s subhuman intelligence growth phase has to be spent in a mode with frequentist-style optimality guarantees, while in the superhuman phase it will switch to Bayesian optimization.
I fail to understand what is repugnant about the repugnant conclusion. Are there any arguments here except discrediting the conclusion using the label “repugnant”?
It is indeed conceivable to construct “safe” oracle AIs that answer mathematical questions. See also the writeup by Jim Babcock and my comment. The problem is that the same technology can be relatively easily repurposed into an agent AI. Therefore, someone building an oracle AI is really bad news unless FAI is created shortly afterwards.
I think that oracle AIs might be useful to control the initial testing process for an (agent) FAI but otherwise are far from solving the problem.
This is not a very meaningful claim since in modern physics momentum is not “mv” or any such simple formula. Momentum is the Noether charge associated with spatial translation symmetry which for field theory typically means the integral over space of some expression involving the fields and their derivatives. In general relativity things are even more complicated. Strictly speaking momentum conservation only holds for spacetime asymptotics which have spatial translation symmetry. There is no good analogue of momentum conservation for e.g. compact space.
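For concreteness, the textbook version for a single scalar field (my notation): with Lagrangian density L(φ, ∂_μ φ) invariant under spatial translations, the Noether current is the stress-energy tensor T^{μν} = (∂L/∂(∂_μ φ)) ∂^ν φ − η^{μν} L, and the conserved momentum is the charge P^i = ∫ d^3x T^{0i}. Its conservation comes directly from the translation symmetry, which is why it fails when the spacetime (or its asymptotics) lacks that symmetry.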
Nonetheless, the EmDrive still shouldn’t work (and probably doesn’t work).
“The concern that ML has no solid theoretical foundations reflects the old computer science worldview, which is all based on finding bit-exact solutions to problems within vague asymptotic resource constraints.”
It is an error to confuse the “exact / approximate” axis with the “theoretical / empirical” axis. There is plenty of theoretical work in complexity theory on approximation algorithms.
“A good ML researcher absolutely needs a good idea of what is going on under the hood—at least at a sufficient level of abstraction.”
There is a difference between “having an idea” and “solid theoretical foundations”. Chemists before quantum mechanics had lots of ideas. But they didn’t have a solid theoretical foundation.
“Why not test safety long before the system is superintelligent? Say, when it is a population of 100 child-like AGIs. As the population grows larger and more intelligent, the safest designs are propagated and made safer.”
Because this process is not guaranteed to yield good results. Evolution did the exact same thing to create humans, optimizing for genetic fitness. And humans still went and invented condoms.
“So it may actually be easier to drop the traditional computer science approach completely.”
When the entire future of mankind is at stake, you don’t drop approaches because it may be easier. You try every goddamn approach you have (unless “trying” is dangerous in itself of course).
Hi Yaacov, welcome!
I guess that you can reduce X-risk by financing the relevant organizations, contributing to research, doing outreach or some combination of the three. You should probably decide which of these paths you expect to follow and plan accordingly.
Hi Peter! I suggest you read up on UDT (updateless decision theory). Unfortunately, there is no good comprehensive exposition, but see the links in the wiki and IAFF. UDT reasoning leads to discarding “fragile” hypotheses, for the following reason.
According to UDT, if you have two hypotheses H1, H2 consistent with your observations, you should reason as if there are two universes Y1 and Y2 s.t. Hi is true in Yi and the decisions you make control the copies of you in both universes. Your goal is to maximize the a priori expectation value of your utility function U, where the prior includes the entire level IV multiverse weighted according to complexity (Solomonoff prior). Fragile universes will be strongly discounted in the expected utility because of the number of coincidences required to create them. Therefore, if H1 is “fragile” and H2 isn’t, H2 is by far the more important hypothesis unless the complexity difference between them is astronomical.
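In symbols (schematic, my notation, glossing over how outcomes are aggregated across the multiverse): you choose your policy pi to maximize EU(pi) = Sum_i 2^(−K(H_i)) · U(outcome of pi in Y_i), where K(H_i) is roughly the description length of H_i under the Solomonoff prior. Every extra coincidence a “fragile” hypothesis needs adds bits to its description, and each bit halves its weight 2^(−K), so such hypotheses contribute almost nothing to the sum unless the rival hypothesis is astronomically more complex.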