Wei_Dai2

Karma: 170

Wei_Dai2 Dec 14, 2008, 6:58 AM
3 points
on: What I Think, If Not Why
(Eliezer, why do you keep using “intelligence” to mean “optimization” even after agreeing with me that intelligence includes other things that we don’t yet understand?)

Morality does not compress

You can’t mean that morality literally does not compress (i.e. is truly random). Obviously there are plenty of compressible regularities in human morality. So perhaps what you mean is that it’s too hard or impossible to compress it into a small enough description that humans can understand. But, we also have no evidence that effective universal optimization in the presence of real-world computational constraints (as opposed to idealized optimization with unlimited computing power) can be compressed into a small enough description that humans can understand.

Wei_Dai2 Dec 13, 2008, 3:44 AM
2 points
on: What I Think, If Not Why
Eliezer, you write as if there is no alternative to this plan, as if your hand is forced. But that’s exactly what some people believe about neural networks. What about first understanding human morality and moral growth, enough so that we (not an AI) can deduce and fully describe someone’s morality (from his brain scan, or behavior, or words) and predict his potential moral growth in various circumstances, and maybe enough to correct any flaws that we see either in the moral content or in the growth process, and finally program the seed AI’s morality and moral growth based on that understanding once we’re convinced it’s sufficiently good? Your logic of (paraphrasing) “this information exists only in someone’s brain so I must let the AI grab it directly without attempting to understand it myself” simply makes no sense. First the conclusion doesn’t follow from the premise, and second if you let the AI grab and extrapolate the information without understanding it yourself, there is no way you can predict a positive outcome.

In case people think I’m some kind of moralist for harping on this so much, I think there are several other aspects of intelligence that are not captured by the notion of “optimization”. I gave some examples here. We need to understand all aspects of intelligence, not just the first facet for which we have a good theory, before we can try to build a truly Friendly AI.

Wei_Dai2 Dec 12, 2008, 7:24 PM
3 points
on: What I Think, If Not Why
Eliezer, as far as I can tell, “reflective equilibrium” just means “the AI/simulated non-sentient being can’t think of any more changes that it wants to make” so the real question is what counts as a change that it wants to make? Your answer seems to be whatever is decided by “a human library of non-introspectively-accessible circuits”. Well the space of possible circuits is huge, and “non-introspectively-accessible” certainly doesn’t narrow it down much. And (assuming that “a human library of circuits” = “a library of human circuits”) what is a “human circuit”? A neural circuit copied from a human being? Isn’t that exactly what you argued against in “Artificial Mysterious Intelligence”?

(It occurs to me that perhaps you’re describing your understanding of how human beings do moral growth and not how you plan for an AI/simulated non-sentient being to do it. But if so, that understanding seems to be similar in usefulness to “human beings use neural networks to decide how to satisfy their desires.”)

Eliezer wrote: I don’t think I’m finished with this effort, but are you unsatisfied with any of the steps I’ve taken so far? Where?

The design space for “moral growth” is just as big as the design space for “optimization” and the size of the target you have to hit in order to have a good outcome is probably just as small. More than any dissatisfaction with the specific steps you’ve taken, I don’t understand why you don’t seem to (judging from your public writings) view the former problem to be as serious and difficult as the latter one, if not more so, because there is less previous research and existing insights that you can draw from. Where are the equivalents of Bayes, von Neumann-Morgenstern, and Pearl, for example?

Wei_Dai2 Dec 12, 2008, 7:47 AM
4 points
on: What I Think, If Not Why
Isn’t CEV just a form of Artificial Mysterious Intelligence? Eliezer’s conversation with the anonymous AIfolk seems to make perfect sense if we search and replace “neural network” with “CEV” and “intelligence” with “moral growth/value change”.

How can the same person that objected to “Well, intelligence is much too difficult for us to understand, so we need to find some way to build AI without understanding how it works.” by saying “Look, even if you could do that, you wouldn’t be able to predict any kind of positive outcome from it. For all you knew, the AI would go out and slaughter orphans.” be asking us to place our trust in the mysterious moral growth of nonsentient but purportedly human-like simulations?

Wei_Dai2 Dec 9, 2008, 6:59 AM
4 points
on: Artificial Mysterious Intelligence
Eliezer, MacKay’s math isn’t very difficult. I think it will take you at most a couple of hours to go through how he derived his equations, understand what they mean, and verify that they are correct. (If I knew you were going to put this off for a year, I’d have mentioned that during the original discussion.) After doing that, the idea that sexual reproduction speeds up evolution by gathering multiple bad mutations together to be disposed of at once will become pretty obvious in retrospect.

Jeff, I agree with what you are saying, but you’re using the phrase “sexual selection” incorrectly, which might cause confusion to others. I think what you mean is “natural selection in a species with sexual reproduction”. “Sexual selection” actually means “struggle between the individuals of one sex, generally the males, for the possession of the other sex”.

Wei_Dai2 Nov 15, 2008, 1:57 AM
2 points
on: The Weighted Majority Algorithm
Even if P=BPP, that just means that giving up randomness causes “only” a polynomial slowdown instead of an exponential one, and in practice we’ll still need to use pseudorandom generators to simulate randomness.

It seems clear to me that noise (in the sense of randomized algorithms) does have power, but perhaps we need to develop better intuitions as to why that is the case.

Wei_Dai2 Nov 14, 2008, 10:41 PM
11 points
on: The Weighted Majority Algorithm
To generalize Peter’s example, a typical deterministic algorithm has low Kolmogorov complexity, and therefore its worst-case input also has low Kolmogorov complexity and therefore a non-negligible probability under complexity-based priors. The only possible solutions to this problem I can see are:

1. add randomization
2. redesign the deterministic algorithm so that it has no worst-case input
3. do a cost-benefit analysis to show that the cost of doing either 1 or 2 is not justified by the expected utility of avoiding the worst-case performance of the original algorithm, then continue to use the original deterministic algorithm

The main argument in favor of 1 is that its cost is typically very low, so why bother with 2 or 3? I think Eliezer’s counterargument is that 1 only works if we assume that in addition to the input string, the algorithm has access to a truly random string with a uniform distribution, but in reality we only have access to one input, i.e., sensory input from the environment, and the so called random bits are just bits from the environment that seem to be random.

My counter-counterargument is to consider randomization as a form of division of labor. We use one very complex and sophisticated algorithm to put a lower bound on the Kolmogorov complexity of a source of randomness in the environment, then after that, this source of randomness can be used by many other simpler algorithms to let them cheaply and dramatically reduce the probability of hitting a worst-case input.

Or to put it another way, before randomization, the environment does not need to be a malicious superintelligence for our algorithms to hit worst-case inputs. After randomization, it does.
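
As a rough illustration of option 1, here is a minimal sketch that compares a deterministic pivot rule with a randomized one on an already-sorted input. Quicksort is my choice of example, not something from the original discussion, and the cost model (one comparison per non-pivot element at each partition) is a simplifying assumption.

```python
import random

def quicksort_comparisons(items, choose_pivot):
    """Count comparisons used to quicksort a list of distinct items.

    Cost model (an assumption): each partition step compares every
    other element against the pivot once.
    """
    if len(items) <= 1:
        return 0
    pivot = choose_pivot(items)
    smaller = [v for v in items if v < pivot]
    larger = [v for v in items if v > pivot]
    return (len(items) - 1
            + quicksort_comparisons(smaller, choose_pivot)
            + quicksort_comparisons(larger, choose_pivot))

def first_element(items):
    # Deterministic rule: a very simple (low-complexity) input defeats it.
    return items[0]

rng = random.Random(0)  # stand-in for a trusted source of randomness

def random_element(items):
    # Randomized rule: no fixed input is bad for every pivot sequence.
    return rng.choice(items)

n = 200
sorted_input = list(range(n))  # simple, highly compressible input

det = quicksort_comparisons(sorted_input, first_element)
rnd = quicksort_comparisons(sorted_input, random_element)

print("deterministic pivot:", det)  # n*(n-1)/2 comparisons
print("random pivot:       ", rnd)
```

The sorted list is about as simple an input as one can write down, yet it forces the deterministic rule into its quadratic worst case. To do the same to the randomized version, the input would have to be chosen with knowledge of the random bits.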

Wei_Dai2 Nov 5, 2008, 6:59 AM
0 points
on: Complexity and Intelligence
Rolf, I was implicitly assuming that even knowing BB(k), it still takes O(k) bits to learn BB(k+1). But if this assumption is incorrect, then I need to change the setup of my prediction game so that the input sequence consists of the unary encodings of BB(1), BB(2), BB(4), BB(8), …, instead. This shouldn’t affect my overall point, I think.

Wei_Dai2 Nov 5, 2008, 6:23 AM
3 points
on: Complexity and Intelligence
After further thought, I need to retract my last comment. Consider P(next symbol is 0|H) again, and suppose you’ve seen 100 0s so far, so essentially you’re trying to predict BB(101). The human mathematician knows that any non-zero number he writes down for this probability would be way too big, unless he resorts to non-constructive notation like 1/BB(101). If you force him to answer “over and over, what their probability of the next symbol being 0 is” and don’t allow him to use notation like 1/BB(101) then he’d be forced to write down an inconsistent probability distribution. But in fact the distribution he has in mind is not computable, and that explains how he can beat Solomonoff Induction.

Wei_Dai2 Nov 5, 2008, 3:18 AM
4 points
on: Complexity and Intelligence
Good question, Eliezer. If the human mathematician is computable, why isn’t it already incorporated into Solomonoff Induction? It seems to me that the human mathematician does not behave like a Bayesian. Let H be the hypothesis that the input sequence is the unary encodings of Busy Beaver numbers. The mathematician will try to estimate, as best as he can, P(next symbol is 0|H). But when the next symbol turns out to be 1, he doesn’t do a Bayesian update and decrease P(H), but instead says “Ok, so I was wrong. The next Busy Beaver number is bigger than I expected.”

I’m not sure I understand what you wrote after “to be fair”. If you think a Solomonoff inductor can duplicate the above behavior with an alternative setup, can you elaborate how?

Wei_Dai2 Nov 5, 2008, 12:33 AM
1 point
on: Complexity and Intelligence
A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?

It’s easy to use such an oracle to produce proofs and halting times. The following assumes that the oracle outputs 1 if the input TM halts, and 0 otherwise.

For proofs: Write a program p which on inputs x and i, enumerates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, otherwise it loops forever. Now query the oracle with (p,x,0), (p,x,1), …, and you get a proof for x if it has a proof.
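
The bit-by-bit extraction can be sketched with a toy stand-in for the oracle. Everything here is an assumption made for illustration: the "theorem" is "x is composite", a "proof" is the smallest nontrivial factor of x written in binary, and `toy_oracle` simply computes the answer a real oracle would give for the program p on (x, i). In this toy domain the proof is never longer than x itself, which gives a bound on how many bits to query.

```python
def first_proof(x):
    """Toy proof search: smallest nontrivial factor of x, or None."""
    for d in range(2, x):
        if x % d == 0:
            return d
    return None

def toy_oracle(x, i):
    """Stand-in for asking the halting oracle about p on inputs (x, i).

    p halts iff a proof exists and bit i of that proof is 1
    (bits are indexed from the least significant end).
    """
    d = first_proof(x)
    return 1 if d is not None and (d >> i) & 1 else 0

def extract_proof(x):
    """Rebuild the proof from one oracle query per bit position."""
    bits = [toy_oracle(x, i) for i in range(x.bit_length())]
    value = sum(b << i for i, b in enumerate(bits))
    # Every factor >= 2 has a 1 bit, so all zeros means "no proof".
    return value if value else None

print(extract_proof(91))  # a factor of 91, recovered bit by bit
print(extract_proof(97))  # 97 is prime, so no proof exists
```

Once extracted, the proof can be checked without trusting the oracle: here, by dividing x by the returned factor.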

Halting times: Write a program p which on inputs x and i, runs x for i steps. If x halts within i steps, then it halts, otherwise it loops forever. Now query the oracle with (p,x,2), (p,x,4), (p,x,8), …, until you get an output of “1” and then use binary search to get the exact halting time.
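
The doubling-then-binary-search procedure can be sketched as follows. The oracle is mocked: `halts_within` just simulates the program for i steps, and the Collatz iteration is a hypothetical stand-in for the program x. Both are assumptions made so the example runs. As in the description above, the search only terminates if x actually halts.

```python
def collatz_steps(n):
    """Toy program x: each yield is one step; halts when n reaches 1."""
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield

def halts_within(program, i):
    """Stand-in for the oracle's answer on (p, x, i):
    1 iff x halts within i steps, else 0."""
    steps = 0
    for _ in program():
        steps += 1
        if steps > i:
            return 0
    return 1

def halting_time(program):
    """Find the exact halting time using only 0/1 oracle answers."""
    # Doubling phase: query i = 2, 4, 8, ... until the oracle says 1.
    hi = 2
    while not halts_within(program, hi):
        hi *= 2
    # Binary search for the smallest i in [0, hi] with answer 1.
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if halts_within(program, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(halting_time(lambda: collatz_steps(6)))   # 6 -> 3 -> ... -> 1
print(halting_time(lambda: collatz_steps(27)))
```

The number of oracle queries is logarithmic in the halting time, even though the halting time itself may be enormous.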

I don’t recall if I’ve mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant.

Eliezer, if what you say is true, then it shouldn’t be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a counterexample. Suppose the input sequence consists of the unary encodings of Busy Beaver numbers BB(1), BB(2), BB(3), …, that is, BB(1) number of 1s followed by a zero, then BB(2) number of 1s followed by a 0, and so on. Let’s ask the predictor, after seeing n input symbols, what is the probability that it will eventually see a 0 again, and call this p(n). With Solomonoff Induction, p(n) will approach arbitrarily close to 0 as you increase n. A human mathematician on the other hand will recognize that the input sequence may not be computable and won’t let p(n) fall below some non-zero bound.

Wei_Dai2 Nov 4, 2008, 6:51 AM
5 points
on: Complexity and Intelligence
Nick wrote: Good point, but when the box says “doesn’t halt”, how do I know it’s correct?

A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.

If P!=NP, you should be able to convince yourself that the black box has at least exponentially more computational power than you do. So if you are an AI with say the computational resources of a solar system, you should be able to verify that the black box either contains exotic physics or has access to more resources than the rest of the universe put together.

Eliezer wrote: So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn’t theoretically do with eyes closed.

It seems to me that an AI should/can never completely rule out the possibility that the universe contains physics that is mathematically more powerful than what it has already incorporated into itself, so it should always keep its eyes open. Even if it has absorbed the entire universe into itself, it might still be living inside a simulation, right?

Wei_Dai2 Nov 4, 2008, 4:15 AM
8 points
on: Complexity and Intelligence
In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Eliezer, you’re making an important error here. I don’t think it affects the main argument you’re making in this article (that considerations of “complexity” don’t rule out self-improving minds), but this error may have grave consequences elsewhere. The error is that while the environment does have to be magic, you don’t need to have faith in this, just not have faith that it’s impossible.

Suppose you get a hold of a black box that seems to act as a halting-problem oracle. You’ve thrown thousands of problems at it, and have never seen an incorrect or inconsistent answer. What are the possibilities here? Either (A) the environment really is magic (i.e. there is uncomputable physics that enables implementation of actual halting-problem oracles), or (B) the box is just giving random answers that happen to be correct by chance, or (C) you’re part of a simulation where the box is giving all possible combinations of answers and you happen to be in the part of the simulation where the box is giving correct answers. As long as your prior probability for (A) is not zero, as you do more and more tests and keep getting correct answers, its posterior probability will eventually dominate (B) and (C).
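
A minimal numerical sketch of this update, under assumptions that are mine rather than part of the argument: each test has a yes/no answer that can be checked, (A) predicts a correct answer with probability 1, and (B) and (C) are lumped together as predicting a correct answer with probability 1/2 on each independent test.

```python
from fractions import Fraction

def posterior_A(prior_A, n_correct):
    """Posterior probability of (A) after n_correct verified answers.

    Assumed likelihoods: (A) gives probability 1 to each correct
    answer; the alternatives (B)/(C) give probability 1/2 each.
    """
    like_A = Fraction(1)
    like_other = Fraction(1, 2) ** n_correct
    numerator = prior_A * like_A
    return numerator / (numerator + (1 - prior_A) * like_other)

tiny_prior = Fraction(1, 10**6)
for n in (0, 10, 20, 40):
    print(n, float(posterior_A(tiny_prior, n)))

# A prior of exactly zero never moves, however many tests pass.
print(posterior_A(Fraction(0), 1000))
```

Under these assumptions each correct answer doubles the odds in favor of (A), so even a one-in-a-million prior is overwhelmed after a few dozen tests, while a prior of exactly zero stays at zero forever.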

Why is this so important? Well in standard Solomonoff Induction, the prior for (A) is zero, and if we program that into an AI, it won’t do the right thing in this situation. This may have a large effect on expected utility (of us, people living today), because while the likelihood of us living in an uncomputable universe with halting-problem oracles is low, the utility we gain from correctly recognizing and exploiting such a universe could be huge.

Wei_Dai2 Jul 16, 2008, 12:52 AM
2 points
on: Fundamental Doubts
Some problems are hard to solve, and hard even to define clearly. It’s possible that “qualia” is not referring to anything meaningful, but unless you are able to explain why it feels meaningful to someone, but isn’t really, I don’t think you should demand that they stop using it.

Having said that, here’s my attempt at an operational definition of qualia.

Wei_Dai2 Jun 12, 2008, 11:55 AM
1 point
on: Living in Many Worlds
Eliezer, suppose the nature of the catastrophe is such that everyone on the planet dies instantaneously and painlessly. Why should such deaths bother you, given that identical people are still living in adjacent branches? If avoiding death is simply a terminal value for you, then I don’t see why encouraging births shouldn’t be a similar terminal value.

I agree that the worlds in which we survive may not be pleasant, but average utilitarianism implies that we should try to minimize such unpleasant worlds that survive, rather than the existential risk per se, which is still strongly counterintuitive.

I don’t know what you are referring to by “hard to make numbers add up on anthropics without Death events”. If you wrote about that somewhere else, I’ve missed it.

A separate practical problem I see with the combination of MWI and consequentialism is that due to branching, the measure of worlds a person is responsible for is always rapidly and continuously decreasing, so that for example I’m now responsible for a much smaller portion of the multiverse than I was just yesterday or even a few seconds ago. In theory this doesn’t matter because the costs and benefits of every choice I face are reduced by the same factor, so the relative rankings are preserved. But in practice this seems pretty demotivational, since the subjective mental cost of making an effort appears to stay the same, while the objective benefits of such effort decrease rapidly. Eliezer, I’m curious how you’ve dealt with this problem.

Wei_Dai2 Jun 11, 2008, 6:49 PM
1 point
on: Living in Many Worlds
Put me down as a long time many-worlder who doesn’t see how it makes average utilitarianism more attractive.

It seems to me that MWI poses challenges for both average utilitarianism and sum utilitarianism. For sum utilitarianism, why bother to bring more potential people into existence in this branch, if those people are living in many other branches already?

But I wonder if Eliezer has considered that MWI plus average utilitarianism seems to imply that we don’t need to worry about certain types of existential risk. If some fraction of the future worlds that we’re responsible for gets wiped out, that wouldn’t lower the average utility, unless for some reason the fraction that gets wiped out would otherwise have had an average utility that’s higher than the average of the surviving branches. Assuming that’s not the case, the conclusion follows that we don’t need to worry about these risks, which seems pretty counterintuitive.

Wei_Dai2 Dec 20, 2007, 11:00 AM
3 points
on: Argument Screens Off Authority
Eliezer, what is your view of the relationship between Bayesian Networks and Solomonoff Induction? You’ve talked about both of these concepts on this blog, but I’m having trouble understanding how they fit together. A Google search for both of these terms together yields only one meaningful hit, which happens to be a mailing list post by you. But it doesn’t really touch on my question.

On the face of it, both Bayesian Networks and Solomonoff Induction are “Bayesian”, but they seem to be incompatible with each other. In the Bayesian Networks approach, conditional probabilities are primary, and the full probability distribution function is more of a mathematical formalism that stays in the background. Solomonoff Induction on the other hand starts with a fully specified (even if uncomputable) prior distribution and derives any conditional probabilities from it as needed. Do you have any idea how to reconcile these two approaches?

Wei_Dai2 Nov 23, 2007, 9:37 PM
2 points
on: Not for the Sake of Happiness (Alone)
Toby, how do you get around the problem that the greatest sum of happiness across all lives probably involves turning everyone into wireheads and putting them in vats? Or in an even more extreme scenario, turning the universe into computers that all do nothing but repeatedly run a program that simulates a person in an ultimate state of happiness. Assuming that we have access to limited resources, these methods seem to maximize happiness for a given amount of resources.

I’m sure you agree that this is not something we do want. Do you think that it is something we should want, or that the greatest sum of happiness across all lives can be achieved in some other way?

Wei_Dai2 Nov 23, 2007, 1:29 AM
9 points
on: Not for the Sake of Happiness (Alone)
I agree with Eliezer here. Not all values can be reduced to desire for happiness. For some of us, the desire not to be wireheaded or drugged into happiness is at least as strong as the desire for happiness. This shouldn’t be a surprise since there were and still are psychoactive substances in our environment of evolutionary adaptation.

I think we also have a more general mechanism of aversion towards triviality, where any terminal value that becomes “too easy” loses its value (psychologically, not just over evolutionary time). I’m guessing this is probably because many of our terminal values (art, science, etc.) exist because they helped our ancestors attract mates by signaling genetic superiority. But you can’t demonstrate genetic superiority by doing something easy.

Toby, I read your comment several times, but still can’t figure out what distinction you are trying to draw between the two senses of value. Can you give an example or thought experiment, where valuing happiness in one sense would lead you to do one thing, and valuing it in the other sense would lead you to do something else?

Michael, do you have a more specific reference to something Parfit has written?

Wei_Dai2 Nov 8, 2007, 10:08 PM
0 points
on: Natural Selection’s Speed Limit and Complexity Bound
Eliezer, I just noticed that you’ve updated the main post again. The paper by Worden that you link to makes the mistake of assuming no crossing or even chromosomal assortment, as you can see from the following quotes. It’s not surprising that sex doesn’t help under those assumptions.

(begin quote)
Next consider what happens to one of the haploid genotypes j in one generation. Through random mating, it gets paired with another haploid genotype k, with probability q; then the pair have a probability of surviving σ_jk.
…
(b) Crossing: Similarly, in a realistic model of crossing, we can show that it always decreases the diploid genotype information Jµ. This is not quite the same as proving that crossing always decreases Iµ, but is a powerful plausibility argument that it does so. In that case, crossing will not violate the limit.
(end quote)

As for not observing species gaining thousands of bits per generation, that might be due to the rarity of beneficial mutations. A dog not apparently having greater morphological or biochemical complexity than a dinosaur can also be explained in many other ways.

If you have the time, I think it would be useful to make another post on this topic, since most people who read the original article will probably not see the detailed discussions in the comments or even notice the Addendum. You really should cite MacKay. His paper does provide a theoretical explanation for what happens in the simulations, if you look at the equations and how they are derived.