Interestingly, you can have unboundedly many children with only quadratic population growth, so long as they are exponentially spaced. For example, give each newborn sentient a resource token, which can be used after the age of maturity (say, 100 years or so) to fund a child. Additionally, in the years 2^i every living sentient is given an extra resource token. One can show there is at most quadratic growth in the number of resource tokens. By adjusting the exponent in 2^i we can get growth O(n^{1+p}) for any nonnegative real p.
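As a rough illustration (my own sketch, not part of the original argument), here is a small simulation of the token scheme. It assumes maturity at 100 years, that every token is spent on a child as soon as it becomes spendable, that the bonus tokens handed out in years 2^i also take 100 years to become spendable, and that no one dies; the function and parameter names are mine.

```python
from collections import defaultdict

def simulate(years=4096, maturity=100):
    # One founding sentient, holding one birth token spendable at `maturity`.
    population = 1
    tokens_issued = 1
    spendable_at = defaultdict(int)  # year -> number of tokens becoming spendable then
    spendable_at[maturity] = 1

    for year in range(1, years + 1):
        # Every token that becomes spendable this year immediately funds one child.
        births = spendable_at.pop(year, 0)
        population += births
        tokens_issued += births                      # each newborn's own token
        spendable_at[year + maturity] += births

        if year & (year - 1) == 0:                   # year is a power of two
            tokens_issued += population              # bonus token for every living sentient
            spendable_at[year + maturity] += population
            print(f"year {year:5d}: population {population:8d}, tokens issued {tokens_issued:8d}")

simulate()
```

Printing at the power-of-two years lets you eyeball that the token count grows no faster than quadratically in the year.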
Nick Hay
Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is “shutdown silently, wiping the AI system clean.”
(When I say “CEV” I really mean a FAI which satisfies the spirit behind the extremely partial specification given in the CEV document. The CEV document says essentially nothing of how to implement this specification.)
Personally, I prefer the longer posts.
guest: right, so with those definitions you are overconfident if you are surprised more than you expected, underconfident if you are surprised less, calibration being how close your surprisal is to your expectation of it.
I think there’s a sign error in my post; it should be C(x0) = \log p(x0) + H(p).
Anon: no, I mean the log probability. In your example, the calibratedness will generally be high (C close to zero): - \log 0.499 - H(p) ~= 0.00289 each time you see tails, and - \log 0.501 - H(p) ~= -0.00289 each time you see heads. It’s continuous.
Let’s be specific. We have H(p) = - \sum_x p(x) \log p(x), where p is some probability distribution over a finite set. If we observe x0, we say the predictor’s calibration is
C(x0) = \sum_x p(x) \log p(x) - \log p(x0) = - \log p(x0) - H(p)
so the expected calibration is 0 by the definition of H(p). The calibration is continuous in p. If \log p(x0) is higher than the expected value of \log p(x) then we are underconfident and C(x0) < 0; if \log p(x0) is lower than expected we are overconfident, and C(x0) > 0.
With q(x) = p(x) d(x,x0), the non-normalised probability distribution that puts weight only on x0, we have
C = D(p||q)
so this is a relative entropy of sorts.
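As a numerical sketch of this definition (my own check, reusing the 0.501/0.499 coin from the earlier comment, and assuming base-2 logs, which matches the ~0.00289 figures above):

```python
import math

def entropy(p):
    """Shannon entropy H(p) of a finite distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

def calibration(p, x0):
    """C(x0) = -log p(x0) - H(p): positive means overconfident, negative underconfident."""
    return -math.log2(p[x0]) - entropy(p)

coin = {"heads": 0.501, "tails": 0.499}
print(calibration(coin, "tails"))   # ~ +0.00289
print(calibration(coin, "heads"))   # ~ -0.00288

# The expected calibration under p is zero by construction.
print(sum(coin[x] * calibration(coin, x) for x in coin))   # ~ 0 (up to floating point)
```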
Anon: well-calibrated means roughly that, in the class of all events you think have probability p of being true, the proportion of them that turn out to be true is p.
More formally, suppose you have a probability distribution over something you are going to observe. If the negative log probability (the surprisal) of the event which actually occurs is equal to the entropy of your distribution, you are well calibrated. If it is above you are overconfident; if it is below you are underconfident. By this measure, assigning every possibility equal probability will always be calibrated.
This is related to relative entropy.
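To check that last claim with the C(x0) defined above: if p is uniform over N outcomes then for every observed x0

- \log p(x0) = \log N = H(p), so C(x0) = - \log p(x0) - H(p) = 0

i.e. the uniform predictor comes out exactly calibrated no matter what happens.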
Just in case it’s not clear from the above: an arbitrary complex function on the real line has uncountably many degrees of freedom, since you can specify its value at each point independently.
A continuous function, however, has only countably many degrees of freedom: it is uniquely determined by its values on the rational numbers (or any dense set).
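To spell out the standard argument behind that last claim: if f and g are continuous and agree on every rational, then for any real x pick rationals q_n -> x, and continuity gives

f(x) = \lim_n f(q_n) = \lim_n g(q_n) = g(x)

so f = g; countably many values pin the function down.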
Eliezer: poetic and informative. I like it.
Tiiba:
The hypothesis is actual immortality, to which nonzero probability is being assigned. For example, suppose under some scenario your probability of dying at each step decreases by a factor of 1/2. Then your total probability of ever dying is at most 2 times the probability of dying at the very first step, which we can assume is far less than 1/2.
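Spelled out, with p_t the probability of dying at step t and the halving assumption p_t = p_0 2^{-t}:

P(ever dying) <= \sum_{t=0}^{\infty} p_t = \sum_{t=0}^{\infty} p_0 2^{-t} = 2 p_0

so if p_0 < 1/2 the probability of living forever is at least 1 - 2 p_0 > 0.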
Felix: Yes, for example see http://en.wikipedia.org/wiki/NC_%28complexity%29
Eliezer: “You could see someone else’s engine operating materially, through material chains of cause and effect, to compute by “pure thought” that 1 + 1 = 2. How is observing this pattern in someone else’s brain any different, as a way of knowing, from observing your own brain doing the same thing? When “pure thought” tells you that 1 + 1 = 2, “independently of any experience or observation”, you are, in effect, observing your own brain as evidence.”
Richard: “It’s just fundamentally mistaken to conflate reasoning with “observing your own brain as evidence”.”
Eliezer: “If you view it as an argument, yes. The engines yield the same outputs.”
Richard: “What does the latter have to do with rationality?”
Pure thought is something your brain does. If you consider having successfully determined a conclusion by pure thought to be evidence that the conclusion is correct, then you must consider the output of your brain (i.e. its internal representation of this conclusion, which is to say yours) as valid evidence for the conclusion. Otherwise you have no reason to trust that your conclusion is correct, because this conclusion is exactly the output of your brain after reasoning.
If you consider your own brain as evidence, and someone else’s brain works in the same way, computing the same answers as yours, then observing their brain is the same as observing your brain is the same as observing your own thoughts. You could know abstractly that “Bob, upon contemplating X for 10 minutes, would consider it a priori true iff I would”, perhaps from knowledge of how both of your brains compute whether something is a priori true. If you then found out that “Bob thinks X a priori true” you could derive that X was a priori true without having to think about it: you know your output would be the same (“X is a priori true”) without having to determine it.
One reason is Cox’s theorem, which shows any quantitative measure of plausibility must obey the axioms of probability theory. Then this result, conservation of expected evidence, is a theorem.
What is the “confidence level”? Why is 50% special here?
Perhaps this formulation is nice:
0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E)
The expected change in probability is zero (for if you expected change you would have already changed); the identity itself is just the law of total probability, P(H) = P(H|E)P(E) + P(H|~E)P(~E), rearranged.
Since P(E) and P(~E) are both positive, to maintain balance, if P(H|E)-P(H) < 0 then P(H|~E)-P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E)-P(H)) must be correspondingly large to counteract (P(H|E)-P(H)) and maintain balance.
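A quick numeric check of the identity, with a toy joint distribution of my own choosing (exact arithmetic so the balance comes out exactly zero):

```python
from fractions import Fraction as F

# A toy joint distribution over hypothesis H and evidence E.
P_H_and_E, P_H_and_notE = F(3, 10), F(1, 10)
P_notH_and_E, P_notH_and_notE = F(2, 10), F(4, 10)
assert P_H_and_E + P_H_and_notE + P_notH_and_E + P_notH_and_notE == 1

P_E = P_H_and_E + P_notH_and_E
P_notE = 1 - P_E
P_H = P_H_and_E + P_H_and_notE
P_H_given_E = P_H_and_E / P_E
P_H_given_notE = P_H_and_notE / P_notE

# Conservation of expected evidence: the expected shift in P(H) is zero.
balance = (P_H_given_E - P_H) * P_E + (P_H_given_notE - P_H) * P_notE
print(balance)   # 0
```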
It seems the point of the exercise is to think of non-obvious cognitive strategies, ways of thinking, for improving things. The chronophone translation is both a tool for finding these strategies by induction, and a rationality test to see if the strategies are sufficiently unbiased and meta.
But what would I say? The strategy of searching for and correcting biases in thought, failures of rationality, would improve things. But I think I generated that suggestion by thinking of “good ideas to transmit”, which isn’t meta enough. Perhaps if I discussed various biases I was concerned about, and gave a stream-of-thought analysis of how to correct a particular bias (say, anthropomorphism), this would be invoking the strategy rather than referencing it, thus passing the filter. Hmmm.
Ian C: neither group is changing human values as they are referred to here: everyone is still human, and no one is suggesting neurosurgery to change how brains compute value. See the post “Value is Fragile”.