Daniel Kokotajlo answers The unfalsifiable belief in (doomsday) superintelligence scenarios?

Daniel Kokotajlo 18 Jan 2022 4:25 UTC
8 points
There is no clear line of argument from beginning to end but rather a disjunctive list of possibilities that all lead to similar extinction events. This leads the entire theory to not be falsifiable, cut off one road toward superintelligence or a related extinction outcome and a new one will pop up. Whatever amount of evidence mounted against it will never be enough.
Every sentence in the above is false, even if you condition on the previous sentences being true. On clarity, I guess it’s a matter of subjective judgment but Superintelligence seems significantly more clear than the average book, even the average academic book, IMO. If you don’t think it’s clear enough, check out Joseph Carlsmith’s report on power-seeking AI, which literally does it as a big conjunctive premise-conclusion-form argument. On falsifiability, this is just not how falsifiability works, as Lukas Gloor points out.* On “whatever amount of evidence mounted against it will never be enough...” well, even if it were unfalsifiable, you could still in principle provide enough evidence to change people’s minds about it, unless people are being super stubborn, which is totally possible (happens all the time, people are closed-minded about things) but not true of Bostrom et al IMO.
*I do agree that Superintelligence would be even more impressive if it had stuck its neck out and made a bunch of bold near-term predictions which had then turned out to be true. In that sense it was weakly unfalsifiable, in the same way that almost every book about almost everything is. The only bold near-term prediction I recall it making was that the recent surge of progress and interest in artificial intelligence was not going to subside into another AI winter, but rather would continue to grow and grow. At the time this was a pretty bold prediction IMO, since back then the chorus of talking heads forecasting AI Winter was louder than it is now. I don’t have my copy of Superintelligence with me now so I can’t check whether this prediction was actually made, sorry.
- Hickey 18 Jan 2022 19:35 UTC
  0 points
  Parent
  If binary falsifiability is rejected at the outset, then Bostrom’s style of argumentation, disjunctive arguments built on conjecture, makes total sense, since every disjunct can only add to the total probability of the conclusion being true. Each independent argument then also cannot be disproved in a binary way, i.e. we can only decrease its probability.
  you could still in principle provide enough evidence to change people’s minds about it
  Changing minds would mean decreasing the probability of the (collective) argument to a point where it is no longer worth considering. Yet, as Templarrr stated, any nonzero chance of extinction (for which preventative action could be undertaken) would be worth considering. Looking at it from this perspective there must be binary falsification because any chance greater than 0 makes the argument ‘valid’, i.e. worth considering.
  I am assuming there are a lot of, perhaps contrived, cases of nonzero chance of extinction with possible preventative action which would sound preposterous to undertake compared to AI alignment (either for their absurdly low chance or absurdly high cost to undertake). Those do not interest me; rather I wonder if this is perceived as an actual problem and why/why not? I have no clue why it would not be a problem (maybe that’s where the contrived examples come in) and maybe it would not be a problem as it is definitive proof that they are right. The latter point I find very unconvincing, so I hope there are some better refutations at hand within the community.
  P.S. thanks for the recommendation, I will check what Joseph Carlsmith has written.
  - Donald Hobson 18 Jan 2022 22:39 UTC
    4 points
    Parent
    Looking at it from this perspective there must be binary falsification because any chance greater than 0 makes the argument ‘valid’, i.e. worth considering.
    This reasoning is absurd. You are letting utilities flow back and affect your epistemics.
    I think the notion of “falsification” as you state it is confused. In Bayesian probability theory, 0 and 1 are not probabilities, and nothing is ever certain. You start with some prior about the chance of AI doom, then you read Nick Bostrom’s book and find evidence that updates this probability upwards.
    How you act on those beliefs is down to expected utility.
    Disjunctive style arguments tend to be more reliable than conjunctive arguments, for the same argument quality.
    For example, we know the Earth is round because of:
    Pictures from space.
    Shadow on moon during lunar eclipse.
    Combination of surface geographic measurements.
    A sphere is the stable state of Newtonian attraction + pressure.
    Ships going over the horizon ⇒ positive curvature. The only shapes in Euclidean 3D with positive curvature everywhere are topologically spheres. (Rules out doughnuts, not rounded cubes.)
    Other celestial objects observed to be spheres.
    That is a disjunctive argument. A couple of the lines of evidence are weaker than others. Does this mean the theory is “unfalsifiable”? No. It means we have multiple reliable lines of evidence all saying the same thing.
    Yes, I have seen those creationist “100 proofs god exists”. The problem with those is not the disjunctive argument style; it’s the low quality of each individual argument.
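    A rough numerical sketch of the two points above, under assumptions that are not stated anywhere in the thread: each line of evidence is treated as independent and is given a made-up probability of being sound. The function names `conjunctive` and `disjunctive` are hypothetical.

```python
def conjunctive(p_steps):
    """Probability that every step of a chain holds (all steps are needed)."""
    result = 1.0
    for p in p_steps:
        result *= p
    return result


def disjunctive(p_lines):
    """Probability that at least one independent line of evidence holds."""
    all_fail = 1.0
    for p in p_lines:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


# Made-up numbers: six fairly strong lines vs. a hundred very weak ones.
strong = [0.9] * 6
weak = [0.001] * 100

print("conjunctive, 6 strong steps:", conjunctive(strong))
print("disjunctive, 6 strong lines:", disjunctive(strong))
print("disjunctive, 100 weak lines:", disjunctive(weak))
```

    Under these assumptions, six fairly strong independent lines make the conclusion close to certain, while a hundred very weak ones still leave it unlikely. The weakness there comes from the quality of each argument, not from the disjunctive style.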
    - Hickey 19 Jan 2022 14:40 UTC
      1 point
      Parent
      I think me using the term “valid” was a very poor choice and saying “worth considering” was confusing. I agree that how you act on your beliefs/evidence should be down to the maximum expected utility and I think this is where the problems lie.
      Definition below taken from Artificial Intelligence: A Modern Approach by Russell and Norvig.
      The probability of outcome $s'$, given evidence observations $e$, is written $P(\mathrm{Result}(a) = s' \mid a, e)$, where the $a$ on the right-hand side of the conditioning bar stands for the event that action $a$ is executed. The agent’s preferences are captured by a utility function, $U(s)$, which assigns a single number to express the desirability of a state. The expected utility of an action given the evidence, $EU(a \mid e)$, is the average utility of the outcomes, weighted by their probabilities:
      $EU(a \mid e) = \sum_{s'} P(\mathrm{Result}(a) = s' \mid a, e) \, U(s')$
      The principle of maximum expected utility (MEU) says that a rational agent should choose the action that maximizes the agent’s expected utility: $\mathit{action} = \operatorname{argmax}_{a} EU(a \mid e)$.
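      A minimal Python sketch of the definition above. The states, actions, and numbers are hypothetical and chosen only for illustration; `expected_utility` and `meu_action` are names introduced here, not taken from the book.

```python
def expected_utility(outcome_probs, utility):
    """EU(a | e) = sum over s' of P(Result(a) = s' | a, e) * U(s')."""
    return sum(p * utility[state] for state, p in outcome_probs.items())


def meu_action(actions, utility):
    """Return the action with maximum expected utility (argmax over actions)."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))


# Hypothetical toy problem: U(s') for each outcome state.
utility = {"dry": 1.0, "dry_but_encumbered": 0.8, "wet": 0.0}

# Hypothetical P(Result(a) = s' | a, e) for each action, given the evidence e.
actions = {
    "take_umbrella": {"dry_but_encumbered": 1.0},
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
}

for name, probs in actions.items():
    print(name, expected_utility(probs, utility))
print("MEU action:", meu_action(actions, utility))
```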
      If we use this definition what would we fill in to be the utility of the outcome of going extinct? Probably something like $U(\mathit{extinct}) = 0$; the associated action might be something like not doing anything about AI alignment in this case. What would be enough (counter)evidence such that the action following from the principle of MEU would be to ‘risk’ the extinction? Unless I just overlooked something, I believe that $e$ has to be 0 which is, as you said, not a probability in Bayesian probability theory. I hope this makes it more clear what I was trying to get at.
      Your example of disjunctive style argument is very helpful. I guess you would state that none of them are 100% ‘proof’ of the earth being round but add (varying degrees of) probability to that hypothesis being true. That would mean that there is some very small probability that it might be flat. So then we would, with above expected utility function, never fly an airplane with associated actions for a flat earth as we would deem it very likely to crash and burn.
      I would add to your last creationist point
      low quality of each individual argument given the extreme burden of proof associated.
      - Donald Hobson 20 Jan 2022 13:42 UTC
        4 points
        Parent
        I think that the first paragraph after the block quote is highly confused.
        Your actions depend on your utility function, the actions you have available and the probabilities you assign to various outcomes, conditional on various actions. Let’s look at a few examples. (Numbers contrived and made up.)
        These examples are deliberately constructed to show that expected utility theory doesn’t blindly output “Work on AI risk” regardless of input. Other assumptions would favour working on AI risk.
        You are totally selfish, and are old. The field of AI is moving slowly enough that it looks like not much will happen in your lifetime. You have a strong dislike of doing anything resembling AI safety work, and there isn’t much you could do. If you were utterly confident AI wouldn’t come in your lifetime, you would have no reason to care. But probabilities aren’t 0. So let’s say you think there is a 1% chance of AI in your lifetime, and a 1 in a million chance that your efforts will make the difference between aligned and unaligned AI. U(Rest of life doing AI safety)=1, U(wiped out by killer AI)=0, U(Rest of life having fun)=2 and U(Living in FAI utopia)=10. Then the expected utility of having fun is 2*0.99+0.01*x*10 and the expected utility of AI safety work is 1*0.99+0.01*(x+0.000001)*10, where x is the chance of FAI. The latter expected utility is lower.
        You are a perfect total utilitarian, and highly competent. You estimate that the difference between galactic utopia and extinction is so large that all other bits of utility are negligible in comparison. You estimate that if you work on biotech safety, there is a 6% chance of AI doom, a 5% chance of bioweapon doom, and the remaining 89% chance of galactic utopia. You also estimate that if you work on AI safety there is a 5.9% chance of AI doom and a 20% chance of bioweapon doom, leaving only a 74.1% chance of galactic utopia. (You are really good at biosafety in particular.) You choose to work on biotech safety.
        You are an average utilitarian, taking your utility function to be U=pleasure/(pleasure+suffering) over all minds you consider to be capable of such feelings. If a galactic utopia occurs, its size is huge enough to wash out everything that has happened on earth so far, leaving a utility of basically 1. You think there is a 0.1% chance of this happening. You think humans on average experience 2x as much pleasure as suffering, and farm animals on average experience 2x as much suffering as pleasure, and there are an equal number of each. Hence in the 99.9% case where AI wipes us out, the utility is exactly 0.5. However, you have a chance to reduce the number of farm animals to ever exist by 10%, leaving a utility of (2+0.9)/(2+0.9+1+1.8)=0.509. This increases your expected utility by 0.009. An opportunity to increase the chance of FAI galactic utopia from 0.1% to 1.1% is only worth 0.005 (a 1% chance of going from U=0.5 to U=1). Therefore reducing the number of farm animals to exist takes priority.
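        The three made-up examples above can be checked numerically. This is a short sketch that takes the numbers as given in the comment; the variable and function names are introduced here for illustration, and the third example follows the comment in ignoring the small 0.999 weighting factor.

```python
# Example 1: selfish and old. x is the (unknown) chance of FAI.
P_AI = 0.01          # chance of AI arriving in your lifetime
DELTA = 0.000001     # chance your efforts tip the outcome
U_SAFETY, U_FUN, U_DOOM, U_UTOPIA = 1, 2, 0, 10


def eu_fun(x):
    return U_FUN * (1 - P_AI) + P_AI * (x * U_UTOPIA + (1 - x) * U_DOOM)


def eu_safety(x):
    x2 = x + DELTA
    return U_SAFETY * (1 - P_AI) + P_AI * (x2 * U_UTOPIA + (1 - x2) * U_DOOM)


# Example 2: total utilitarian with U(utopia) = 1 and U(any doom) = 0,
# so expected utility is just the probability of utopia.
def p_utopia(p_ai_doom, p_bio_doom):
    return 1 - p_ai_doom - p_bio_doom


eu_biosafety = p_utopia(0.06, 0.05)
eu_ai_safety = p_utopia(0.059, 0.20)


# Example 3: average utilitarian, U = pleasure / (pleasure + suffering).
# Humans contribute 2 pleasure and 1 suffering; farm animals 1 and 2.
def average_utility(farm_animal_fraction):
    pleasure = 2 + 1 * farm_animal_fraction
    suffering = 1 + 2 * farm_animal_fraction
    return pleasure / (pleasure + suffering)


baseline = average_utility(1.0)
fewer_animals = average_utility(0.9)
gain_animals = fewer_animals - baseline   # ignores the 0.999 weighting
gain_fai = 0.01 * (1.0 - baseline)        # 1% chance of going from 0.5 to 1

for x in (0.0, 0.5, 0.99):
    print("x =", x, "fun:", eu_fun(x), "safety:", eu_safety(x))
print("biosafety:", eu_biosafety, "AI safety:", eu_ai_safety)
print("animals gain:", gain_animals, "FAI gain:", gain_fai)
```

        In the first example the comparison does not depend on x, since x contributes almost the same amount to both expected utilities.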
        - Hickey 22 Jan 2022 11:26 UTC
          4 points
          Parent
          Thank you for those examples. I think this shows that using a utility function without placing it in a ‘real’ situation, rather than some locked-off situation with few viable alternative actions that carry some utility, is a fallacy.
          I suppose then that I conflated “What can I know?” with “What must I do?”; separating a belief from an associated action (I think) resolves most of the conflicts that I saw.
      - JBlack 20 Jan 2022 2:11 UTC
        4 points
        Parent
        Utilities in decision theory are both scale and translation invariant. It makes no sense to ask what the utility of going extinct “would be” in isolation from the utilities of every other outcome. All that matters are ratios of differences of utilities, since those are all that are relevant to finding the argmax of the linear combination of utilities.
        I’m not sure what you mean by “I believe that e has to be 0”, since e is a set of observations, not a number. Maybe you meant P(e) = 0? But this makes no sense either since then conditional probabilities are undefined.
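        A small sketch of the invariance claim in the first paragraph above, using made-up numbers: replacing every utility $U$ with $aU + b$ for some $a > 0$ leaves the argmax unchanged. The outcome names, probabilities, and the helpers `best_action` and `rescale` are hypothetical.

```python
def expected_utility(outcome_probs, utility):
    """Probability-weighted sum of utilities over outcomes."""
    return sum(p * utility[s] for s, p in outcome_probs.items())


def best_action(actions, utility):
    """Argmax of expected utility over the available actions."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))


def rescale(utility, a, b):
    """Positive affine transform U' = a * U + b (requires a > 0)."""
    return {s: a * u + b for s, u in utility.items()}


# Made-up utilities and outcome probabilities for two hypothetical actions.
utility = {"extinct": 0.0, "status_quo": 1.0, "utopia": 10.0}
actions = {
    "act_1": {"extinct": 0.10, "status_quo": 0.60, "utopia": 0.30},
    "act_2": {"extinct": 0.05, "status_quo": 0.90, "utopia": 0.05},
}

# After rescaling, "extinct" has utility -500 instead of 0,
# yet the preferred action is the same.
shifted = rescale(utility, a=3.0, b=-500.0)

print(best_action(actions, utility), best_action(actions, shifted))
```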
        - Hickey 22 Jan 2022 11:27 UTC
          1 point
          Parent
          I meant P(e) = 0, and the point was to show that this does not make sense. But I think Donald has shown me exactly where I went wrong. You cannot have a utility function and then not place it in a context within which you have other feasible actions. See my response to Hobson.