Something that would be of substantial epistemic help to me is if you (Eliezer) would be willing to estimate a few conditional probabilities (coarsely, I’m not asking you to superforecast) about the contributors to P(doom). Specifically:
timelines (when will we get AGI)
alignment research (will we have a scheme that seems ~90% likely to work for ~slightly above human level AGI), and
governance (will we be able to get everyone to use that or an equivalently promising alignment scheme).
For example, it seems plausible that a large fraction of your P(doom) is derived from your belief that P(<10 year timelines) is large, and that both P(sufficient time for any alignment scheme | <10 year timelines) and P(sufficient time for consensus-requiring governance schemes to be viable | <10 year timelines) are small. OR it could be that even given 15-20 year timelines, your probability of a decent alignment scheme emerging is ~equally small, and that fact dominates all your prognoses. It’s probably some mix of both, but the ratios are important.
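To make the decomposition concrete, here is one purely illustrative way the requested conditionals could combine; this is my sketch (with timeline buckets of my own choosing), not a claim about how you actually compute P(doom):

```latex
P(\mathrm{doom}) \;\approx\; \sum_{t}\; P(\mathrm{AGI\ in\ window\ } t)\,
  \Big[\, 1 \;-\; P(\mathrm{alignment\ solution} \mid t)\;
           P(\mathrm{governance\ solution} \mid t,\ \mathrm{alignment\ solution}) \,\Big]
```

Under a decomposition like this, the question above is exactly which factor carries the pessimism: the timeline weights P(t), or the conditional success terms.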
Why would others care? Well, from an epistemic “should I defer to someone who’s thought about it more than me” perspective, I consider you a much greater authority on the hardness of alignment given time, i.e. your knowledge of the probabilities f(hope-inducing technical solution | x years until AGI, at least y serious researchers working for z fraction of those years) for different values of x, y, and z. On the other hand, I might consider you less of a world-expert in AI timelines, or assessing the viability of governance interventions (e.g. mass popularization campaigns). I’m not saying that a rando would have better estimates, but a domain expert could plausibly not need to heavily update off your private beliefs even after evaluating your public arguments.
So, to be specific about the probabilities that would be helpful:
P(alignment ~solution | <10 years to AGI)
P(alignment ~solution | 15-20 years to AGI)
(You can interpolate or expand these ranges if you have time)
P(alignment ~solution | 15-20 years to AGI, 100x size of alignment research field within 5 years)
A few other probabilities could also be useful for sanity checks to illustrate how your model cashes out to <1%, though I know you’ve preferred to avoid some of these in the past:
P(governance solution | 15-20 years to AGI)
P(<10 years to AGI)
P(15-20 years)
Background for why I care: I can think of/work on many governance schemes that have good odds of success given 20 years but not 10 (where success means buying us another ~10 years), and separately can think of/work on governance-ish interventions that could substantially inflate the # of good alignment researchers within ~5 years (e.g. from 100 → 5,000), but this might only be useful given >5 additional years after that, so that those people actually have time to do work. (Do me the courtesy of suspending disbelief in our ability to accomplish those objectives.)
I have to assume you’ve thought of these schemes, and so I can’t tell whether you think they won’t work because you’re confident in short timelines or because of your inside view that “alignment is hard and 5,000 people working for ~15 years are still <10% likely to make meaningful progress and buy themselves more time to do more work”.
I don’t know Eliezer’s views here, but the latter sounds more Eliezer-ish to my ears. My Eliezer-model is more confident that alignment is hard (and that people aren’t currently taking the problem very seriously) than he is confident about his ability to time AGI.
I don’t know the answer to your questions, but I can cite a thing Eliezer wrote in his dialogue on biology-inspired AGI timelines:
I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain’s native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them. What feelings I do have, I worry may be unwise to voice; AGI timelines, in my own experience, are not great for one’s mental health, and I worry that other people seem to have weaker immune systems than even my own. But I suppose I cannot but acknowledge that my outward behavior seems to reveal a distribution whose median seems to fall well before 2050.
Also, you wrote:
governance (will we be able to get everyone to use that or an equivalently promising alignment scheme).
This makes it sound like you’re imagining an end-state of ‘AGI proliferates freely, but we find some way to convince everyone to be cautious and employ best practices’. If so, that’s likely to be an important crux of disagreement between you and Eliezer; my Eliezer-model says that ‘AGI proliferates freely’ means death, and the first goal of alignment is to execute some pivotal act that safely prevents everyone and their grandmother from being able to build AGI. (Compare ‘everyone and their grandmother has a backyard nuclear arsenal’.)
Hey Rob, thanks for your reply. If it makes you guys feel better, you can rationalize the following as my expression of the Bargaining stage of grief.
I don’t know Eliezer’s views here, but the latter sounds more Eliezer-ish to my ears. My Eliezer-model is more confident that alignment is hard (and that people aren’t currently taking the problem very seriously) than he is confident about timing AGI.
Consider me completely convinced that alignment is hard, and that a lot of people aren’t taking it seriously enough, or are working on the wrong parts of the problem. That is a very different claim from saying that it’s unlikely to be solved even if we get 100× as many people working on it (albeit for a shorter time), especially if you believe that geniuses are outliers, and thus that the returns on sampling for more geniuses remain large even after drawing many samples (particularly if we’ve currently sampled <500 over the lifetime of the field). To get down to <1% probability of success, you need a fundamentally different argument structure. Here are some examples.
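(A brief aside before those examples: the “returns on sampling for more geniuses” point can be illustrated with a minimal simulation. Everything in the sketch below, the lognormal ability distribution, its parameters, and the field sizes, is an assumption of mine made for illustration, not something claimed in this thread.)

```python
# Illustrative sketch only: how the expected best researcher in a field changes
# as the field grows, under an assumed heavy-tailed (lognormal) ability
# distribution. The distribution and its parameters are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)

def expected_best(n_researchers: int, trials: int = 200) -> float:
    """Average, over `trials` simulated fields, of the single best ability draw."""
    draws = rng.lognormal(mean=0.0, sigma=1.0, size=(trials, n_researchers))
    return float(draws.max(axis=1).mean())

for n in (500, 5_000, 50_000):
    print(f"field size {n:>6}: expected best ability ~ {expected_best(n):.1f}")

# If ability really is heavy-tailed, the expected best draw keeps climbing
# noticeably as samples go from hundreds to tens of thousands, i.e. the returns
# to recruiting more people are not exhausted at <500 researchers.
```

A hardness argument of the kind sketched in the examples below is exactly what could defeat this picture, which is why I would like to see it spelled out.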
“We have evidence that alignment will absolutely necessitate a lot of serial research. This means that even if lots more people join, the problem by its nature cannot be substantially accelerated by dramatically increasing the number of researchers (and consequently with high probability increasing the average quality of the top 20 researchers).”
I would love to see the structure of such an argument.
“We have a scheme for comprehensively dividing up all plausible alignment approaches. For each class of approach, we have hardness proofs, or things that practically serve as hardness proofs such that we do not believe 100 smart people thinking about it for a decade are at all likely to make more progress than we have in the previous decade.”
Needless to say, if you had such a taxonomy (even heuristically) it would be hugely valuable to the field—if for no other reason than that it would serve as an excellent communication mechanism to skeptics about the flaws in their approaches.
This would also be massively important from a social-coordination perspective. Consider how much social progress ELK made in building consensus around the hardness of the ontology-mismatch problem. What if we did that for every one of your hardness pseudo-results, made the prize $10M for each hardness result instead of $50k, and broadcast it to the top 200 CS departments worldwide? It’d dramatically increase the salience of alignment as a real problem that no one seems able to solve, since if someone could already solve it, they’d have made $10M.
“We are the smartest the world has to offer; even if >50% of theoretical computer scientists and >30% of physicists and >30% of pure mathematicians at the top 100 American universities were to start working on these problems 5 years from now, they would be unlikely to find much we haven’t found.”
I’m not going to tell you this is impossible, but I haven’t seen the argument made yet. From an outside view, the things that got Eliezer Yudkowsky to where he is are (1) being dramatic-outlier-good at generalist reasoning, and (2) being an exceptional communicator to a certain social category (nerds). Founding the field is not, by itself, a good indicator of being dramatic-outlier-exceptional at inventing weird higher-level math. Obviously MIRI is still pretty good at it! But the best? Of all those people out there?
It would be really, really helpful to have a breakdown of why MIRI is so pessimistic, beyond just “we don’t have any good ideas about how to build an off-switch; we don’t know how to solve ontology-mismatch; we don’t know how to prevent inner misalignment; also, even if you solve them, you’re probably wrong in some other way, based on our priors about how often rockets explode”. I agree those are big, real, unsolved problems. But, like, I myself have come up with previously-unmentioned ideas near the research frontier on inner misalignment, and it wasn’t that hard! That experience did not leave me confident that further thinking by newbs can’t make any headway on these problems. Also, “alignment is like building a rocket except we only get one shot” was just as true a decade ago; why were you more optimistic back then? Is it all just the hardness of the off-switch problem specifically?
This makes it sound like you’re imagining an end-state of ‘AGI proliferates freely, but we find some way to convince everyone to be cautious and employ best practices’. If so, that’s likely to be an important crux of disagreement between you and Eliezer; my Eliezer-model says that ‘AGI proliferates freely’ means death, and the first goal of alignment is to execute some pivotal act that safely prevents everyone and their grandmother from being able to build AGI. (Compare ‘everyone and their grandmother has a backyard nuclear arsenal’.)
I agree that proliferation would spell doom, but the supposition that the only possible way to prevent proliferation is building an ASI and YOLOing to take over the world is, to my mind, a pretty major narrowing of the options available. Arguably the best option is compute governance: if you don’t have extremely short (<10 year) timelines, it seems plausible (>20%) that it will take many, many chips to train an AGI, let alone an ASI. In any conceivable world, those chips are coming from a country under either the American or Chinese nuclear umbrella. (This is because fabs are comically expensive and complex, and EUV lithography machines are expensive and currently a one-firm monopoly, though a massively funded Chinese competitor could conceivably arise someday. To appreciate just how strong this chokepoint is: the US military itself is completely incapable of building its own fabs, if the Trusted Foundry Program is any indication.) If China and NATO were worried that randos training large models had a 10% chance of ending the world, they would tell them to quit it. The fears about “Facebook AI Research will just build an AGI” sound much less plausible if you have 15-20-year timelines, because if the US government tells Facebook they can’t do that, Facebook stops. Any nuclear-armed country outside China/NATO can’t be controlled this way, but then it just won’t get any chips. “Promise you won’t build AGI, get chips, betray the US/China by building AGI anyway, and hope to get to ASI fast enough to take over the world” is hypothetically possible, but the Americans and Chinese would know that and could condition the sale of chips on as many safeguards as we could think of. (Or just not sell the chips, and make India SSH into US-based datacenters.)
Addressing possible responses:
It’s impossible to know where compute goes once it leaves the fabs.
Impossible? Or just, like, complicated and would require work? I will grant that it’s impossible to know where consumer compute (like iPhones) ends up, but datacenter-scale compute seems much more likely to be trackable. Remember that in this world, the Chinese government is selling you chips and actually doesn’t want you building AGI with them. If you immediately throw your hands up and say you are confident there is no logistical way to do that, I think you are miscalibrated.
Botnets (a la Gwern):
You will note that in the Gwern story, the AGI had to build its own botnet; the initial compute needed to “ascend and break loose” was explicitly sanctioned by the government, despite a history of accidents. What if those two governments could be convinced about the difficulty of AI alignment, and actually didn’t let anyone run random code connected to the internet?
What if the AGI is trained on an existing botnet, a la Folding@Home, or some darknet equivalent run by a terrorist group or nation state? It’s possible; we should be thinking of monitoring techniques. But the capacity of botnets to marshal enormous amounts of compute undetectably is not unlimited, and with real political will, I don’t see why making covert training runs hard would be intractable.
We don’t trust the US/Chinese governments to be able to correctly assess alignment approaches, when the time comes. The incentives are too extreme in favor of deployment.
This is a reasonable concern. But the worst version of this, where the governments just okay something dumb despite clear counterarguments, is only possible if you believe there will still be no consensus that a catastrophic alignment failure is even a minor possibility. No American or Chinese leader has, in their lifetime, needed to make a direct decision that had even a 10% chance of killing ten million Americans. (The COVID vaccine buildout is a decent counterexample, but sins of omission and commission are different to most people.)
Influencing the government is impossible.
We’re really only talking about convincing 2 bureaucracies; we might fail, but “it’s impossible” is an unfounded assumption. The climate people did it, and that problem has way more powerful entrenched opponents. (They didn’t get everything they want yet, but they’re asking for a lot more than we would be, and it’s hard to argue the people in power don’t think climate science is real.)
As of today in the US, “don’t build AGI until you’re sure it won’t turn on you and kill everyone” has essentially no political opponents other than optimistic techno-futurists, and lots of supporters, for obvious and less obvious (labor displacement) reasons. I struggle to see where the opposition would come from in 10 years either, especially considering that this would be regulation of a product that doesn’t exist yet and thus has no direct beneficiaries.
While Chinese domestic sentiments may be less anti-AI, the CCP doesn’t actually make decisions based on what its people think. It is an engineering-heavy elite dictatorship; if you convince enough within-China AI experts, there is plenty of reason to believe you could convince the CCP.
This isn’t a stable equilibrium; something would go wrong and someone would push the button eventually.
That’s probably true! If I had to guess, I think it could probably last for a decade, and probably not for two. That’s why it matters a lot whether the alignment problem is “too hard to make progress on in 2 decades with massive investment” or just “really hard and we’re not on track to solve it.”
You may also note that the only data point we have about “Will a politician push a button that with some probability ends the world, and the rest of the time makes their country a hegemon?” is the Cuban Missile Crisis. There was no mature second-strike capability then; if Kennedy had pushed the button, he wasn’t sure the other side could have retaliated. Do I want to replay the 1950s-60s nuclear standoff? No thank you. Would I take it over racing to build an unaligned superintelligence first and then YOLOing? Yes please.
You will note that every point I’ve made here has a major preceding causal variable: enough people taking the hardness of the alignment problem seriously that we can do big-kid interventions. I really empathize with the people who feel burnt out about this. You have literally been doing your best to save the world, and nobody cares, so it feels intuitively likely that nobody will care. But there are several reasons I think this is pessimism, rather than calibrated realism:
The actual number of people you need to convince is fairly small. Why? Because this is a technology-specific question and the only people who will make the relevant decisions are technical experts or the politicians/leaders/bureaucrats they work with, who in practice will defer to them when it comes to something like “the hardness of alignment”.
The fear that “politicians will select those experts who recite convenient facts” is legitimate. However, this isn’t at all inevitable; arguably the reason this both-sidesing happened so much within climate science is that the opponents’ visibility was heavily funded by entrenched interests, which, again, don’t really exist for AGI.
Given that an overwhelming majority of people dismiss the alignment problem primarily on the basis that their timelines are really long, every capability breakthrough makes shorter timelines seem more likely (and also lowers the social cost of endorsing shorter timelines, as everyone else updates on the same information). You can already see this to some extent with GPT-3 converting people; I, for one, had very long timelines before then. So strategies that didn’t work 10 years ago are meaningfully more likely to work now, and that will become even more true.
Social influence is not a hard technical problem! It is hard, but there are entire industries of professionals who are literally paid to convince people of stuff. And AI alignment is not funding-constrained, so the main input this would need, money, is one we already have!
On the topic of turning money into social influence: people really fail to appreciate how much money is out there for AI alignment, especially if you could convince technical AI researchers. Guess who really doesn’t like an AI apocalypse? Every billionaire with a family office who doesn’t like giving to philanthropy! Misaligned AI is one of the only things that could meaningfully hurt the expected value of billionaires’ children; if scientists start telling billionaires this is real, it is very likely you can unlock orders of magnitude more money than the ~$5B that FTX + OpenPhil seem on track to spend.

On that note, money can be turned into social influence in lots of ways. Give the world’s thousand most respected AI researchers $1M each to spend 3 months working on AI alignment, with an extra $100M if by the end they can propose a solution alignment researchers can’t shoot down. I promise you that, other than maybe 20 industry researchers who are paid silly amounts, every one of them would take the million. They probably won’t make any progress, but from then on, when others ask them whether they think alignment is a real unsolved problem, they will be way more likely to say yes. That only costs you a billion dollars! I literally think I could get someone reading this the money to do this (at least at an initially moderate scale); all it needs is a competent person to step up.
The other point all of my arguments depend on is that we have, say, at least until 2035. If not, a lot of these ideas become much less likely to work, and I start thinking much more along the lines of “maybe it really will just be DeepMind YOLOing ASI” and the attendant strategies. So again, if Eliezer has private information that makes him really confident, relative to everyone else, that >50% of the probability mass is on AGI before 2030, it sure would be great to know how seriously to take that, and whether he thinks a calibrated actor would abandon the other strategies and focus on Hail Marys.
This is the best counter-response I’ve read on the thread so far, and I’m really interested in what the responses will be. Commenting here so I can easily get back to this comment in the future.
FWIW if you click the three vertical dots at the top right of a comment, it opens a dropdown where you can “Subscribe to comment replies”.
This is an interesting name for the cognition I’ve seen a lot of people do.
This is a great comment, maybe make it into a top-level post?
I’m not sure how to do that?
Also, unfortunately, since posting this comment, the last week’s worth of evidence does make me think that 5-15 year timelines are the most plausible, and so I am much more focused on those.
Specifically, I think it’s time to pull the fire alarm and do mass within-elite advocacy.
Cut and paste? But yes, it’s panic or death. And probably death anyway. Nice to get a bit of panic in first if we can though! Good luck with stirring it.
I found your earlier comment in this thread insightful and I think it would be really valuable to know what evidence convinced you of these timelines. If you don’t have time to summarize in a post, is there anything you could link to?
Yep, just posted it: https://www.lesswrong.com/posts/wrkEnGrTTrM2mnmGa/it-s-time-for-ea-leadership-to-pull-the-fast-takeoff-fire
Note also that I would still endorse these actions (since they’re still necessary even with shorter timelines) but they need to be done much faster and so we need to be much more aggressive.