Rob Bensinger comments on AGI Ruin: A List of Lethalities

Rob Bensinger 6 Jun 2022 6:05 UTC
LW: 37 AF: 4
15
AF
Yes, please do rewrite the post, or make your own version of a post like this!! :) I don’t suggest trying to persuade arbitrary policymakers of AGI risk, but I’d be very keen on posts like this optimized to be clear and informative to different audiences. Especially groups like ‘lucid ML researchers who might go into alignment research’, ‘lucid mathematicians, physicists, etc. who might go into alignment research’, etc.
- Thane Ruthenis 6 Jun 2022 10:31 UTC
  LW: 38 AF: 4
  8
  AF Parent
  Suggestion: make it a CYOA-style interactive piece, where the reader is tasked with aligning AI, and could choose from a variety of approaches which branch out into sub-approaches and so on. All of the paths, of course, bottom out in everyone dying, with detailed explanations of why. This project might then evolve based on feedback, adding new branches that counter counter-arguments made by people who played it and weren’t convinced. Might also make several “modes”, targeted at ML specialists, general public, etc., where the text makes different tradeoffs regarding technicality vs. vividness.
  I’d do it myself (I’d had the idea of doing it before this post came out, and my preliminary notes covered much of the same ground, I feel the need to smugly say), but I’m not at all convinced that this is going to be particularly useful. Attempts to defeat the opposition by building up a massive evolving database of counter-arguments have been made in other fields, and so far as I know, they never convinced anybody.
  The interactive factor would be novel (as far as I know), but I’m still skeptical.
  (A… different implementation might be to use a fine-tuned language model for this; make it an AI Dungeon kind of setup, where it provides specialized counter-arguments for any suggestion. But I expect it to be less effective than a more coarse hand-written CYOA, since the readers/players would know that the thing they’re talking to has no idea what it’s talking about, so would disregard its words.)
  - Eliezer Yudkowsky 6 Jun 2022 19:37 UTC
    LW: 14 AF: 9
    4
    AF Parent
    Arbital was meant to support galaxy-brained attempts like this; Arbital failed.
    - Thane Ruthenis 6 Jun 2022 20:35 UTC
      6 points
      7
      Parent
      Failed as a platform for hosting galaxy-brained attempts, or failed as in every similar galaxy-brained attempt on it failed? I haven’t spent a lot of time there, but my impression is that Arbital is mostly a wiki-style collection of linked articles, not a dumping ground of standalone esoterically-structured argumentative pieces. And while a wiki is conceptually similar, presentation matters a lot. A focused easily-traversable tree of short-form arguments in a wrapper that encourages putting yourself in the shoes of someone trying to fix the problem may prove more compelling.
      (Not to make it sound like I’m particularly attached to the idea after all. But there’s a difference between “brilliant idea that probably won’t work” and “brilliant idea that empirically failed”.)
      - Rob Bensinger 6 Jun 2022 21:36 UTC
        11 points
        6
        Parent
        Arbital was a very conjunctive project, trying to do many different things, with a specific team, at a specific place and time. I wouldn’t write off all Arbital-like projects based on that one data point, though I’d update a lot more if there were lots of other Arbital-ish things that also failed.
        - ESRogs 10 Jun 2022 5:13 UTC
          5 points
          0
          Parent
          As a person who worked on Arbital, I agree with this.
  - CronoDAS 7 Jun 2022 2:55 UTC
    4 points
    0
    Parent
    
    All of the paths, of course, bottom out in everyone dying, with detailed explanations of why.
    
    A strange game. The only winning move is not to play. ;)
    - Thane Ruthenis 7 Jun 2022 4:26 UTC
      4 points
      0
      Parent
      I guess we should also kidnap people and force them to play it, and if they don’t succeed we kill them? For realism? Wait, there’s something wrong with this plan.
      More seriously, yeah, if you’re implementing it more like a game and less like an interactive article, it’d need to contain some promise of winning. Haven’t considered how to do it without compromising the core message.
      - AdamB 15 Jun 2022 13:23 UTC
        5 points
        0
        Parent
        What if “winning” consists of finding a new path not already explored-and-foreclosed? For example, each time you are faced with a list of choices of what to do, there’s a final choice “I have an idea not listed here” where you get to submit a plan of action. This goes into a moderation engine where a chain of people get to shoot down the idea or approve it to pass up the chain. If the idea gets convincingly shot down (but still deemed interesting), it gets added to the story as a new branch. If it gets to the top of the moderation chain and makes EY go “Hm, that might work” then you win the game.
        - Thane Ruthenis 15 Jun 2022 23:00 UTC
          4 points
          1
          Parent
          Mmm. If the CYOA idea is implemented as a quirky-but-primarily-educational article, then sure, integrating the “adapt to feedback” capability like this would be worthwhile. Might also attach a monetary prize to submitting valuable ideas, by analogy to the ELK contest.
          For a game-like implementation, where you’d be playing it partly for the fun/challenge of it, that wouldn’t suffice. The feedback loop’s too slow, and there’d be an ugh-field around the expectation that submitting a proposal would then require arguing with the moderators about it, defending it. It wouldn’t feel like a game.
          It’d make the upkeep cost pretty high, too, without a corresponding increase in the pay-off.
          Just making it open-ended might work, even without the moderation engine? Track how many branches the player has explored; once they’ve explored a lot (i.e., are expected to “get” the full scope of the problem), an option appears for something like “I really don’t know what to do, but we should keep trying”, leading to some appropriately subtle and well-integrated call to support alignment research?
          Not excited about this approach either.
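A minimal sketch of the open-ended variant described above, in Python. Everything in it is hypothetical: the `Node` and `Playthrough` names, the `unlock_after` threshold, and the choice to count distinct visited nodes as "branches explored" are illustrative assumptions, not anything specified in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Label for the extra option; wording taken from the comment above.
KEEP_TRYING = "I really don't know what to do, but we should keep trying"


@dataclass
class Node:
    """One passage of the hypothetical CYOA, plus its labelled choices."""
    text: str
    choices: dict[str, Node] = field(default_factory=dict)


class Playthrough:
    """Tracks which nodes a player has seen across restarts."""

    def __init__(self, root: Node, unlock_after: int) -> None:
        self.root = root
        self.current = root
        # Assumed threshold: how many distinct nodes count as "a lot".
        self.unlock_after = unlock_after
        self.visited: set[int] = {id(root)}

    def options(self) -> list[str]:
        """Choices at the current node, plus the extra one once unlocked."""
        opts = list(self.current.choices)
        if len(self.visited) >= self.unlock_after:
            opts.append(KEEP_TRYING)
        return opts

    def choose(self, label: str) -> Node:
        """Follow a choice and record the node it leads to."""
        self.current = self.current.choices[label]
        self.visited.add(id(self.current))
        return self.current

    def restart(self) -> None:
        """Go back to the start; exploration history is kept."""
        self.current = self.root
```

Keeping `visited` across restarts is the design choice that matters here: the extra option depends on how much of the tree the player has seen in total, not on any single run.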
- Celenduin 7 Jun 2022 15:49 UTC
  3 points
  0
  AF Parent
  I wonder if we could be much more effective in outreach to these groups?
  
  Like making sure that Robert Miles is sufficiently funded to have a professional team +20% (if that is not already the case). Maybe reaching out to Sabine Hossenfelder and sponsoring a video, or maybe collaborating with her on a video about this. Though I guess given her attitude towards the physics community, working with her might be a gamble and a two-edged sword. Can we get market research on which influencers have a high number of followers among ML researchers/physicists/mathematicians and then work with them / sponsor them?
  
  Or maybe micro-target this demographic with facebook/google/github/stackexchange ads and point them to something?
  
  I don’t know, I’m not a marketing person, but I feel like I would have seen much more of these things if we were doing enough of them.
  
  Not saying that this should be MIRI’s job, rather stating that I’m confused because I feel like we as a community are not taking an action that would seem obvious to me. Especially given how recent advances in published AI capabilities seem to make the problem much more legible. Is the reason for not doing it really just that we’re all a bunch of nerds who are bad at this kind of thing, or is there more to it that I’m missing?
  
  While I see that there is a lot of risk associated with such outreach increasing the amount of noise, I wonder if that tradeoff might be shifting the shorter the timelines are getting and given that we don’t seem to have better plans than “having a diverse set of smart people come up with novel ideas of their own in the hope that one of those works out”. So taking steps to entice a somewhat more diverse group of people into the conversation might be worth it?
  - Vaniver 7 Jun 2022 16:36 UTC
    LW: 18 AF: 9
    13
    AF Parent
    Not saying that this should be MIRI’s job, rather stating that I’m confused because I feel like we as a community are not taking an action that would seem obvious to me.
    I wrote about this a bit before, but in the current world my impression is that actually we’re pretty capacity-limited, and so the threshold is not “would be good to do” but “is better than my current top undone item”. If you see something that seems good to do that doesn’t have much in the way of unilateralist risk, you doing it is probably the right call. [How else is the field going to get more capacity?]
    - Rob Bensinger 8 Jun 2022 2:40 UTC
      LW: 4 AF: 2
      2
      AF Parent
      +1
    - Celenduin 8 Jun 2022 16:23 UTC
      1 point
      0
      Parent
      🤔
      
      Not sure if I’m the right person, but it seems worth thinking about how one would maybe approach this if one were to do it.
      
      So the idea is to have an AI-Alignment PR/Social Media org/group/NGO/think tank/company that has the goal to contribute to a world with a more diverse set of high-quality ideas about how to safely align powerful AI. The only other organization roughly in this space that I can think of would be 80,000 Hours, which is also somewhat more general in its goals and more conservative in its strategies.
      
      I’m not a sales/marketing person, but as I understand it, the usual metaphor to use here is a funnel?
      
      Starting with maybe ads / sponsoring trying to reach the right people[0] (e.g. I saw Jane Street sponsor Matt Parker)
      then more and more narrowing down first with introducing people to why this is an issue (orthogonality, instrumental convergence)
      hopefully having them realize for themselves, guided by arguments, that this is an issue that genuinely needs solving and maybe their skills would be useful
      increasing the math as needed
      finally, somehow selecting for self-reliance and providing a path for how to get started with thinking about this problem by themselves / model building / independent research
      or otherwise improving the overall situation (convince your congress member of something? run for congress? …)
      
      Probably that would include copywriting (or hiring or contracting copywriters) to go over a number of our documents to make them more digestible and actionable.
      
      So, I’m probably not the right person to get this off the ground, because I don’t have a clue about any of this (not even entrepreneurship in general), but it does seem like a thing worth doing and maybe like an initiative that would get funding from whoever funds such things these days?
      
      [0] Though maybe we should also try to better understand who “the right people” are? Given that our current bunch of ML researchers/physicists/mathematicians have not been able to solve it, maybe it is time to consider casting a wider net in some responsible way.
      - Celenduin 8 Jun 2022 16:28 UTC
        6 points
        4
        Parent
        On second thought: Don’t we have orgs that work on AI governance/policy? I would expect them to be more likely to have the skills/expertise to pull this off, right?
        - Vaniver 8 Jun 2022 18:32 UTC
          14 points
          5
          Parent
          So, here’s a thing that I don’t think exists yet (or, at least, it doesn’t exist enough that I know about it to link it to you). Who’s out there, what ‘areas of responsibility’ do they think they have, what ‘areas of responsibility’ do they not want to have, what are the holes in the overall space? It probably is the case that there are lots of orgs that work on AI governance/policy, and each of them probably is trying to consider a narrow corner of space, instead of trying to hold ‘all of it’.
          So if someone says “I have an idea how we should regulate medical AI stuff—oh, CSET already exists, I should leave it to them”, CSET’s response will probably be “what? We focus solely on national security implications of AI stuff, medical regulation is not on our radar, let alone a place we don’t want competition.”
          I should maybe note here there’s a common thing I see in EA spaces that only sometimes makes sense, and so I want to point at it so that people can deliberately decide whether or not to do it. In selfish, profit-driven worlds, competition is the obvious thing to do; when someone else has discovered that you can make profits by selling lemonade, you should maybe also try to sell lemonade to get some of those profits, instead of saying “ah, they have lemonade handled.” In altruistic, overall-success-driven worlds, competition is the obvious thing to avoid; there are so many undone tasks that you should try to find a task that no one is working on, and then work on that.
          One downside is this means the eventual allocation of institutions / people to roles is hugely driven by inertia and ‘who showed up when that was the top item in the queue’ instead of ‘who is the best fit now’. [This can be sensible if everyone ‘came in as a generalist’ and had to skill up from scratch, but still seems sort of questionable; even if people are generalists when it comes to skills, they’re probably not generalists when it comes to personality.]
          Another downside is that probably it makes more sense to have a second firm attempting to solve the biggest problem before you get a first firm attempting to solve the twelfth biggest problem. Having a sense of the various values of the different approaches—and how much they depend on each other, or on things that don’t exist yet—might be useful.
      - Vaniver 8 Jun 2022 18:04 UTC
        3 points
        1
        Parent
        Not sure if I’m the right person
        ...yet!