Thane Ruthenis comments on AGI Ruin: A List of Lethalities

Thane Ruthenis 6 Jun 2022 10:31 UTC
LW: 38 AF: 4
8
AF
Suggestion: make it a CYOA-style interactive piece, where the reader is tasked with aligning AI, and could choose from a variety of approaches which branch out into sub-approaches and so on. All of the paths, of course, bottom out in everyone dying, with detailed explanations of why. This project might then evolve based on feedback, adding new branches that counter counter-arguments made by people who played it and weren’t convinced. Might also make several “modes”, targeted at ML specialists, general public, etc., where the text makes different tradeoffs regarding technicality vs. vividness.
I’d do it myself (I’d had the idea of doing it before this post came out, and my preliminary notes covered much of the same ground, I feel the need to smugly say), but I’m not at all convinced that this is going to be particularly useful. Attempts to defeat the opposition by building up a massive evolving database of counter-arguments have been made in other fields, and so far as I know, they never convinced anybody.
The interactive factor would be novel (as far as I know), but I’m still skeptical.
(A… different implementation might be to use a fine-tuned language model for this; make it an AI Dungeon kind of setup, where it provides specialized counter-arguments for any suggestion. But I expect it to be less effective than a more coarse hand-written CYOA, since the readers/players would know that the thing they’re talking to has no idea what it’s talking about, so would disregard its words.)
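For concreteness, the branching structure described above could be held in a small tree, where every leaf carries the explanation of why that path fails. This is only a rough sketch: `Node`, `walk`, `all_leaves`, and the placeholder story text are hypothetical names and content made up for illustration, not part of any existing project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One page of the piece: some text plus the choices offered to the reader."""
    text: str
    # Choice label -> the node that choice leads to. Empty for a leaf.
    choices: Dict[str, "Node"] = field(default_factory=dict)
    # Filled in on leaves only: the write-up of why this path fails.
    failure_reason: Optional[str] = None

    def is_leaf(self) -> bool:
        return not self.choices


def walk(root: Node, picks: List[str]) -> Node:
    """Follow a sequence of choice labels from the root; return the node reached."""
    node = root
    for label in picks:
        node = node.choices[label]
    return node


def all_leaves(node: Node) -> List[Node]:
    """Collect every leaf reachable from `node`, depth first."""
    if node.is_leaf():
        return [node]
    leaves: List[Node] = []
    for child in node.choices.values():
        leaves.extend(all_leaves(child))
    return leaves


# Placeholder content only (an assumption): real branches would hold actual arguments.
story = Node(
    text="You are tasked with aligning an AI. Pick an approach.",
    choices={
        "approach A": Node(
            text="Placeholder: approach A. Pick a sub-approach.",
            choices={
                "sub-approach A1": Node(
                    "Placeholder ending.",
                    failure_reason="Placeholder explanation for A1.",
                ),
                "sub-approach A2": Node(
                    "Placeholder ending.",
                    failure_reason="Placeholder explanation for A2.",
                ),
            },
        ),
        "approach B": Node(
            "Placeholder ending.",
            failure_reason="Placeholder explanation for B.",
        ),
    },
)
```

Adding a branch in response to feedback would then amount to adding one more entry to some node's `choices`, and a check like `all(leaf.failure_reason for leaf in all_leaves(story))` would confirm that no path ends without an explanation.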
- Eliezer Yudkowsky 6 Jun 2022 19:37 UTC
  LW: 14 AF: 9
  4
  AF Parent
  Arbital was meant to support galaxy-brained attempts like this; Arbital failed.
  What links here?
  - trevor's comment on Some reflections on the LW community after several months of active engagement by M. Y. Zuo (25 Jun 2022 17:30 UTC; 5 points)
  - Thane Ruthenis 6 Jun 2022 20:35 UTC
    6 points
    7
    Parent
    Failed as a platform for hosting galaxy-brained attempts, or failed as in every similar galaxy-brained attempt on it failed? I haven’t spent a lot of time there, but my impression is that Arbital is mostly a wiki-style collection of linked articles, not a dumping ground of standalone, esoterically structured argumentative pieces. And while a wiki is conceptually similar, presentation matters a lot. A focused, easily traversable tree of short-form arguments, in a wrapper that encourages putting yourself in the shoes of someone trying to fix the problem, may prove more compelling.
    (Not to make it sound like I’m particularly attached to the idea after all. But there’s a difference between “brilliant idea that probably won’t work” and “brilliant idea that empirically failed”.)
    - Rob Bensinger 6 Jun 2022 21:36 UTC
      11 points
      6
      Parent
      Arbital was a very conjunctive project, trying to do many different things, with a specific team, at a specific place and time. I wouldn’t write off all Arbital-like projects based on that one data point, though I update a lot more if there are lots of other Arbital-ish things that also failed.
      - ESRogs 10 Jun 2022 5:13 UTC
        5 points
        0
        Parent
        As a person who worked on Arbital, I agree with this.
- CronoDAS 7 Jun 2022 2:55 UTC
  4 points
  0
  Parent
  
  > All of the paths, of course, bottom out in everyone dying, with detailed explanations of why.
  
  A strange game. The only winning move is not to play. ;)
  - Thane Ruthenis 7 Jun 2022 4:26 UTC
    4 points
    0
    Parent
    I guess we should also kidnap people and force them to play it, and if they don’t succeed we kill them? For realism? Wait, there’s something wrong with this plan.
    More seriously, yeah, if you’re implementing it more like a game and less like an interactive article, it’d need to contain some promise of winning. Haven’t considered how to do it without compromising the core message.
    - AdamB 15 Jun 2022 13:23 UTC
      5 points
      0
      Parent
      What if “winning” consists of finding a new path not already explored-and-foreclosed? For example, each time you are faced with a list of choices of what to do, there’s a final choice “I have an idea not listed here” where you get to submit a plan of action. This goes into a moderation engine where a chain of people get to shoot down the idea or approve it to pass up the chain. If the idea gets convincingly shot down (but still deemed interesting), it gets added to the story as a new branch. If it gets to the top of the moderation chain and makes EY go “Hm, that might work” then you win the game.
      - Thane Ruthenis 15 Jun 2022 23:00 UTC
        4 points
        1
        Parent
        Mmm. If the CYOA idea is implemented as a quirky-but-primarily-educational article, then sure, integrating the “adapt to feedback” capability like this would be worthwhile. Might also attach a monetary prize to submitting valuable ideas, by analogy to the ELK contest.
        For a game-like implementation, where you’d be playing it partly for the fun/challenge of it, that wouldn’t suffice. The feedback loop’s too slow, and there’d be an ugh-field around the expectation that submitting a proposal would then require arguing with the moderators about it, defending it. It wouldn’t feel like a game.
        It’d make the upkeep cost pretty high, too, without a corresponding increase in the pay-off.
        Just making it open-ended might work, even without the moderation engine? Track how many branches the player has explored; once they’ve explored a lot (i.e., are expected to “get” the full scope of the problem), an option appears for something like “I really don’t know what to do, but we should keep trying”, leading to some appropriately subtle and well-integrated call to support alignment research?
        Not excited about this approach either.
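
The open-ended variant from the last comment (track explored branches, then surface a "keep trying" option) could be sketched roughly as below. The `Progress` class, its method names, and the 50% default threshold are all assumptions made up for illustration; the thread does not say how many branches count as "a lot".

```python
class Progress:
    """Tracks which branches a player has explored in one playthrough history."""

    def __init__(self, total_branches, unlock_fraction=0.5):
        # unlock_fraction is an assumed cutoff for "explored a lot".
        self.total_branches = total_branches
        self.unlock_fraction = unlock_fraction
        self.visited = set()

    def visit(self, branch_id):
        # A set ignores repeats, so replaying a branch does not count twice.
        self.visited.add(branch_id)

    def keep_trying_unlocked(self):
        """True once enough distinct branches have been explored."""
        return len(self.visited) >= self.unlock_fraction * self.total_branches


progress = Progress(total_branches=4)
progress.visit("approach-A")
progress.visit("approach-A")  # replay, still one distinct branch
print(progress.keep_trying_unlocked())
progress.visit("approach-B")
print(progress.keep_trying_unlocked())
```

Counting distinct branches rather than total plays is a design choice here: it rewards breadth of exploration, which seems closer to the stated goal of having the player see the full scope of the problem.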