Coordination on what, exactly?
Coordination (cartelization) so that AI capabilities are not a race to the bottom
Coordination to indefinitely halt semiconductor supply chains
Coordination to shun and sanction those who research AI capabilities (compare: coordination against embryonic human gene editing)
Coordination to deliberately turn Moore’s Law back a few years (yes, I’m serious)
And do you think if you try that, you’ll succeed, and that the world will then be saved?
These are all strategies to buy time, so that alignment efforts may have more exposure to miracle-risk.
And what do you think are the chances that those strategies work, or that the world lives after you hypothetically buy three or six more years that way?
I’m not well calibrated on sub 1% probabilities. Yeah, the odds are low.
There are other classes of Hail Mary. Picture a pair of researchers, one of whom controls an electrode wired to the pleasure centers of the other. Imagine they have free access to methamphetamine and LSD. I don’t think research output is anywhere near where it could be.
So—just to be very clear here—the plan is that you do the bad thing, and then almost certainly everybody dies anyways even if that works?
I think at that level you want to exhale, step back, and not injure the reputations of the people who are gathering resources, finding out what they can, and watching closely for the first signs of a positive miracle. The surviving worlds aren’t the ones with unethical plans that seem like they couldn’t possibly work even on the open face of things; the real surviving worlds are only injured by people who imagine that throwing away their ethics surely means they must be buying something positive.
Fine. What do you think about the human-augmentation cluster of strategies? I recall you thought along very similar lines circa 2001.
I don’t think we’ll have time, but I’d favor getting started anyways. Seems a bit more dignified.
Great! If I recall correctly, you wanted genetically optimized kids to be gestated and trained.
I suspect that akrasia is a much bigger problem than most people think, and to be truly effective, one must outsource part of their reward function. There could be massive gains.
What do you think about the setup I outlined, where a pair of researchers exist such that one controls an electrode embedded in the other’s reward center? Think Focus from Vinge’s A Deepness In The Sky.
(I predict that would help with AI safety, in that it would swiftly provide useful examples of reward hacking and misaligned incentives)
I think memetically ‘optimized’ kids (and adults?) might be an interesting alternative to explore. That is, more scalable and better education for the ‘consequentialists’ (I have no clue how to teach people who are not ‘consequentialist’; hopefully someone else can teach those) may get human thought-enhancement results earlier and make them available to more people. There has been some work in this space and some successes, but I think that in general the “memetics experts” and the “education experts” haven’t been cooperating as much as they should. Trying to bridge this gap seems dignified to me. If it is indeed dignified, that would be good, because I’m currently in the early stages of a project trying to bridge it.
The better version of reward hacking I can think of is inducing a state of jhana (basically a pleasure button) in alignment researchers. For example: use Neuralink to record the brain-processes of ~1000 people going through the jhanas at multiple time-steps, average them in a meaningful way, and induce those brainwaves in other people.
The effect is that people become satiated with the feeling of happiness (like being satiated with food/water) and are more effective as a result.
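As a rough illustration of the averaging step only, here is a minimal sketch assuming the per-subject recordings already exist as time-aligned arrays; the shapes, the z-scoring choice, and the function name are illustrative assumptions, not a claim about how such signals would actually be recorded or induced:

```python
import numpy as np

def jhana_template(recordings: np.ndarray) -> np.ndarray:
    """Average time-aligned neural recordings across subjects.

    recordings: shape (n_subjects, n_timesteps, n_channels),
    e.g. (1000, T, C), assumed already aligned to a common onset.
    Returns a (n_timesteps, n_channels) average pattern.
    """
    # Z-score each subject's recording so no single subject's amplitude
    # dominates the result ("average them in a meaningful way").
    mean = recordings.mean(axis=(1, 2), keepdims=True)
    std = recordings.std(axis=(1, 2), keepdims=True) + 1e-8
    normalized = (recordings - mean) / std
    # Average across subjects at each time-step and channel.
    return normalized.mean(axis=0)
```

Everything hard (recording comparable signals, aligning them, and actually inducing the averaged pattern in someone else) is assumed away here.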
The “electrode in the reward center” setup has been proven to work in humans, whereas jhanas may not transfer over Neuralink.
Deep brain stimulation is FDA-approved in humans, meaning less (though nonzero) regulatory fuckery will be required.
Happiness is not pleasure; wanting is not liking. We are after reinforcement.
Could you link the proven part?
Jhanas seem much healthier, though I’m pretty confused imagining your setup, so I don’t have much confidence. Say it works, gets past the problem of generalizing reward (e.g. the brain rewarding only specific parts of research and not others), and we set aside the downward-spiral effects of people hacking themselves; then we hopefully have people who look forward to doing certain parts of research.
If you model humans as multi-agents, this amounts to giving a certain type of sub-agent (the “do research” one) a stronger say in which actions get taken. That is not as robust as getting all the sub-agents to agree and not fight each other. I believe jhana gets part of that done, because some sub-agents are pursuing the feeling of happiness, and with jhana you can get that feeling at any time.
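A toy formalization of that multi-agent picture, just to make the “stronger say” point concrete; the sub-agent names, weights, and preference numbers below are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class SubAgent:
    name: str
    weight: float               # how strong a "say" this agent has
    prefs: dict[str, float]     # action -> how much this agent wants it

def chosen_action(agents: list[SubAgent]) -> str:
    # Score each action by the weighted sum of sub-agent preferences.
    actions = {a for ag in agents for a in ag.prefs}
    return max(actions, key=lambda a: sum(ag.weight * ag.prefs.get(a, 0.0)
                                          for ag in agents))

agents = [
    SubAgent("do research", weight=1.0, prefs={"research": 1.0, "rest": 0.2}),
    SubAgent("seek comfort", weight=1.0, prefs={"research": 0.1, "rest": 1.0}),
]
print(chosen_action(agents))   # "rest": the comfort-seeking agent wins the vote
agents[0].weight = 3.0         # the electrode/jhana move: boost one agent's say
print(chosen_action(agents))   # "research": outvoted, but the disagreement remains
```

Boosting one weight changes which action wins without resolving the underlying disagreement, which is the robustness gap being pointed at.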
https://en.wikipedia.org/wiki/Brain_stimulation_reward
https://doi.org/10.1126/science.140.3565.394
https://sci-hub.hkvisa.net/10.1126/science.140.3565.394

“In our earliest work with a single lever it was noted that while the subject would lever-press at a steady rate for stimulation to various brain sites, the current could be turned off entirely and he would continue lever-pressing at the same rate (for as many as 2000 responses) until told to stop.

It is of interest that the introduction of an attractive tray of food produced no break in responding, although the subject had been without food for 7 hours, was noted to glance repeatedly at the tray, and later indicated that he knew he could have stopped to eat if he wished. Even under these conditions he continued to respond without change in rate after the current was turned off, until finally instructed to stop, at which point he ate heartily.”
Is the average human life experientially negative, such that buying three more years of existence for the planet is ethically net-negative?
People’s revealed choice in tenaciously staying alive and keeping others alive suggests otherwise. This everyday observation trumps all philosophical argument that fire does not burn, water is not wet, and bears do not shit in the woods.
I’m not immediately convinced (I think you need another ingredient).
Imagine a kind of orthogonality thesis but with experiential valence on one axis and ‘staying aliveness’ on the other. I think it goes through (one existence proof for the experientially-horrible-but-high-staying-aliveness quadrant might be the complex of torturer+torturee).
Another ingredient you need to posit for this argument to go through is that, as humans are constituted, experiential valence is causally correlated with behaviour in a way such that negative experiential valence reliably causes not-staying-aliveness. I think we do probably have this ingredient, but it’s not entirely clear cut to me.
Unlike jayterwahl, I don’t consider experiential valence, which I take to mean mental sensations of pleasure and pain in the immediate moment, to be of great importance in itself. It may be a sign that I am doing well or badly at life, but like the score on a test, it is only a proxy for what matters. People also have promises to keep, and miles to go before they sleep.
I think many of the things that you might want to do in order to slow down tech development are things that will dramatically worsen human experiences, or reduce the number of them. Making a trade like that in order to purchase the whole future seems like it’s worth considering; making a trade like that in order to purchase three more years seems much more obviously not worth it.
I will note that I’m still a little confused about Butlerian Jihad style approaches (where you smash all the computers, or restrict them to the capability available in 1999 or w/e); if I remember correctly Eliezer has called that a ‘straightforward loss’, which seems correct from a ‘cosmic endowment’ perspective but not from a ‘counting up from ~10 remaining years’ perspective.
My guess is that the main response is “look, if you can coordinate to smash all of the computers, you can probably coordinate on the less destructive-to-potential task of just not building AGI, and the difficulty is primarily in coordinating at all instead of the coordination target.”