Great! If I recall correctly, you wanted genetically optimized kids to be gestated and trained.
I suspect that akrasia is a much bigger problem than most people think, and that to be truly effective, one must outsource part of one’s reward function. There could be massive gains.
What do you think about the setup I outlined, where one of a pair of researchers controls an electrode embedded in the other’s reward center? Think Focus from Vinge’s A Deepness in the Sky.
(I predict that would help with AI safety, in that it would swiftly provide useful examples of reward hacking and misaligned incentives)
I think memetically ‘optimized’ kids (and adults?) might be an interesting alternative to explore. That is, more scalable and better education for the ‘consequentialists’ (I have no clue how to teach people who are not ‘consequentialist’; hopefully someone else can teach those) may get human thought-enhancement results earlier and make them available to more people. There has been some work in this space, and some successes, but I think that in general the “memetics experts” and the “education experts” haven’t been cooperating as much as they should. It would seem dignified to me to try bridging this gap. If it is indeed dignified, that would be good, because I’m currently in the early stages of a project trying to bridge it.
A better version of reward hacking I can think of is inducing a state of jhana (basically a pleasure button) in alignment researchers. For example, use Neuralink to record the brain-process of ~1000 people going through the jhanas at multiple time-steps, average those recordings in a meaningful way, and induce the resulting brainwaves in other people.
The effect is people being satiated with the feeling of happiness (like being satiated with food/water) and being more effective as a result.
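To make the record-and-average step concrete, here is a minimal sketch, assuming each subject’s session is already a time-aligned (timesteps, channels) array; `average_jhana_template` and the synthetic data are hypothetical stand-ins, not a real Neuralink API, and a pointwise mean is only the simplest candidate for averaging “in a meaningful way”:

```python
# Minimal sketch of the "average ~1000 jhana recordings" idea. Assumes each
# subject's recording is a (timesteps, channels) array sampled at a common
# rate and already time-aligned; real recordings would need alignment,
# artifact rejection, and a far less naive notion of averaging.
import numpy as np

def average_jhana_template(recordings: list[np.ndarray]) -> np.ndarray:
    """Z-score each subject's recording, then average across subjects."""
    stacked = np.stack(recordings)  # (subjects, timesteps, channels)
    # Normalize per subject so unusually strong signals don't dominate.
    stacked = (stacked - stacked.mean(axis=1, keepdims=True)) / (
        stacked.std(axis=1, keepdims=True) + 1e-8
    )
    return stacked.mean(axis=0)     # (timesteps, channels) group template

# Synthetic stand-in data: ~1000 subjects, 2 s at 256 Hz, 8 channels.
rng = np.random.default_rng(0)
recordings = [rng.standard_normal((512, 8)) for _ in range(1000)]
template = average_jhana_template(recordings)
print(template.shape)  # (512, 8)
```

The hard, unsolved part is the last step the proposal gestures at (inducing the averaged pattern in someone else’s brain); nothing above addresses that.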
The “electrode in the reward center” setup has been proven to work in humans, whereas jhanas may not transfer over Neuralink.
Deep brain stimulation is FDA-approved in humans, meaning less (though nonzero) regulatory fuckery will be required.
Happiness is not pleasure; wanting is not liking. We are after reinforcement.
Could you link the proven part?
Jhanas seem much healthier, though I’m pretty confused imagining your setup, so I don’t have much confidence. Say it works, gets past the problems of generalizing reward (e.g. the brain only rewarding specific parts of research and not others), and avoids the downward-spiral effects of people hacking themselves; then we hopefully have people who look forward to doing certain parts of research.
If you model humans as multi-agent systems, this is making a certain type of agent (the “do research” one) have a stronger say in which actions get taken. This is not as robust as getting all the agents to agree and not fight each other. I believe jhana gets part of that done, because some sub-agents are pursuing the feeling of happiness, and with jhana you can get that feeling any time.
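As a toy illustration of that framing (the sub-agents, weights, and preferences below are all made up), giving the “do research” sub-agent a stronger say amounts to increasing its weight in a weighted vote over candidate actions:

```python
# Toy model: sub-agents cast weighted votes over candidate actions.
# Everything here is illustrative; it sketches the framing above, not a
# claim about how brains actually arbitrate between drives.
from collections import defaultdict

def choose_action(say: dict[str, float], prefs: dict[str, dict[str, float]]) -> str:
    """Pick the action with the highest say-weighted preference score."""
    scores: dict[str, float] = defaultdict(float)
    for agent, weight in say.items():
        for action, preference in prefs[agent].items():
            scores[action] += weight * preference
    return max(scores, key=scores.get)

# Boosting the "do_research" agent's say tilts the vote toward research.
say = {"do_research": 3.0, "seek_pleasure": 1.0, "rest": 1.0}
prefs = {
    "do_research":   {"research": 1.0,  "browse": -0.5, "nap": -0.2},
    "seek_pleasure": {"research": 0.2,  "browse": 1.0,  "nap": 0.3},
    "rest":          {"research": -0.5, "browse": 0.0,  "nap": 1.0},
}
print(choose_action(say, prefs))  # -> "research"
```

With all says set equal to 1.0, the same preferences elect “nap” instead, which is the fighting-sub-agents failure mode; on this toy picture, jhana satisfies the “seek_pleasure” agent directly, so it stops voting against research.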
https://en.wikipedia.org/wiki/Brain_stimulation_reward
https://doi.org/10.1126/science.140.3565.394
https://sci-hub.hkvisa.net/10.1126/science.140.3565.394
In our earliest work with a single lever it was noted that while the subject would lever-press at a steady rate for stimulation to various brain sites, the current could be turned off entirely and he would continue lever-pressing at the same rate (for as many as 2000 responses) until told to stop.
It is of interest that the introduction of an attractive tray of food produced no break in responding, although the subject had been without food for 7 hours, was noted to glance repeatedly at the tray, and later indicated that he knew he could have stopped to eat if he wished. Even under these conditions he continued to respond without change in rate after the current was turned off, until finally instructed to stop, at which point he ate heartily.
I don’t think we’ll have time, but I’d favor getting started anyways. Seems a bit more dignified.