Unaligned with each other? Or… would you not consider yourself to be aligned with yourself?
(Btw, see my edit at the bottom of my comment above if you hadn’t noticed it.)
I think also unaligned with yourself?
Like, most humans when given massive power over the universe would probably accidentally destroy themselves, and possibly all of humanity with it (Eliezer talks a bit about this in HPMOR in sections I don’t want to reference because of spoilers, and Wei Dai has talked about this a bit in a bunch of comments related to “the human alignment problem”). I think that maybe I could avoid doing that, but only because I am really mindful of the risk, and I don’t think me from 5 years ago would have been safe to drastically scale up, even with respect to just my own values.
I share habryka’s concerns re: “unaligned with yourself”, but I think I was missing (or had forgotten) that part of the idea here was that you’re using… an uploaded clone of yourself, so you’re at least more likely to be aligned with yourself, even if, when scaled up, you’re not aligned with anyone else.
Not sure if you were just being poetic, but FWIW I believe the idea (in HCH, for example) is to use an ML system trained to produce the same answers that a human would produce, which is not strictly speaking an upload (unless the only way to imitate is actually to simulate in detail, such that the ML system ends up growing an upload inside it?).
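To make the “trained to produce the same answers” vs. “upload” distinction concrete, here’s a deliberately toy sketch (my own illustration, not anything from an actual HCH/IDA codebase; all the names are made up). The point is that the training signal only ever touches the human’s *outputs*, so nothing in the setup requires simulating the human’s internals:

```python
# Toy illustration only -- invented names, not a real HCH/IDA implementation.
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAExample:
    question: str
    human_answer: str  # collected by actually asking a human


def train_imitator(dataset: list[QAExample]) -> Callable[[str], str]:
    """Fit a 'model' to reproduce the human's recorded answers.

    The target is the answer text itself -- the human's internal cognition
    never appears anywhere in the training setup.
    """
    lookup = {ex.question: ex.human_answer for ex in dataset}

    def imitator(question: str) -> str:
        # A real system would generalize; this toy version just memorizes.
        return lookup.get(question, "<no training signal for this question>")

    return imitator


if __name__ == "__main__":
    data = [
        QAExample("Is 7 prime?", "Yes."),
        QAExample("Capital of France?", "Paris."),
    ]
    model = train_imitator(data)
    print(model("Is 7 prime?"))   # matches the human's recorded answer
    print(model("Is 11 prime?"))  # off-distribution: no guarantee it answers like the human would
```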
Is it “a human” or “you specifically”?
If it’s “a human”, I’m back to “humans are unfriendly by default” territory.
[Edit: But I had in fact also not been tracking that it’s not a strict upload; it’s an ML system trained on human actions. I think I recall reading that earlier but had forgotten. I did leave the “…” in my summary because I wasn’t quite sure whether “upload” was the right word, though. That all said, being merely trained on human actions, whether mine or someone else’s, I think makes it even more likely to be unfriendly than an upload.]
To get sufficient training data, it must surely be “a human” (in the generic, smushed-together, ‘modelling an ensemble of humans’ sense).
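A rough sketch of why that pushes toward “a human” rather than “you specifically” (again a made-up toy, not a real data pipeline): if the answer data has to come from many people to get enough of it, the model ends up fit to the pooled mixture rather than to any one person’s judgment.

```python
# Toy illustration only -- invented names, not a real training pipeline.
def pool_training_data(per_person_answers: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Merge (question, answer) pairs from many people into one training set."""
    pooled: list[tuple[str, str]] = []
    for answers in per_person_answers.values():
        pooled.extend(answers.items())
    return pooled


if __name__ == "__main__":
    data = {
        "alice": {"Should we ship feature X?": "Yes, behind a feature flag."},
        "bob":   {"Should we ship feature X?": "No, wait for more review."},
    }
    # Conflicting answers become targets for the *same* question, so whatever
    # gets fit to this set learns some blend of the two people, not either
    # person specifically.
    print(pool_training_data(data))
```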