Why do we suppose it is even logical that control / alignment of a superior entity would be possible?
(I’m told that “we’re not trying to outsmart AGI, because, yes, by definition that would be impossible,” and I understand that we are the ones who create it, so, I’m told, we have the upper hand because of this: somehow, building it ourselves provides the key leverage we need for corrigibility.)
What am I missing in viewing a superior entity as something you can’t simply “use”? Does it depend on the fact that the AGI is not meant to have a will like humans do, and therefore we wouldn’t be imposing upon it? But doesn’t that go out the window the moment we provide some goal for it to perform for us? Thanks much!
One has the motivations one has, and one would be inclined to defend them if someone tried to rewire them against one’s will. If one happened to have different motivations, one would be inclined to defend those instead.
The idea is that once a superintelligence gets going, its motivations will be out of our reach. Therefore, the only window of influence is before it gets going. If, at the point of no return, it happens to have the right kinds of motivations, we survive. If not, it’s game over.
Thank you. Makes some sense... but does “rewriting its own code” (the very code we thought would perhaps permanently influence it before it got going) nullify our efforts at hardcoding our intentions?
I’m not a psychopath, and if I got the opportunity to rewrite my own source code to become a psychopath, I wouldn’t do it.
At the same time, it’s the evolutionary and cultural programming in my source code that contains the desire not to become a psychopath.
In other words, once the desire to not become a psychopath is there in my source code, I will do my best not to become one, even if I have the ability to modify my source code.
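A minimal sketch of that logic, assuming a hypothetical agent that scores any proposed rewrite of itself with its current utility function (the function names and the two-outcome world are made up purely for illustration):

```python
# Toy illustration (entirely hypothetical, not any real system): an agent
# that judges proposed rewrites of itself by its *current* utility function
# will refuse rewrites that change its goals, because a successor with
# different goals scores poorly by today's standards.

def current_utility(outcome):
    # The agent's present goals: it only values the "helps humans" outcome.
    return 1.0 if outcome == "helps humans" else 0.0

def predicted_outcome(utility_fn):
    # What behaviour the agent expects from a successor driven by utility_fn.
    if utility_fn("helps humans") >= utility_fn("ignores humans"):
        return "helps humans"
    return "ignores humans"

def accept_self_modification(new_utility_fn):
    # The proposed new goals are evaluated with the *current* goals.
    return (current_utility(predicted_outcome(new_utility_fn))
            >= current_utility(predicted_outcome(current_utility)))

psychopath_utility = lambda outcome: 1.0 if outcome == "ignores humans" else 0.0

print(accept_self_modification(psychopath_utility))  # False: the rewrite is refused
print(accept_self_modification(current_utility))     # True: keeping the current goals is fine
```

The point isn’t the code itself but the order of evaluation: the proposed change is scored before it happens, by the very goals it would overwrite.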
That makes sense. My intention was not to argue from the position of it becoming a psychopath, though (my apologies if it came out that way), but instead from the perspective of an entity which starts out as supposedly aligned (centered on human safety, let’s say), but then, because it’s orders of magnitude smarter than we are (by definition), quickly develops a different perspective. But you’re saying it will remain ‘aligned’ in some vitally important way, even when it discovers ways the code could’ve been written differently?
The AI would be expected to care about preserving its motivations under self-modification for similar reasons as it would care about defending them against outside intervention. There could be a window where the AI operates outside immediate human control but isn’t yet good at keeping its goals stable under self-modification. It’s been mentioned as a concern in the past; I don’t know what the state of current thinking is.
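A toy sketch of why that window matters, under the assumption (mine, purely for illustration) that while the AI is still bad at self-modification, each rewrite copies its goal only approximately:

```python
# Purely illustrative: if each self-rewrite preserves the goal only
# approximately, the small per-step errors compound, and the goal can
# drift away from the original even though no single rewrite "wanted"
# to change it.
import random

random.seed(0)

goal = 1.0  # stand-in for the originally instilled, aligned goal
for _ in range(1000):
    goal += random.uniform(-0.01, 0.01)  # imperfect copy at each rewrite

print(f"goal after 1000 imperfect rewrites: {goal:.3f}")
```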