I’m not so sure it’s the same—my interpretation was something like:
Yudkowsky plan: Make an AI that designs a certain kind of nanobot
Seth plan: Make an AI that does what I tell it to do, and then I will tell it to design a certain kind of nanobot
For example, in this comment, @Rob Bensinger was brainstorming nanobot-specific things that one might put into the source code. (Warning that Rob is not Eliezer.) (Related.)
I’m sure it’s not exactly the same, particularly since neither plan has been fully fleshed out and thought through. In particular, Yudkowsky doesn’t focus on the advantages of instructing the AGI to tell you the truth and of interacting with it as it gets smarter. I’d guess that’s because he was still anticipating a faster takeoff than network-based AGI affords.
But to give credit where it’s due, I think literal instruction-following was probably part of (but not the whole of) his conception of task-based AGI. From the discussion thread with Paul Christiano following the task-directed AGI article on Greater Wrong:
The AI is getting short-term objectives from humans and carrying them out under some general imperative to do things conservatively or with ‘low unnecessary impact’ in some sense of that, and describes plans and probable consequences that are subject to further human checking, and then does them, and then the humans observe the results and file more requests.
And the first line of that article:
A task-based AGI is an AGI intended to follow a series of human-originated orders, with these orders each being of limited scope [...]
These passages, together with the lack of reference to instructions and checking through most of the presentation, suggest to me that he was probably thinking both of things like hard-coding the AGI to design nanotech, melt down GPUs (or whatever), and then delete itself, and also of a more online, continuous instruction-following AGI, closer to my conception of likely AGI projects. Bensinger may have been pursuing one part of that broader conception.