The main problems with CEV include, first, the great difficulty of implementing such a program ("If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless") and, second, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004.
Neither problem seems relevant to what I’m proposing. My implementation is just a prompt. And there is no explicit optimization (after the LM has been trained).
Has anyone proposed exactly what I’m proposing? (slightly different wording is OK, of course)
I don’t think anyone has proposed this. I think the most similar proposal is my instruction-following AGI (particularly since I’m also mostly thinking of just such a text prompt in a language-model agent as the implementation).
My proposal, with its emphasis on checking, is aimed more at the range where the AGI is human-level and above, whereas yours seems aimed more at the truly superintelligent range. Mine keeps the human in charge of figuring out what they would’ve wanted, in case the AGI gets that wrong.
Other related work is linked in that post.
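For concreteness, here is a minimal sketch of the kind of prompt-only implementation being discussed, with the checking step made explicit. Everything in it is illustrative and assumed rather than taken from either proposal: the prompt wording, the `LLM` callable, the `ask_human` hook, and the crude "ends with a question mark" uncertainty test are placeholders, not anyone's actual design.

```python
# Illustrative sketch only: the proposal's content lives entirely in the prompt;
# nothing below optimizes or fine-tunes the model.
from typing import Callable, Dict, List

Message = Dict[str, str]               # {"role": ..., "content": ...}
LLM = Callable[[List[Message]], str]   # stand-in for any chat-completion API

# Hypothetical wording; the real prompt is whatever the proposer has in mind.
SYSTEM_PROMPT = (
    "You are acting on behalf of {principal}. "
    "Do what {principal} would have wanted you to do, had they considered "
    "the situation more carefully. If you are unsure what that is, "
    "stop and ask {principal} before acting."
)

def run_agent(llm: LLM, ask_human: Callable[[str], str],
              principal: str, task: str) -> str:
    """One step of a prompt-only agent, with a crude human-checking pass."""
    messages: List[Message] = [
        {"role": "system", "content": SYSTEM_PROMPT.format(principal=principal)},
        {"role": "user", "content": task},
    ]
    reply = llm(messages)
    # "Checking emphasis": if the model asks a question instead of acting,
    # the human stays in charge of saying what they would have wanted.
    if reply.strip().endswith("?"):
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": ask_human(reply)})
        reply = llm(messages)
    return reply
```

The only point of the sketch is that the alignment content is carried by the prompt and the check-with-the-human step; there is no additional training or explicit optimization after the model is trained.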
The above objections to CEV partly apply to your proposal. There is probably not just one thing X would’ve wanted after more consideration, since the conclusions may depend on the circumstances.
I’m not sure that breaks the proposal; it could be that any of the several things X might’ve wanted would serve adequately.