the alignment story for LLMs seems significantly more straightforward, even given all the shoggoth concerns
Could you please elaborate on what you mean by “alignment story for LLMs” and “shoggoth concerns” here? Do you mean the “we can use nearly value-neutral simulators as we please” story, are you referring to the fact that LLMs are, in a way, far more understandable to humans than more general RL agents because they use human language, or do you mean something else entirely?
See this thread (including my reply) for a bit on the shoggoth issue.