Leon Lang comments on Daniel Kokotajlo’s Shortform

Leon Lang 30 Sep 2024 19:10 UTC
6 points
0
Agreed.

To understand your usage of the term “outer alignment” a bit better: often, people have a decomposition in mind where solving outer alignment means technically specifying the reward signal/model or something similar. It seems that to you, writing up a model spec or constitution also counts as outer alignment, which to me seems like only part of the problem. (Unless perhaps you mean that model specs and constitutions should be extended to include a whole training setup or similar?)

If it doesn’t seem too off-topic to you, could you comment on your views on this terminology?
- Daniel Kokotajlo 1 Oct 2024 22:17 UTC
  6 points
  0
  Good points. I probably should have said “the Midas problem” (quoting Cold Takes) instead of “outer alignment.” Idk. I didn’t choose my terms carefully.