To understand your usage of the term “outer alignment” a bit better: often, people have a decomposition in mind where solving outer alignment means technically specifying the reward signal/model or something similar.
It seems that, to you, writing up a model spec or constitution also counts as outer alignment, which to me seems like only part of the problem. (Unless perhaps you mean that model specs and constitutions should be extended to include a whole training setup or similar?)
If it doesn’t seem too off-topic to you, could you comment on your views on this terminology?
Good points. I probably should have said “the Midas problem” (quoting Cold Takes) instead of “outer alignment.” Idk. I didn’t choose my terms carefully.
Agreed.