I think this is an excellent post which hints at some of the key pitfalls and mistakes brought on by outer/inner alignment thinking, and by many concepts upstream and downstream of that frame, including many (but not all) invocations of:
Goodhart’s law
Selection pressures and arguments
Objective misspecification
Outer alignment failures
The “necessity” of superhuman oversight to train aligned models.
I’ll have a post out (probably in the next month) laying out my thoughts on the outer/inner alignment frame and why that frame should be avoided.
An aside:
It’s just not a very good model of a growing human to see them as a path-independent search over policies that you have to be perfectly cautious about ever, even temporarily, incentivizing in a way you wouldn’t want to see superintelligently optimized.
I’d say “intelligently optimized”, since a human will hardly be expected to grow into a “superintelligent” optimizer (by definition). Not that it really matters for your point.
EDIT:
But in fact, there’ll be a lot more path dependence in AI training than this
I think I wouldn’t call this a “fact” without more argument, although I really do expect it to be true. I think “fact” should be reserved for actual, known, heap-of-undeniable-evidence facts.
Entirely fair!
I recently decided to be better about separately flagging my observations and my inferences… but totally forgot about that guiding maxim when writing this.