Btw, this post also views Paul’s agenda through the lens of constructing imitations of humans.
Right, so I think I wasn't really making a new observation, but just clearing up a confusion on my own part: for a long time I didn't understand how the idea of approval-directed agency fits into IDA, because people switched from talking about approval-directed agency to imitation learning (or talked about them interchangeably) and I didn't catch the connection. So at this point I understand Paul's trajectory of views as follows:
goal-directed agent ⇒ approval-directed agent ⇒ use IDA to scale up approval-directed agent ⇒ approval-directed agency as a form of imitation learning / generalize to other forms of imitation learning ⇒ generalize IDA to safely scale up other (including more goal-directed / consequentialist) forms of ML (see An Unaligned Benchmark, which I think represents his current views)
(Someone please chime in if this still seems wrong or confused.)