Doesn’t say anything particularly novel but presents it in a very clear and elegant manner which has value in and of itself.
On the one hand: I know, right?
On the other hand: somehow lots of people kept being like “You think RLHF is terrible? You must just not be thinking about the value of an iterative design loop!” and I’m like “… I am not the one here who has not thought through strategy around iterative design loops”.
Like, one major thing which prompted this post was Richard Ngo outright saying “I take this comment as evidence that John would fail an intellectual turing test for people who have different views than he does about how valuable incremental empiricism is.” I promptly, and by his own admission, proved him wrong about that, but my major update from that exchange (and a few similar exchanges on that and adjacent threads) was: a lot of people just have some vague halo around iterative development, and have not actually thought through the gears and the failure-modes (especially in the context of AI).
(Also I’d be interested to hear @Richard_Ngo’s response to this. He did leave a comment on OP which seemed to me like a somewhat-confused tangent at the time, but I haven’t heard him respond to what I’d consider the central point of the post: iterative design loops are not vague magic good things, they have specific predictable failure modes, and a bunch of stuff like RLHF which people say is good because handwave iterative design handwave in fact looks quite terrible when we actually think about the failure modes of iterative design.)
I’m actually somewhat surprised. Maybe this idea has saturated my water supply to the point where it seems trivial.