I think “basically obviates” is too strong. Imitation of human-legible cognitive strategies + RL seems liable to produce very different systems than those produced with pure RL. For example, in the first case, RL incentivizes the strategies being combined in ways conducive to accuracy (in addition to potentially incentivizing non-human-legible cognitive strategies), whereas in the second case you don’t get any incentive towards productively using human-legible cognitive strategies.