Sorry, I do think you raised a valid point! I had read your comment in a different way.
I think what I should have said is: aggressively training AI directly on outcome-based tasks ("training it to be agentic", so to speak) may well produce persistently-activated inner consequentialist reasoning of some kind (though not necessarily the flavor historically expected). I most strongly disagree with arguments that behave the same for a) this more aggressive curriculum and b) pretraining, and I think it's worth distinguishing between these kinds of argument.
Sure—I agree with that. The section I linked from Conditioning Predictive Models actually works through, at least to some degree, how I think simplicity arguments for deception go differently for purely pretrained predictive models.