Daniel Kokotajlo comments on Daniel Kokotajlo’s Shortform

Daniel Kokotajlo 10 Jul 2024 18:04 UTC
LW: 8 AF: 6
0
AF
Rereading this classic by Ajeya Cotra: https://www.planned-obsolescence.org/july-2022-training-game-report/

I feel like this is an example of a piece that is clear, well-argued, important, etc. but which doesn’t seem to have been widely read and responded to. I’d appreciate pointers to articles/posts/papers that explicitly (or, failing that, implicitly) respond to Ajeya’s training game report. Maybe the ‘AI Optimists?’
- peterbarnett 10 Jul 2024 20:15 UTC
  9 points
  4
  Parent
  I think the post Deceptive Alignment is <1% Likely by Default attempts to argue that deceptive alignment is very unlikely given the training setup that Ajeya lays out.
- Bogdan Ionut Cirstea 10 Jul 2024 18:42 UTC
  5 points
  0
  Parent
  Quick take, having read that report a long time ago: I think the development model was mostly off, looking at current AIs. The focus was on ‘human feedback on diverse tasks (HFDT)’, but there’s a lot of cumulated evidence that most of the capabilities of current models seem to be coming from pre-training (with a behavior cloning objective; not RL) and, AFAICT, current scaling plans still mostly seem to assume that to hold in the near future, at least.
  Though maybe things will change and RL will become more important.
  - Daniel Kokotajlo 10 Jul 2024 22:04 UTC
    10 points
    2
    Parent
    On the contrary, I think the development model was bang on the money basically. As peterbarnett says Ajeya did forecast that there’d be a bunch of pre-training before RL. It even forecast that there’d be behavior cloning too after the pretraining and before the RL. And yeah, RL isn’t happening on a massive scale yet (as far as we know) but I and others predict that’ll change in the next few years.
  - peterbarnett 10 Jul 2024 18:55 UTC
    7 points
    0
    Parent
    The report does say that the AI will likely be trained with a bunch of pre-training before the RL:
    Even before Alex is ever trained with RL, it already has a huge amount of knowledge and understanding of the world from its predictive and imitative pretraining step.
    The HFDT is what makes it a “generally competent creative planner” and capable of long-horizon open-ended tasks.
    Do you think most of future capabilities will continue to come from scaling pretraining, rather than something like HFDT? (There is obviously some fuzziness when talking about where “most capabilities come from”, but I think the capability to do long-horizon open-ended tasks will reasonably be thought of as coming from the HFDT or a similar process rather than the pretraining)
    - Bogdan Ionut Cirstea 10 Jul 2024 19:55 UTC
      7 points
      2
      Parent
      The HFDT is what makes it a “generally competent creative planner” and capable of long-horizon open-ended tasks.
      
      I’m not entirely sure how to interpret this, but my impression from playing with LMs (which also seems close to something like folk wisdom) is that they are already creative enough and quite competent at coming up with high-level plans, they’re just not reliable enough for long-horizon open-ended tasks.
      
      Do you think most of future capabilities will continue to come from scaling pretraining, rather than something like HFDT? (There is obviously some fuzziness when talking about where “most capabilities come from”, but I think the capability to do long-horizon open-ended tasks will reasonably be thought of as coming from the HFDT or a similar process rather than the pretraining)
      I would probably expect a mix of more single-step reliability mostly from pre-training (at least until running out of good quality text data) + something like self-correction / self-verification, where I’m more unsure where most of the gains would come from and could see e.g. training on synthetic data with automated verification contributing more.