In your toy model there’s a 100% chance that we’re doomed.
The toy model says there’s 100% chance of doom if the only way we find problems is by iteratively trying things and seeing what visibly goes wrong. A core part of my view here is that there’s lots of problems which will not be noticed by spending any amount of time iterating on a black box, but will be found if we can build the mathematical tools to open the black box. I do think it’s possible to build sufficiently-good mathematical tools that literally all the problems are found (see the True Names thing).
More time does help with building those tools, but more time experimenting with weak AI systems doesn’t matter so much. Experimenting with AI systems does provide some feedback for the theory-building, but we can get an about-as-good feedback signal from other agenty systems in the world already. So the slow/fast takeoff question isn’t particularly relevant.
I need something weaker: just that we should put some probability on there not being any fatal problems which would remain unfound given more time.
Man, it would be one hell of a miracle if the number of fatal problems which would not be found by any amount of iterating just so happened to be exactly zero. Probabilities are never literally zero, but that does seem to me unlikely enough to be strategically irrelevant.
It sounds like the crux is whether having time with powerful (compared to today) but sub-AGI systems will make the time we have for alignment better spent. Does that sound right?
I’m thinking it will because i) you can better demonstrate AI alignment problems empirically to convince top AI researchers to prioritise safety work, ii) you can try out different alignment proposals and do other empirical work with powerful AIs, iii) you can try to leverage powerful AIs to help you do alignment research itself.
Whereas you think these things are so unlikely to help that getting more time with powerful AIs is strategically irrelevant.
Yeah, that’s right. Of your three channels for impact:
i) you can better demonstrate AI alignment problems empirically to convince top AI researchers to prioritise safety work, ii) you can try out different alignment proposals and do other empirical work with powerful AIs, iii) you can try to leverage powerful AIs to help you do alignment research itself
… (i) and (ii) both work ~only to the extent that the important problems are visible. Demonstrating alignment problems empirically ~only matters if they’re visible and obvious. Trying out different alignment proposals also ~only matters if their failure modes are actually detectable.
(iii) fails for a different reason, namely that by the time AIs are able to significantly accelerate the hard parts of alignment work, they’ll already have foomed. Reasoning: there’s generally a transition point between “AI is worse than human at task, so task is mostly done by human” and “AI is comparable to human or better, so task is mostly done by AI”. Foom occurs roughly when AI crosses that transition point for AI research itself. And alignment is technically similar enough to AI research more broadly that I expect the transition to be roughly-simultaneous for capabilities and alignment research.
Quick responses to your argument for (iii):
If AI automates 50% of both alignment work and capabilities research, it could help with alignment before foom (while also bringing foom forward in time).
A leading project might choose to use AIs for alignment rather than for fooming.
AI might be more useful for alignment work than for capabilities work.
Fooming may require more compute than certain types of alignment work.