This post is fascinating, thank you very much Evan for writing it.
It seems like everyone has very different takes on how to figure out whether to keep working on something.
My sense reading this post is that Chris feels that making progress on the understandability of ML systems is something he’s found a lot of traction on, and doesn’t see a principled argument for why he won’t continue to find traction until we reach full understandability.
My sense reading this post by Jessica Taylor is that she feels that making progress on the understandability of ML systems is something she failed to find a lot of traction on, and doesn’t see a principled argument for why she would be able to reach full understandability.
And Paul says here:

And I think there’s some basic research intuition about how much a problem – suppose you poke at a problem a few times, and you’re like ‘Agh, seems hard to make progress’. How much do you infer that the problem’s really hard? And I’m like, not much. As a person who’s poked at a bunch of problems, let me tell you, that often doesn’t work and then you solve it in like 10 years of effort.
I think that’s a fair characterization of my optimism.
I think the classic response to me is “Sure, you’re making progress on understanding vision models, but models with X are different and your approach won’t work!” Some common values of X include lacking visual features, recurrence, RL, planning, really large size, and being language-based. I think that this is a pretty reasonable concern (more so for some Xs than others). Certainly, one can imagine worlds where this line of work hits a wall and ends up not helping with more powerful systems. However, I would offer a small consideration in the other direction: In 2013 I think no one thought we’d make this much progress on understanding vision models, and in fact many people thought really understanding them was impossible. So I feel like there’s some risk of distorting our evaluation of tractability by moving the goalposts in these conversations.
I’m not surprised by other people feeling like they have less traction. I feel like the first three or so years I spent trying to understand the internals of neural networks involved a lot of false starts with approaches that ended up being dead ends (e.g. visualizing really small networks, or focusing on dimensionality reduction). DeepDream was very exciting, but in retrospect I feel like it took me another two or so years to really digest what it meant and how one could really use it as a scientific tool. And this is with the benefit of amazing collaborators and multiple very supportive environments.
One final thing I’d add is that, if I’m honest, I’m probably more motivated by aesthetics than optimism. I’ve spent almost seven years obsessed with the question of what goes on inside neural networks and I find the crazy partial answers we learn every year tantalizingly beautiful. I think this is pretty normal for early research directions; Kuhn talks about it a fair amount in The Structure of Scientific Revolutions.