I guess I don’t understand why linear scaling would imply this—in fact, I’d guess that training should probably be super-linear, since each backward pass takes linear time, but the more neurons you have, the bigger the parameter space, and so the greater number of gradient steps you need to take to reach the optimum, right?
Yeah, that’s plausible. This does mean the mechanistic transparency cost could scale sublinearly w.r.t. compute cost, though I doubt it (for the other reasons I mentioned).
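To make the super-linear intuition concrete, here's a toy sketch. Everything here is an assumption for illustration: each gradient step costs time linear in the parameter count, and the number of steps needed grows like `n_params ** alpha` for some made-up exponent `alpha > 0`.

```python
# Hypothetical scaling sketch, not a measured law: cost per step is
# linear in parameter count, and the number of steps to reach the
# optimum is assumed to grow like n_params ** alpha (alpha is made up).

def training_cost(n_params: float, alpha: float = 0.5) -> float:
    """Total cost ~ n_params ** (1 + alpha) under these assumptions."""
    cost_per_step = n_params       # backward pass: linear in parameters
    n_steps = n_params ** alpha    # assumed growth in steps needed
    return cost_per_step * n_steps

# Under any alpha > 0, doubling the parameters more than doubles cost:
ratio = training_cost(2e6) / training_cost(1e6)  # 2 ** 1.5, about 2.83
```

Whether `alpha` is really positive (rather than zero, which would give exactly linear scaling) is the crux of the disagreement above.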
If that estimate comes from OpenAI’s efforts to understand image recognition, I think it’s too high, since we presumably learned a bunch about what to look for from their efforts.
Nah, I just pulled a number out of nowhere. The estimate based on existing efforts would be way higher. Back of the envelope: it costs ~$50 to train on ImageNet (see here). Meanwhile, there have been probably around 10 person-years spent on understanding one image classifier? At $250k per person-year, that’s $2.5 million on understanding, making it 50,000x more expensive to understand it than to train it.
Things that would move this number down:
Including the researcher time in the cost to train on ImageNet. I think that we will soon (if we haven’t already) enter the regime where researcher cost < compute cost, so that would only change the conclusion by a factor of at most 2.
Using the cost for an unoptimized implementation, which would probably be > $50. I’d expect those optimizations to already be taken for the systems we care about—it’s way more important to get a 2x cost reduction when your training run costs $100 million than when your training run costs under $1000.
Including the cost of hyperparameter tuning. This also seems like something we can keep to a factor of at most 2, e.g. by using population-based training of hyperparameters.
Including the cost of data collection. This seems important: future data collection will probably be very expensive (even if we simulate, there’s the compute cost of the simulation), but I don’t know how to take it into account. Maybe decrease the estimate by a factor of 10?
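Putting the back-of-the-envelope numbers above in one place (all figures are the rough guesses stated in the comment, not measured data, and the adjustment factors are the speculative ones from the list):

```python
# Back-of-the-envelope from the comment above; every number is a rough
# guess stated there, not a measurement.

train_cost = 50                    # ~$50 to train on ImageNet
person_years = 10                  # guessed effort on understanding one classifier
cost_per_person_year = 250_000     # $250k per person-year

understand_cost = person_years * cost_per_person_year   # $2.5 million
ratio = understand_cost / train_cost                    # 50,000x

# Speculative downward adjustments from the list above:
researcher_time_factor = 2     # researcher time in the training cost
hparam_tuning_factor = 2       # hyperparameter tuning
data_collection_factor = 10    # data collection

adjusted_ratio = ratio / (researcher_time_factor
                          * hparam_tuning_factor
                          * data_collection_factor)     # 1,250x
```

Even after all the listed adjustments, understanding still comes out three-plus orders of magnitude more expensive than training under these guesses.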
Once you have a model of a module such that things would be fine if the module worked according to your model, you can just train the module to fit your model better.
You could also just use the model, if it’s fast. It would be interesting to see how well this works. My guess is that abstractions are leaky because there are no good non-leaky abstractions, which would predict that this doesn’t work very well.
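A minimal toy version of "train the module to better fit your model": suppose our mechanistic model of a one-parameter module says it should compute `y = 2*x` (a stand-in for any interpretable description), and we add a penalty pulling the module toward that model alongside its task loss. All the specifics here (the linear module, the penalty weight, the data) are invented for illustration.

```python
import numpy as np

# Toy sketch: the "mechanistic model" of the module is w_model = 2.0;
# the task data is generated by a slightly different weight (2.1), so
# the model is a leaky abstraction. We train with task loss plus a
# penalty for deviating from the model.

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.1 * x + 0.05 * rng.normal(size=100)  # task data, slightly off-model

w = 0.0          # the module's single parameter
w_model = 2.0    # what our mechanistic model says w should be
lam = 0.1        # strength of the "match the model" penalty

for _ in range(500):
    task_grad = np.mean(2 * (w * x - y) * x)   # d/dw of task MSE
    model_grad = 2 * (w - w_model)             # d/dw of (w - w_model)**2
    w -= 0.05 * (task_grad + lam * model_grad)

# w converges to a compromise between the task optimum (~2.1) and the
# model's 2.0; turning lam up trades task performance for fidelity to
# the model, which is the leakiness cost discussed above.
```

The interesting empirical question is how large that performance cost is for realistic modules, which is exactly the "how well does this work" question above.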
I think this is a minor benefit. In most domains, specialists will understand the meanings of the input data to their systems.
I think this is basically just the same point as “the problem gets harder when the AI system is superhuman”, except the point is that the AI system becomes superhuman way faster on domains that are not native to humans, e.g. DNA, drug structures, protein folding, math intuition, relative to domains that are native to humans, like image classification.