Two potentially relevant distinctions:
Is your model produced by optimizing the whole thing for accuracy, by optimizing local pieces for local accuracy (e.g. your lane prediction algorithm being optimized to accurately predict lane boundaries), or by human engineering (e.g. pieces of the model reflecting facts that humans know about the world)? Most practical systems will do some of each.
Is your policy divided into pieces (like a generative model, an inference algorithm, and a planner) that are each optimized for local objectives, or is the whole thing optimized end-to-end for performance? If most of your compute goes into inference and planning, then the causal model inside an AI with a given level of performance is likely to be much smaller (and built from simpler concepts) than a neural net with the same performance.
Those are kind of similar to each other; they are both about how much you optimize pieces to do a job defined by human expectations (with human reasoning about how the pieces fit together) vs. optimizing end-to-end.
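To make the contrast concrete, here is a minimal Python sketch of the two regimes for a toy driving policy. All of the names (LanePredictor, CenteringPlanner, ModularPolicy, EndToEndPolicy) and numbers are hypothetical illustrations rather than anything from this discussion, and the "trained" pieces are stubbed out with fixed outputs.

```python
# Hypothetical sketch: a policy built from locally-optimized / human-engineered
# pieces vs. a single end-to-end policy. All names and values are placeholders.

class LanePredictor:
    """Piece optimized for a *local* objective: accuracy of predicted lane boundaries."""
    def predict_lanes(self, camera_image):
        # In practice: a model fit to labeled lane boundaries, judged only on that task.
        return {"left": 1.8, "right": -1.8}  # placeholder lateral offsets (meters)

class CenteringPlanner:
    """Human-engineered piece: steer toward the midpoint of the predicted lane."""
    def plan(self, lanes):
        return {"steering": -(lanes["left"] + lanes["right"]) / 2}

class ModularPolicy:
    """Pieces composed by human reasoning about how they fit together."""
    def __init__(self):
        self.perception = LanePredictor()
        self.planner = CenteringPlanner()
    def act(self, camera_image):
        return self.planner.plan(self.perception.predict_lanes(camera_image))

class EndToEndPolicy:
    """One network trained directly on overall driving performance; any internal
    'lane' concept is implicit and exists only insofar as it helps that objective."""
    def act(self, camera_image):
        return {"steering": 0.0}  # placeholder for a learned network's output

print(ModularPolicy().act(camera_image=None))   # {'steering': -0.0}
print(EndToEndPolicy().act(camera_image=None))  # {'steering': 0.0}
```

The point of the modular version is that a human can point at LanePredictor's output and say what it is supposed to mean; in the end-to-end version that correspondence has to be recovered after the fact.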
I think these are real distinctions that increase how far you can scale before running into catastrophic misalignment. They may come with big performance tradeoffs, since end-to-end optimization is quite powerful. You may be able to claw back some runtime inefficiency by distilling a slow inference or planning process into a faster neural network, but this probably only gets you so far and reintroduces some alignment issues.
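Here is a rough sketch of the distillation step mentioned above, assuming toy stand-ins: an exhaustive-search slow_planner for the slow process, and a nearest-neighbor lookup in place of the faster neural network.

```python
# Hypothetical sketch of distillation: run the slow inference/planning process
# offline to label states, then fit a fast "student" policy to imitate it.
# The toy planner and nearest-neighbor student below are illustrative stand-ins.

import random

def slow_planner(state):
    """Stand-in for an expensive search/inference procedure: exhaustively
    scores a few candidate actions and picks the one driving state toward 0."""
    return min((-1.0, 0.0, 1.0), key=lambda a: (state + a) ** 2)

# 1. Collect (state, action) pairs from the slow process.
random.seed(0)
states = [random.uniform(-2, 2) for _ in range(1000)]
actions = [slow_planner(s) for s in states]

# 2. Fit a fast student to imitate it (nearest stored example, standing in for
#    a neural network). The distilled policy is quick at runtime, but the
#    planner's intermediate reasoning is no longer around to inspect.
def distilled_policy(state):
    nearest = min(range(len(states)), key=lambda i: abs(states[i] - state))
    return actions[nearest]

print(distilled_policy(0.7))  # should match slow_planner(0.7) == -1.0
```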
As far as I can tell, people who feel like probabilistic models are better than deep learning are mostly optimistic because of this kind of distinction.
Great points. I definitely agree with your argument quantitatively: these distinctions mean that a probabilistic model will be more interpretable for the same system, or able to handle more complex systems before hitting a given interpretability threshold (e.g. “running into catastrophic misalignment”).
That said, it does seem like the vast majority of the interpretability work for both probabilistic models and deep learning systems is in “how does this internal stuff correspond to stuff in the world”. So qualitatively, it seems like the central interpretability problem is basically the same for both.
Yeah, I agree that if you learn a probabilistic model then you mostly have a difference in degree rather than a difference in kind with respect to interpretability. It’s not super clear that the difference in degree is large or important (it seems like it could be, just not clear). And if you aren’t willing to learn a probabilistic model, then you are handicapping your system in a way that will probably eventually be a big deal.
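For the "difference in degree rather than kind" point, a tiny made-up illustration: once an internal variable is learned rather than hand-named by an engineer, you have to check empirically what it corresponds to in the world, just as you would for a neural activation. The toy world and the learned_latent rule below are invented for the example.

```python
import random

random.seed(0)

# Toy world: it rains on ~30% of days; the sidewalk is wet 90% of the time it
# rains (and dry 90% of the time it doesn't).
days = [{"rain": random.random() < 0.3} for _ in range(1000)]
for d in days:
    d["wet_sidewalk"] = d["rain"] if random.random() < 0.9 else not d["rain"]

# A learned binary latent z (here: a stand-in for whatever rule the fitting
# procedure happened to find). The interpretability question is the same as for
# a neuron: does z actually track "rain", or something subtly different?
def learned_latent(observation):
    return observation["wet_sidewalk"]  # the model's internal proxy for rain

agreement = sum(learned_latent(d) == d["rain"] for d in days) / len(days)
print(f"z agrees with 'rain' on {agreement:.0%} of days")  # high, but not 100%
```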