Well, I haven’t seen even a blog post’s worth of effort put into doing something like what I suggested.
I think blog posts are potentially weird measures of effort, here. I also think that this is something that people are interested in doing (I think it’s a component of MIRI’s strategic sketch here, as part 8) but it isn’t the sort of thing where we have anything particularly worthwhile to show for it yet.
Perhaps it makes sense to sketch an argument for why none of the standard paradigms satisfy some desideratum? This is kind of what AI Safety Gridworlds did. But it’s more the thing where, say, gradient boosted random forests have more of the ‘transparency’ property in a particular, legalistic way: it’s easier to figure out blame for any particular classification than it would be with a neural net. They don’t have it in the way that we actually care about, which is being able to look at the model and figure out whether it’s thinking about things in the way that we want it to. That might actually be easier with a neural net, because we could look at what neuron activations correspond to.