The difficulty of jointly representing fθ1 and Hθ2 motivates my recent proposal, which avoids any such explicit representation. Instead it separately specifies θ1 and θ2, and then “gets back” bits by imposing a consistency condition that would have been satisfied only for a very small fraction of possible θ2’s (roughly exp(−|θ1|) of them).
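To make the “gets back bits” arithmetic explicit (this gloss is mine; it just unpacks the fraction quoted above, reading |θ1| as a description length):

$$-\log\bigl(\text{fraction of } \theta_2\text{'s satisfying the condition}\bigr) \;\approx\; -\log e^{-|\theta_1|} \;=\; |\theta_1|,$$

so the consistency condition recovers roughly |θ1| bits.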
But thinking about this neural network case also makes it easy to talk about why my recent proposal could run into severe computational problems:
In order to calculate this loss function we need to evaluate how “special” θ2 is, i.e. how small a fraction of the possible θ2’s are consistent with θ1.
In order to evaluate how special θ2 is, we basically need to run the same process of SGD that produced θ2; then we can compare the actual iterates to all of the places the process could have gone in a different direction, and conclude that almost all of the alternative parameter settings would have been much less consistent with θ1.
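Concretely, the computation this seems to require looks something like the toy sketch below. Everything in it (the consistency function, the toy human-prediction loss, the scoring rule) is a made-up stand-in; the point is only the shape and cost of the computation, which re-runs the whole SGD process and scores many alternative steps at every iterate.

```python
# Toy sketch (my illustration, not the original proposal): replay the SGD run that
# produced theta2 and, at each step, ask what fraction of same-sized steps in random
# directions would have been at least as consistent with theta1.
import numpy as np

rng = np.random.default_rng(0)

D = 20                         # dimension of the (toy) human-model parameters theta2
theta1 = rng.normal(size=D)    # stand-in for the world-model parameters

def consistency(theta1, theta2):
    """Toy stand-in for the consistency condition between f_theta1 and H_theta2.
    Higher = more consistent."""
    return -np.sum((theta2 - theta1) ** 2)

def human_loss_grad(theta2):
    """Gradient of a toy 'predict human behavior' loss. In this toy it happens to
    pull theta2 toward theta1, so SGD ends up at a highly consistent human model."""
    return 2 * (theta2 - theta1) + 0.1 * rng.normal(size=D)

def random_unit_vectors(n, d):
    v = rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def specialness_of_sgd_run(steps=200, lr=0.05, n_alternatives=64):
    """Accumulate -log(fraction of alternative steps at least as consistent with theta1)
    over the whole trajectory, as a crude 'specialness' estimate."""
    theta2 = rng.normal(size=D)
    total = 0.0
    for _ in range(steps):
        step = -lr * human_loss_grad(theta2)
        actual = theta2 + step
        # Alternative places this step "could have gone": random directions, same length.
        alternatives = theta2 + np.linalg.norm(step) * random_unit_vectors(n_alternatives, D)
        frac = np.mean([consistency(theta1, a) >= consistency(theta1, actual)
                        for a in alternatives])
        total += -np.log(max(frac, 1.0 / n_alternatives))   # avoid log(0)
        theta2 = actual
    return total, theta2

specialness, theta2 = specialness_of_sgd_run()
print(f"estimated specialness of theta2 (in nats): {specialness:.1f}")
```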
The implicit hope of my proposal is that the outer neural network is learning its human model using something like SGD, and so it can do this specialness-calculation for free—it will be considering lots of different human-models, and it can observe that almost all of them are much less consistent with θ1.
But the outer neural network could learn to model humans in a very different way, one that may not involve representing a series of iterates of “plausible alternative human models.” For example, suppose that in each datapoint we observe a few of the bits of θ2 directly (e.g. by looking at a brain scan), and we fill in much of θ2 in this way before we ever start making good predictions about human behavior. Then we never need to consider any other plausible human-models.
So in order to salvage a proposal like this, it seems like (at a minimum) the “specialness evaluation” needs to take place separately from the main learning of the human model, using a very different process (one where we consider lots of different human models and see that it’s actually quite hard to find one that is similarly consistent with θ1). This would take place at the point where the outer model starts actually using its human model Hθ2 in order to answer questions.
I don’t really know what that would look like or if it’s possible to make anything like that work.
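One very naive picture of that separate evaluation, just to make the shape concrete (the functions are the same kind of made-up stand-ins as in the earlier sketch, and it’s not clear this captures what’s actually needed): refit many human models from scratch on the same human-behavior data, without looking at θ1, and check how rare it is to land on one as consistent with θ1 as the Hθ2 the model actually uses.

```python
# Toy sketch of a separate specialness evaluation run when the model starts
# using H_theta2 to answer questions (again, made-up stand-ins throughout).
import numpy as np

rng = np.random.default_rng(1)

D = 20
theta1 = rng.normal(size=D)                    # the outer model's world-model parameters
theta2 = theta1 + 0.01 * rng.normal(size=D)    # the human model the outer model actually uses

def consistency(theta1, cand):
    """Same toy stand-in for the consistency condition as before."""
    return -np.sum((cand - theta1) ** 2)

def refit_human_model():
    """Stand-in for independently re-learning a human model from the same human-behavior
    data, without looking at theta1 (random restart plus a noisy toy fitting loop)."""
    cand = rng.normal(size=D)
    for _ in range(50):
        cand -= 0.1 * (cand - theta2)          # toy: fit the behavior that theta2 explains
        cand += 0.05 * rng.normal(size=D)      # noise: restarts land near, but not at, theta2
    return cand

n_restarts = 200
candidates = [refit_human_model() for _ in range(n_restarts)]
frac = np.mean([consistency(theta1, c) >= consistency(theta1, theta2) for c in candidates])
print(f"fraction of refit human models at least as consistent with theta1: {frac:.3f}")
```

If almost every refit model is much less consistent with θ1, that is the kind of evidence of specialness the loss function needs; but it again costs many full training runs, which is the computational problem all over again.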