The concrete generative model I had in mind was the one I used as an example in the document (page 1 under section “Simplest implementation”):
Train a conditional VAE to generate all the variables of the decoder of the predictor, conditioned on the [question, answer] pair. Use L2 regularization on the decoder of our model as a proxy for complexity (since the computation time of VAEs is constant).
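The quoted setup could be sketched roughly as follows. This is a minimal sketch, assuming the predictor's activations are flattened into a vector; all class names, dimensions, and the `l2_weight` coefficient are illustrative, not from the document:

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Generates predictor activations conditioned on an encoded [question, answer] pair."""

    def __init__(self, act_dim, qa_dim, latent_dim=32, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(act_dim + qa_dim, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + qa_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, acts, qa):
        h = self.encoder(torch.cat([acts, qa], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(torch.cat([z, qa], dim=-1))
        return recon, mu, logvar

def loss_fn(model, acts, qa, l2_weight=1e-4):
    recon, mu, logvar = model(acts, qa)
    recon_loss = ((recon - acts) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    # L2 penalty on the *decoder only*, as the complexity proxy described above
    l2 = sum(p.pow(2).sum() for p in model.decoder.parameters())
    return recon_loss + kl + l2_weight * l2
```

Note that the penalty deliberately touches only the decoder, matching the proposal's use of decoder weight norm as the complexity measure.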
An autoregressive model is probably the single worst model you could choose. It forces an order on the generation of the latents, which breaks the arguments I wrote in my proposal. And for the purposes of how I expect the intended model to work (which I will explain below), it is very, very deep: during training we can generate everything in parallel, but at inference time we need to generate each token after the previous one, which makes for a very deep computational graph.
I don’t understand what you mean by considering a single question-answer pair. If for a given scenario we have multiple questions and answers, we just feed each [question, answer, activations] triple into our model as training data.
Does that sound right?
It does, that is the distribution it is modeling.
The intended behavior is the one I wrote in my reply to Thomas:
Both the “direct generator” and the “generator of worlds that a human thinks are consistent with a QA pair” need to generate the activations that represent the world. The “direct generator” does it directly (e.g. “diamond” → representation of a diamond), while the other one performs additional computation to determine which worlds a human thinks are consistent with the QA pair (e.g. “diamond” → it might be a diamond, but it might also be just a screen in front of the camera and what is really in the vault is…). On a training set where the labels are always correct, the direct generator is preferred.
If we have a dataset where what is seen on screen always corresponds (simply) with reality, I expect the model to generate the “world” first (using a direct map like I said above) and then use it to generate the images (I argued for this order in the document). If we use this to answer questions, it will care about what is really happening in reality (this is just the Markov property, see the last row of my drawing if that isn’t clear).
If for some reason this fails and the model generates both the images and the world each conditioned mostly independently on the QA pair, then using that to answer questions will get us a system that cares about both what is happening on screen and what is happening in reality. Given that the data it had access to is consistent with both hypotheses, I don’t fault it for doing that. Using it to optimize the world so that the answer to “Is the diamond in the vault?” is “Yes”, will give us a system that cares about “both” diamonds. Not a terrible failure mode.
If on the other hand we have a dataset where the vault has an operator, so that it is hard for another person to answer correctly, the ordering argument becomes stronger. For simple enough models QA → world → images works, but in QA → images → world the QA information is lost in the first step and never reaches the world (since the images are modeled as independent from the true answer). And if in this case the model nevertheless generates the world and the images independently, using it to answer questions will give us a model that cares strongly about reality, because the QA pair correlates more strongly with reality (by construction, at least for simple models) than with the images. Here you can’t use the argument “but what if performing perfect inference on the human simulation is still cheaper than the direct reporter?”, since both models need to be big enough to generate all the data.
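The ordering argument can be summarized by two factorizations (notation mine: $W$ for the world state, $I$ for the images):

```latex
\begin{align*}
\text{QA} \to \text{world} \to \text{images}: \quad
  & P(W, I \mid Q, A) = P(W \mid Q, A)\, P(I \mid W) \\
\text{QA} \to \text{images} \to \text{world}: \quad
  & P(W, I \mid Q, A) = P(I \mid Q, A)\, P(W \mid I)
    \;\approx\; P(I)\, P(W \mid I)
\end{align*}
```

In the second factorization, if the images are modeled as (approximately) independent of the true answer, the QA information never reaches $W$, which is why that ordering fails on the operator dataset.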
If you can think of another algorithm as simple as a direct generator that performs well in the training set, describe it.
Even if you think I’m wrong, do you at least understand why I currently think this might work?
The concrete generative model I had in mind was the one I used as an example in the document (page 1 under section “Simplest implementation”):
Ah, I was looking at “submission #2” for this whole discussion and didn’t even notice that submission #5 was very similar, perhaps this explains much of the confusion.
I agree that a VAE with L2 on the decoder is the most promising version of this approach.
If for a given scenario we have multiple questions and answers, we just feed each [question, answer, activations] triple into our model as training data.
I meant: do you condition on a single question-answer pair at a time, or do you condition on a long list of them? It sounds like you want to condition on a single one. This seems simplest and I’m happy to focus on it, but it’s a little bit scary because the log loss reduction from conditioning on just one answer is so tiny (and so it’s not clear if it’s worth the model spending complexity to implement conditioning at all, since the no-conditioning model is extremely simple and gets almost exactly the same loss, though I’m happy to bracket this issue for now / assume we set hyperparameters so that it’s worthwhile to integrate the QA pair).
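The worry about the tiny log-loss reduction can be made quantitative: conditioning on a single answer can reduce the total log loss by at most the answer's conditional entropy, since $I(z; A \mid Q) \le H(A \mid Q) \le \log(\#\text{answers})$. A one-line illustration (the function name is mine):

```python
import math

def max_loss_reduction_nats(n_answers):
    """Upper bound on the expected log-loss reduction, in nats, from
    conditioning on one answer: I(z; A|Q) <= H(A|Q) <= log(n_answers)."""
    return math.log(n_answers)

# For a yes/no answer this is at most ~0.69 nats, spread over the loss of
# generating the *entire* predictor state.
```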
Even if you think I’m wrong, do you at least understand why I currently think this might work?
Yeah, I see the appeal of this approach and agree that (at least right now) it seems more promising than the versions of predicting latent state that can be defeated with steganography.
If you can think of another algorithm as simple as a direct generator that performs well in the training set, describe it.
Right now I’m mostly worried about something like the counterexample to “penalize depending on downstream variables.” So:
- The generator learns to fill in the latent state / observations in an adaptive order. (It generally fills them in the same order as the predictor, but not always. For the purposes of this counterexample we can imagine it always fills them in in order.)
- It operates under the assumption that the data will appear in training and that a human gives the given answer.
- This leads it to assume that no covert tampering will occur. (Maybe not with 100% confidence, but with huge updates against tampering.)
- Sometimes it generates the tampering observations before what’s actually happening in the world (e.g. because the tampering observations are physically prior and what happens in the real world depends on them).
- Once it observes that covert tampering did in fact occur, it stops assuming that the human will be correct. (Since the most likely explanation is either that the human messed up or that the model underestimated human abilities.) It seems like it won’t end up assuming both that tampering occurred to show a diamond and that the diamond was actually present.
It currently seems to me like this kind of counterexample would work, but this bulleted list is not yet a formal description (and it does seem somewhat harder to counterexample than depending on downstream variables). I’ll think about it a bit more.
Once it observes that covert tampering did in fact occur, it stops assuming that the human will be correct. (Since the most likely explanation is either that the human messed up, or that the model underestimated human abilities.) It seems like it won’t end up assuming that both tampering occurred to show a diamond and that the diamond was actually present.
But the neat thing is that there is no advantage, either in the size of the computational graph or in predictive accuracy, to doing that. In the training set the human is always right. Regular reporters make mistakes because what is seen on camera is a non-robust feature that generalizes poorly; here we have no such problem.
But I might have misunderstood, pseudocode would be useful to check that we can’t just remove “function calls” and get a better system.
Ah, a conditional VAE!
Small question: Am I the only one that reserves ‘z’ for the latent variables of the autoencoder? You seem to be using it as the ‘predictor state’ input. Or am I reading it wrong?
Now I understand your P(z|Q,A) better, as it’s just the conditional generator. But, how do you get P(A|Q)? That distribution need not be the same for the human known set and the total set.
I was wondering what happens in deployment when you meet a z that’s not in your P(z,Q,A) (i.e. has very small probability). Would you be sampling P(z|Q,A) forever?
You aren’t the only one, z is usually used for the latent, I just followed Paul’s notation to avoid confusion.
P(A|Q) comes just from training on the QA pairs. But I did say “set any reasonable prior over answers”, because I expect P(z|Q,A) to be orders of magnitude higher for the right answer. Like I said in another comment, an image generator (that isn’t terrible) is incredibly unlikely to generate a cat from the input “dog”, so even big changes to the prior probably won’t matter much. That being said, machine learning rests on the IID assumption, and regular reporters are no exception: they also incorporate P(A|Q); it’s just that here it is explicit.
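For concreteness, here is how the explicit P(A|Q) could enter at answer time. The scoring functions are hypothetical stand-ins for the trained conditional generator and the answer prior; only their log-probability interface matters:

```python
def choose_answer(z, question, candidate_answers, log_p_z_given_qa, log_prior):
    """Pick the answer A maximizing log P(A|Q) + log P(z|Q,A).

    `log_p_z_given_qa` stands in for the trained conditional VAE and
    `log_prior` for P(A|Q) learned from the QA pairs (or any reasonable
    prior, if P(z|Q,A) dominates as argued above).
    """
    scores = {
        a: log_prior(question, a) + log_p_z_given_qa(z, question, a)
        for a in candidate_answers
    }
    return max(scores, key=scores.get)
```

With a roughly uniform prior, the answer is decided almost entirely by P(z|Q,A), which is the point of the “cat from the input ‘dog’” analogy.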
The whole point of VAEs is that estimating the probability of a sample is efficient (see section 2.1 here: https://arxiv.org/abs/1606.05908v1), so I don’t expect that to be a problem.
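As a toy illustration of that efficiency claim, here is an importance-weighted estimate of log p(x) in a one-dimensional Gaussian model: prior p(z) = N(0, 1), likelihood p(x|z) = N(z, 1), with an encoder-style proposal q(z|x). The Gaussians are stand-ins, not the actual predictor:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def estimate_log_px(x, q_mean, q_var, n_samples=1000):
    """Importance-weighted estimate of log p(x):
    log p(x) ~= log mean_k [ p(z_k) p(x|z_k) / q(z_k|x) ],  z_k ~ q(.|x)."""
    z = rng.normal(q_mean, np.sqrt(q_var), size=n_samples)
    log_w = (
        log_normal(z, 0.0, 1.0)        # log p(z)
        + log_normal(x, z, 1.0)        # log p(x|z)
        - log_normal(z, q_mean, q_var)  # - log q(z|x)
    )
    # Numerically stable log-mean-exp
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))
```

When q(z|x) matches the true posterior (here N(x/2, 1/2)), the estimate is exact; for a learned encoder it is a lower bound that tightens with more samples.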