The common way to evaluate model accuracy in machine learning contexts is to have a bunch of samples of the “ground truth” that is to be predicted, e.g. classified images for supervised learning, and then evaluate the model on those samples. That is the sort of accuracy measure I had in mind when writing the post, because that is what gets used in practice.
(In particular, models get directly optimized to perform better on this accuracy measure, but a model might not reach the optimum, so the question is what happens when it doesn’t.)
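For concreteness, a minimal sketch of that kind of accuracy measure, assuming a held-out labeled test set and a hypothetical model.predict interface (none of these names come from any particular library):

```python
import numpy as np

def test_accuracy(model, samples, labels):
    """Fraction of held-out 'ground truth' samples the model predicts correctly."""
    predictions = model.predict(samples)  # hypothetical predict() interface
    return np.mean(predictions == labels)

# Usage (placeholder loaders, not part of any real library):
# test_images, test_labels = load_labeled_test_set()
# print(test_accuracy(trained_model, test_images, test_labels))
```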
Given the setup, one obvious probability distribution is: questions asked by the actor in the process of deciding what to do. This seems, in fact, to be the only such probability distribution already present in the setup, rather than an additional benchmark brought in for as-yet unstated reasons.
Of course, there’s a problem: the system as a whole doesn’t know what questions the actor will ask. Or, if it knows the actor will only ask “what action will maximize u”, we’ve just shifted things into the model; it will need to ask sub-questions to give a good answer.
Still, we need to get the questions that the model will be optimized to answer from somewhere. There actually is no such thing as a “best” (or even a “most accurate”) model in the abstract. So we’ll need a model of what questions the actor will ask, and of what sub-questions will be needed to answer those. (To avoid infinite regress, the purpose of that model is only to predict what questions the world-model will need to answer.)
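In code, the objective I’m gesturing at looks roughly like this (every name here is hypothetical; the point is just that accuracy is taken with respect to the distribution of questions the actor ends up asking):

```python
def expected_accuracy(world_model, question_distribution, ground_truth, n_samples=1000):
    """Accuracy of the world-model, weighted by how often the actor asks each question."""
    correct = 0
    for _ in range(n_samples):
        question = question_distribution.sample()     # model of what the actor will ask
        answer = world_model.answer(question)         # hypothetical query interface
        correct += (answer == ground_truth(question)) # oracle used only for evaluation
    return correct / n_samples
```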
Since I already have an idea of what distribution the accuracy gets evaluated on, it won’t be about the questions the actor asks. However, the problem you mention here comes up in a different way, in that in e.g. reinforcement learning contexts, the distribution of data the AI faces, which we use to evaluate the model’s accuracy, may depend on the decisions of the AI, and so by transitivity also on the model’s predictions. This prevents there from being a straightforwardly best option.
(In fact, it’s a problem similar to this that led me to want to better understand partially optimized models.)
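Schematically, the situation I have in mind looks something like this (hypothetical interfaces throughout); the point is that the distribution the model gets scored on is generated by the AI’s own behavior, so it moves whenever the model changes:

```python
def error_on_induced_distribution(env, model, policy, n_episodes=10):
    """The model is scored on whatever data the AI's own behavior generates,
    so the evaluation distribution shifts when the model (and hence the policy) changes."""
    errors = []
    for _ in range(n_episodes):
        trajectory = env.rollout(policy)                    # data distribution depends on the policy
        errors.extend(model.prediction_errors(trajectory))  # hypothetical per-step prediction errors
    return sum(errors) / len(errors)
```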
The common way to evaluate model accuracy in machine learning contexts is to have a bunch of samples of the “ground truth” that is to be predicted, e.g. classified images for supervised learning, and then evaluate the model on those samples. That is the sort of accuracy measure I had in mind when writing the post, because that is what gets used in practice.
That’s what gets used for supervised or unsupervised learning, but your post started out with “Suppose we want to create an agent AI”, and there’s no straightforward way of interpreting systems trained with those techniques as agents. Perhaps you intended for some such system to be used as the “model” subsystem of an agent AI, but in that case I think the problem really is basically what I said: the actor should be defining what information it wants to get out of the model, and the model should be optimized to supply that information; if it isn’t optimized for that, it won’t do as well at providing the information the actor needs.
I don’t think “amount of information contained” even sounds like a property of a model that anyone would think they should care about, absent some detail about what that information is about. Otherwise a model that knows nothing but a sufficiently massive number of digits of pi would be better than one that can answer any question you have about the real world but knows pi to only 50 decimal places. “Percent of questions in the test set answered correctly” does sound possibly useful, if you want to get answers to questions drawn from the same distribution. “Percent of questions I actually ask, weighted by how much value I get from having that particular question answered correctly” would be an even better metric (with the defect of being impossible to directly optimize for), of course, but the long book about who lives where and the library describing the death chamber don’t even seem to live up to the minimal “this answers the kind of questions I want to ask” criterion.
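To make the contrast concrete, here’s roughly how those two metrics differ (all names hypothetical, pseudocode-level Python):

```python
def test_set_accuracy(model, test_questions, true_answers):
    """Percent of questions in a fixed test set answered correctly."""
    correct = sum(model.answer(q) == true_answers[q] for q in test_questions)
    return correct / len(test_questions)

def value_weighted_accuracy(model, asked_questions, true_answers, value_of_answer):
    """Percent of questions I actually ask, weighted by the value of getting each one right.
    Not directly optimizable, since the asked questions and their values aren't known in advance."""
    total = sum(value_of_answer[q] for q in asked_questions)
    gained = sum(value_of_answer[q] for q in asked_questions
                 if model.answer(q) == true_answers[q])
    return gained / total
```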
I mean, you can consider something like Dreamer, an RL agent I’ve seen: it trains a model to predict the dynamics of a system, and then trains the behavior using that model. I don’t see how this kind of RL agent is compatible with your comment.
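Not Dreamer’s actual code, just a schematic of the structure I’m pointing at: the world-model is trained to predict dynamics from real experience, and the behavior is trained against that learned model rather than against questions the actor poses.

```python
def model_based_rl(env, world_model, policy, n_iterations=100, horizon=15):
    """Schematic Dreamer-style loop (hypothetical interfaces, not the real implementation)."""
    for _ in range(n_iterations):
        experience = env.rollout(policy)                 # collect real transitions
        world_model.fit(experience)                      # train the model to predict dynamics
        imagined = world_model.imagine(policy, horizon)  # roll the policy out inside the model
        policy.improve(imagined)                         # train the behavior on imagined rollouts
    return policy
```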