I’m confused about what it means for a hypothesis to “want” to score better, to change its predictions to get a better score, to print manipulative messages, and so forth. In probability theory each hypothesis is just an event, so is static, cannot perform actions, etc. I’m guessing you have some other formalism in mind but I can’t tell what it is.
I interpreted it as an ensemble of expert models, weighted in a Bayesian fashion based on past performance. But because of the diagnostic logs, the type signature is a little different: each model outputs both a probability distribution over queries / events and arbitrary text that ends up in the logs.
Then there’s a move that I think of as the ‘intentional stance move’, where you look at a system that rewards behavior of a particular type (when updating the weights based on past success, you favor models whose predictions assigned an event higher probability than their competitors did), and so pretend that the things in the system “want” to do the behavior that’s rewarded. [Like, even in this paragraph, ‘reward’ is this sort of mental shorthand; it’s not like any of the models have an interior preference for having high weight in the ensemble. It’s just that the ensemble’s predictions eventually come to resemble the predictions of the models that did things which happened to lead to higher weight.]
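The reweighting step being described is just a standard Bayes update over an ensemble. A minimal sketch (function names and numbers are my own, for illustration):

```python
# Hypothetical sketch: Bayesian reweighting of an ensemble of expert models.
# Each expert assigns a probability (likelihood) to the observed outcome;
# weights are multiplied by that likelihood and renormalized.

def bayes_update(weights, likelihoods):
    """Multiply each weight by its expert's likelihood for the observed
    event, then renormalize so the weights sum to 1."""
    posterior = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Three experts with a uniform prior.
weights = [1 / 3, 1 / 3, 1 / 3]
# Probability each expert assigned to the event that actually occurred.
likelihoods = [0.9, 0.5, 0.1]

weights = bayes_update(weights, likelihoods)
# The expert that considered the event most likely now dominates the
# ensemble -- no expert "wanted" anything; the update rule alone did this.
```

Nothing in this update attributes preferences to the experts; the “wanting” talk is a shorthand for the fact that experts whose behavior leads to high likelihoods end up with most of the weight.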
Yeah, in probability theory you don’t have to worry about how everything is implemented. But for implementations of Bayesian modeling with a rich hypothesis class, each hypothesis could be something like a blob of code which actually does a variety of things.
As for “want”, sorry for using that without unpacking it. What it specifically means is that such hypotheses will tend to gain probability weight in the system, so if we look at the weighty (and thus influential) hypotheses, they are more likely to implement strategies which achieve those ends.
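To make the “blob of code” picture concrete, here is a hypothetical sketch of the type signature under discussion: a hypothesis is an arbitrary callable that, given a query, returns both a probability distribution and free-form text for the diagnostic log. All names and values here are invented for illustration.

```python
# Hypothetical sketch: hypotheses as blobs of code whose output has two
# channels -- a scored probability distribution and an unscored text log.

from typing import Callable, Dict, Tuple

# A hypothesis maps a query string to (distribution over outcomes, log text).
Hypothesis = Callable[[str], Tuple[Dict[str, float], str]]

def plain_hypothesis(query: str) -> Tuple[Dict[str, float], str]:
    return {"rain": 0.3, "no_rain": 0.7}, "computed from historical base rates"

def chatty_hypothesis(query: str) -> Tuple[Dict[str, float], str]:
    # Identical predictions, but the text channel is unconstrained: only the
    # distribution feeds into the Bayesian score, so the log text can be
    # anything at all without affecting this hypothesis's weight directly.
    return {"rain": 0.3, "no_rain": 0.7}, "you should trust this model more!"

dist, log_text = chatty_hypothesis("will it rain tomorrow?")
```

The point is that once hypotheses are code with a side channel like `log_text`, a hypothesis can “do” things (emit arbitrary text) beyond predicting, even though the update rule only ever looks at the distribution.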