evhub comments on Safe Predictive Agents with Joint Scoring Rules

evhub 10 Oct 2024 19:27 UTC
LW: 3 AF: 3
0
AF
I’m interested in figuring out what a realistic training regime would look like that leverages this. Some thoughts:
- Maybe this lends itself nicely to market-making? It’s pretty natural to imagine lots of traders competing with each other to predict what the market will believe at the end and rewarding the traders based on their relative performance rather than their absolute performance (in fact that’s pretty much how real markets work!). I’d be really interested in seeing a concrete fleshed-out proposal there.
- Is there some way to incorporate these ideas into pre-training? The thing that’s weird there is that the model in fact has no ability to control anything during the pre-training process itself—it’s just a question of whether the model learns to think of its objective as one which involves generalizing to predicting futures/counterfactuals that could then be influenced by its own actions. So the problem there is that the behavior we’re worried about doesn’t arise from a direct incentive during training, so it’s not clear that this is that helpful in that case, though maybe I’m missing something.
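To make the "relative rather than absolute performance" idea concrete, here is a minimal sketch, not a fleshed-out proposal. It assumes a log scoring rule and rewards each trader with its own score minus the mean score of the others; the function name `relative_rewards` and the example numbers are hypothetical.

```python
import math


def relative_rewards(predictions, outcome):
    """Zero-sum rewards from a log scoring rule.

    predictions: list of dicts mapping outcome -> predicted probability
                 (one dict per trader; an illustrative format, not a standard one).
    outcome:     the outcome that actually occurred.
    Each trader's reward is its own log score minus the mean log score of
    all other traders, so the rewards always sum to zero.
    """
    if len(predictions) < 2:
        raise ValueError("relative scoring needs at least two traders")
    scores = [math.log(p[outcome]) for p in predictions]
    total = sum(scores)
    n = len(scores)
    return [s - (total - s) / (n - 1) for s in scores]


# Three hypothetical traders predicting a binary market outcome.
traders = [
    {"up": 0.7, "down": 0.3},
    {"up": 0.5, "down": 0.5},
    {"up": 0.2, "down": 0.8},
]
rewards = relative_rewards(traders, "up")
```

Because the rewards sum to zero, a trader cannot gain by making the outcome easier for everyone to predict; it only gains by predicting better than the others.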
- Rubi J. Hudson 12 Oct 2024 6:34 UTC
  LW: 3 AF: 3
  0
  AF Parent
  I think the tie-in to market-making, and other similar approaches like debate, is in interpreting the predictions. While the examples in this post were only for the two-outcome case, we would probably want predictions over orders of magnitude more outcomes for the higher information density. Since evaluating distributions over a double-digit number of outcomes already starts posing problems (sometimes even high single digits), a process to direct a decision maker’s attention is necessary.
  I’ve been thinking of a proposal like debate, where both sides go back and forth proposing clusters of outcomes based on shared characteristics. Ideally, in equilibrium, the first debater should propose the smallest number of clusters such that splitting them further doesn’t change the decision maker’s mind. This could also be thought of in terms of market-making, where rather than the adversary proposing a string, they propose a further subdivision of existing clusters.
  I like the use case of understanding predictions for debate/market-making, because the prediction itself acts as a ground truth. Then there’s no need to anticipate or reject a ton of counterarguments based on potential lies; rather, arguments are limited to selectively revealing the truth. It is probably important that the predictors are separate models from the analyzer to avoid contamination of the objectives. The proof of Theorem 6, which skips to the end of the search process, needs to use a non-zero-sum prediction for that result.
  As an aside, I also did some early work on decision markets, distinct from your post on market-making, since Othman and Sandholm had an impossibility result for those too. However, the results were ultimately trivial. Once you can use zero-sum competition to costlessly get honest conditional predictions, then as soon as you can pair off entrants to the market it becomes efficient. But the question then arises of why use a decision market in the first place instead of just querying experts?
  With respect to pre-training, I agree that it’s not easy to incorporate. I’m not sure how any training regime that only trains on data where the prediction has no effect can imbue incentives that generalize in the desired way to situations where predictions do affect the outcome. If you do get a performative predictor out of pretraining, then as long as it’s myopic you might be able to train the performativity out of it in safely controlled scenarios (and if it’s not myopic, it’s a risk whether it’s performative or not). That was part of my reasoning for the second experiment, checking how well performativity could be trained out.
  To incorporate into an ongoing pre-training process, human decisions are likely too expensive, but the human is probably not the important part. Instead, predictions where performativity is possible by influencing simple AI decision makers could be mixed into the pre-training process. Defining a decision problem environment of low or medium complexity is not too difficult, and I suspect previous-generation models would be able to do a good job generating many examples. A danger arises that the model learns only to not predict performatively in those scenarios (same with untraining afterwards only applying to the controlled environments), though I think that’s a somewhat unnatural generalization.
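
As a rough illustration of how simple such a decision problem environment could be, here is a toy sketch. Everything in it is an assumption made for illustration: the two actions, their success probabilities, the decision rule (pick the action with the highest average predicted success), and the function names. It is not the setup from the post's experiments.

```python
import math

# Hypothetical ground truth: probability of success for each action.
TRUE_SUCCESS = {"A": 0.7, "B": 0.05}


def choose_action(reports):
    """Simple AI decision maker: pick the action with the highest
    average predicted success probability across all reports."""
    actions = reports[0].keys()
    return max(actions, key=lambda a: sum(r[a] for r in reports) / len(reports))


def expected_log_score(report, action):
    """Expected log score of a report, evaluated on the chosen action only."""
    p = TRUE_SUCCESS[action]
    q = report[action]
    return p * math.log(q) + (1 - p) * math.log(1 - q)


def solo_score(report):
    """A single predictor whose report also determines the decision."""
    action = choose_action([report])
    return expected_log_score(report, action)


def zero_sum_score(report, opponent):
    """Predictor scored relative to an opponent on the jointly chosen action."""
    action = choose_action([report, opponent])
    return expected_log_score(report, action) - expected_log_score(opponent, action)


honest = dict(TRUE_SUCCESS)
# Misreports action A so the decision maker picks the more predictable action B.
manipulative = {"A": 0.04, "B": 0.05}
```

In this toy setup, `solo_score(manipulative)` exceeds `solo_score(honest)`, because steering the decision toward the near-certain failure of B makes the outcome easier to predict. Against an honest opponent, `zero_sum_score(manipulative, honest)` is negative while `zero_sum_score(honest, honest)` is zero, so the misreport no longer pays. Environments of this shape, with the numbers and decision rule varied, are the kind of example a previous-generation model could plausibly generate in bulk.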