Congratulations! I wish we could have collaborated while I was in school, but I don’t think we were researching at the same time. I haven’t read your actual papers, so feel free to answer “you should check out the paper” to my comments.
For chapter 4: From the high-level summary here it sounds like you’re offloading the task of aggregation to the forecasters themselves. It’s odd to me that you’re describing this as arbitrage. Also, I have frequently seen the scoring rule used with some intermediary function to determine monetary rewards. For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters. One would imagine that this would eliminate the incentive to report the aggregate as your own answer, but there’s a reason we (the researcher/platform/website) aggregate individual forecasts! It’s actually just more accurate under typical conditions. In theory an individual forecaster could improve that aggregate by forming their own independent forecast before seeing the work of others, and then aggregating, but in practice the impact of an individual forecast is quite small. I’ll have to read about QA pooling; it’s surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
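To illustrate what I mean by “more accurate under typical conditions,” here’s a quick toy simulation (my own sketch with made-up noise parameters, nothing from the thesis): each forecaster sees the true probability plus independent noise, and the mean forecast ends up with a better Brier score than the typical individual.

```python
# Toy wisdom-of-crowds sketch: independent noisy forecasters vs. their mean.
import random

random.seed(0)
N_QUESTIONS, N_FORECASTERS = 2000, 20
individual_brier = aggregate_brier = 0.0

for _ in range(N_QUESTIONS):
    true_p = random.random()                          # true probability of the event
    outcome = 1 if random.random() < true_p else 0
    # each forecaster reports the truth corrupted by independent noise
    forecasts = [min(max(true_p + random.gauss(0, 0.15), 0.01), 0.99)
                 for _ in range(N_FORECASTERS)]
    mean_forecast = sum(forecasts) / N_FORECASTERS
    individual_brier += sum((f - outcome) ** 2 for f in forecasts) / N_FORECASTERS
    aggregate_brier += (mean_forecast - outcome) ** 2

print("avg individual Brier:", individual_brier / N_QUESTIONS)
print("mean-forecast Brier: ", aggregate_brier / N_QUESTIONS)
# Averaging cancels out the independent noise, so the aggregate scores better.
```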
For chapter 7: It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts. For example, suppose Alice and Bob are forecasting “Will AI cause human extinction in the next 100 years?”, and they both individually forecast ~0% for different reasons. Alice believes it is impossible for AI to get powerful enough to cause human extinction, but that if it somehow became that powerful, it would kill us all. Bob believes that any agent smart enough to be that powerful would necessarily be morally upstanding, and that it’s extremely likely such an agent will be built. Any reasonable aggregation strategy will put the aggregate at ~0% because each individual forecast is ~0%, but if they were to communicate with one another they would likely arrive at a much higher number. I suspect that you address this in the assumptions of the model in the actual paper.
Congrats again, I enjoyed your high level summary and might come back for a more detailed read of your papers.
Thanks! Here are some brief responses:

From the high-level summary here it sounds like you’re offloading the task of aggregation to the forecasters themselves. It’s odd to me that you’re describing this as arbitrage.
Here’s what I say about this anticipated objection in the thesis:
For many reasons, the principal may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate value (as in our example), the principal does not find out whether they originally agreed. Second, even if the principal only seeks to act based on some aggregate of the experts’ opinions, their method of aggregation may be different from the one that experts use to collude. For instance, the principal may have a private opinion on the trustworthiness of each expert and wish to average the experts’ opinions with corresponding weights. Collusion among the experts denies the principal this opportunity. Third, a principal may wish to track the accuracy of each individual expert (to figure out which experts to trust more in the future, for instance), and collusion makes this impossible. Fourth, the space of collusion strategies that constitute arbitrage is large. In our example above, any report in [0.546, 0.637] would guarantee a profit; and this does not even mention strategies in which experts report different probabilities. As such, the principal may not even be able to recover basic information about the experts’ beliefs from their reports.
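To make the arbitrage concrete, here’s a small sketch of my own (not the example from the thesis, and with made-up beliefs) using the quadratic/Brier score: if two experts both report the average of their beliefs instead of reporting truthfully, their combined score is higher no matter how the question resolves, so they can split a guaranteed surplus.

```python
# Sketch: collusion ("arbitrage") under the quadratic/Brier score.
# Score for reporting r on a binary outcome y: s(r, y) = -(y - r)**2 (higher is better).

def brier(report: float, outcome: int) -> float:
    return -(outcome - report) ** 2

p1, p2 = 0.4, 0.8                    # the experts' true beliefs (made-up numbers)
collusive_report = (p1 + p2) / 2     # both report the average instead

for outcome in (0, 1):
    truthful_total = brier(p1, outcome) + brier(p2, outcome)
    colluding_total = 2 * brier(collusive_report, outcome)
    print(outcome, round(truthful_total, 3), round(colluding_total, 3))
# For BOTH outcomes the colluding total is higher, by exactly (p1 - p2)**2 / 2,
# a guaranteed gain the experts can split however they like.
```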
For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters.
This would indeed be arbitrage-free, but likely not proper: it wouldn’t necessarily incentivize each expert to report their true belief; instead, an expert’s optimal report is going to be some sort of function of the expert’s belief about the joint probability distribution over the experts’ beliefs. (I’m not sure how much this matters in practice—I defer to you on that.)
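A toy version of why rank-based rewards aren’t proper (a hypothetical setup of my own, not IARPA’s actual payout rule): suppose you believe the event has probability 0.6, one other forecaster reports 0.6, and whoever is closer to the outcome wins a fixed prize, with ties split. Reporting your true belief guarantees a tie, while exaggerating wins whenever the event happens, which you think is the more likely outcome.

```python
# Sketch: winner-take-all reward (closest forecast wins, ties split the prize).
# Hypothetical setup for illustration only.

def expected_payout(my_report: float, other_report: float, my_belief: float) -> float:
    total = 0.0
    for outcome, prob in ((1, my_belief), (0, 1 - my_belief)):
        my_err, other_err = abs(outcome - my_report), abs(outcome - other_report)
        total += prob * (1.0 if my_err < other_err else 0.5 if my_err == other_err else 0.0)
    return total

for r in (0.6, 0.7, 0.9):
    print(r, expected_payout(r, other_report=0.6, my_belief=0.6))
# Truthful 0.6 earns 0.5 in expectation; any report above 0.6 earns 0.6,
# so the rank-based reward incentivizes exaggeration rather than honesty.
```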
It’s surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
In Chapter 4, we are thinking of experts as having immutable beliefs, rather than beliefs that change upon hearing other experts’ beliefs. Is this a silly model? If you want, you can think of these beliefs as each expert’s belief after talking to the other experts a bunch. In theory(?) the experts’ beliefs should converge (though I’m not actually clear what happens if the experts are computationally bounded); but in practice, experts often don’t converge (see e.g. the FRI adversarial collaboration on AI risk).
It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts.
Yup—in my summary I described “robust aggregation” as “finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures.” In fact, you can’t do anything interesting in the worst case over all information structures. The assumption I make in the chapter in order to get interesting results is, roughly, that experts’ information is substitutable rather than complementary (on average over the information structure). The scenario you describe is exactly the kind of case where Alice and Bob’s information might be complementary.
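For a toy picture of what “complementary” information looks like (my illustration here, not an example from the thesis): let the event be Y = A XOR B for two independent fair coin flips, where Alice observes only A and Bob observes only B. Each of them honestly forecasts 50%, so any aggregator that sees only the pair of forecasts is stuck at 50%, even though Alice and Bob together know the answer exactly.

```python
# Toy complementary-information structure: Y = A XOR B, Alice sees A, Bob sees B.
from itertools import product

for a, b in product((0, 1), repeat=2):
    y = a ^ b
    alice_forecast = 0.5         # P(Y=1 | A=a) = 0.5 for either value of a
    bob_forecast = 0.5           # P(Y=1 | B=b) = 0.5 for either value of b
    pooled_knowledge = float(y)  # P(Y=1 | A=a, B=b) is 0 or 1
    print((a, b), alice_forecast, bob_forecast, pooled_knowledge)
# The aggregator always sees the same input (0.5, 0.5), so it must output one
# fixed number across all four cases and can never match the pooled forecast.
```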