$100 bet between me & Connor Leahy:
(1) Six months from today, Paul Christiano (or ARC with Paul Christiano's endorsement) will NOT have made any public statement drawing a 'red line' through any quantitative eval (anything with a number attached that is intended to measure an AI-risk-relevant factor, whether or not it actually succeeds at measuring that factor well), e.g. "If a model achieves X score on the Y benchmark, said model should not be deployed and/or deploying said model would pose a serious risk of catastrophe." Connor at 95%, Daniel at 45%.
(2) If such a 'red line' is produced, GPT-4 will be below it this year. Both at 95%, for an interpretation of GPT-4 that includes AutoGPT-style scaffolding (like what ARC did) but not fine-tuning.
(3) If such a 'red line' is produced, and GPT-4 is below it on first evals, but later tests show it to actually be above it (such as by using different prompts or other testing methodology), the red line will be redefined or the test declared faulty rather than calls being made for GPT-4 to be pulled from circulation. Connor at 80%, Daniel at 40%, for the same interpretation of GPT-4.
(4) If ARC calls for GPT-4 to be pulled from circulation, OpenAI will not comply. Connor at 99%, Daniel at 40%, for the same interpretation of GPT-4.
All of these bets expire at the end of 2024, i.e. if the “if” condition hasn’t been met by the end of 2024, we call off the whole thing rather than waiting to see if it gets met later.
Help wanted: Neither of us has a good idea of how to calculate fair betting odds for these things. Since Connor’s credences are high and mine are merely middling, presumably it shouldn’t be the case that either I pay him $100 or he pays me $100. We are open to suggestions about what the fair betting odds should be.
Regarding betting odds: are you aware of this post? It gives a betting algorithm that satisfies both of the following conditions:
Honesty: participants maximize their expected value by reporting their probabilities honestly.
Fairness: participants’ (subjective) expected values are equal.
The solution is “the ‘loser’ pays the ‘winner’ the difference of their Brier scores, multiplied by some pre-determined constant C”. This constant C puts an upper bound on the amount of money you can lose. (Ideally C should be fixed before bettors give their odds, because otherwise the honesty desideratum above could break, but I don’t think that’s a problem here.)
I was not aware, but I strongly suspected that someone on LW had asked and answered the question before, hence why I asked for help. Prayers answered! Thank you! Connor, are you OK with Scott’s algorithm, using C = $100?
Looks good to me, thank you Loppukilpailija!
Bet (1) resolved in Connor’s favor, right?
Yep! & I already paid out. I thought I had made some sort of public update but I guess I forgot. Thanks for the reminder.