I think it makes it easier for people with very different background views about alignment to communicate; it seems much easier to agree about whether something reduces doom by 1 bit than to agree about whether it cuts doom by 50%.
I’m a bit surprised by this. (I was also a bit confused about what work the “log odds” framing was doing compared to other probability framings in the first place, and so were some others I’ve chatted with.) Have you run into people or conversations where this framing was helpful?
(I guess the same question for Eliezer. I had a vague sense that “log odds put the question into units that are more motivating to reason about”, but I was fuzzy on the details. I’m not fluent in log odds; I have some sense that if we were actually measuring precise probabilities it might make the math genuinely easier, but I haven’t seen anyone make arguments about the chance of success that were grounded in something rigorous enough for doing actual math on them to be useful.)
Likelihood ratios can be easier to evaluate than absolute probabilities insofar as you can focus on the meaning of a single piece of evidence, separately from the context of everything else you believe.
Suppose we’re trying to crack a multi-dial combination lock (in the dark, where we can’t see how many dials there are). If we somehow learn that the first digit is 3, we can agree that this is log₂ 10 ≈ 3.32 bits of progress towards cracking the lock, even if you think there are three dials total (such that we only need about 6.64 more bits) and I think there are ten dials total (such that we need about 29.9 more bits).
Similarly, alignment pessimists and optimists might be able to agree that reinforcement learning from human feedback techniques are a good puzzle piece to have (we don’t have to write a reward function by hand! that’s progress!), even if pessimists aren’t particularly encouraged overall (because Goodhart’s curse, inner-misaligned mesa-optimizers, &c. are still going to kill you).
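To make the lock arithmetic concrete, here is a minimal sketch (the dial counts are just the hypothetical numbers from the example above):

```python
import math

BITS_PER_DIAL = math.log2(10)  # learning one decimal digit is ~3.32 bits

def remaining_bits(total_dials: int, dials_known: int) -> float:
    """Bits still needed to crack the lock, given how many dials we already know."""
    return (total_dials - dials_known) * BITS_PER_DIAL

# We can agree on the progress made so far...
print(f"first digit learned: {BITS_PER_DIAL:.2f} bits of progress")
# ...while still disagreeing about how much work is left:
print(f"optimist (3 dials total):   {remaining_bits(3, 1):.2f} bits to go")   # ~6.64
print(f"pessimist (10 dials total): {remaining_bits(10, 1):.2f} bits to go")  # ~29.90
```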
Thanks, I found it useful to have this explanation alongside Paul’s. (I think each explanation would have helped a bit on its own, but having it explained in two different languages gave me a clearer sense of how the math worked and how to conceptualize it.)
A more intuitive illustration, with no logarithms: your plane crashed in the ocean. To survive, you must swim to shore. You know that the shore is west, but you don’t know how far.
The optimist thinks the shore is just over the horizon; we only need to swim a few miles and we’ll almost certainly make it. The pessimist thinks the shore is a thousand miles away and we will surely die. But the optimist and pessimist can both agree on how far we’ve swum up to this point, and that the most dignified course of action is “Swim west as far as you can.”
Suppose that Eliezer thinks there is a 99% risk of doom, and I think there is a 20% risk of doom.
Suppose that we solve some problem we both think of as incredibly important: say, we find a way to solve ontology identification and make sense of the alien knowledge a model has about the world and about how to think, and the solution looks practical and promising, suggests an angle of attack on other big theoretical problems, and generally suggests that all these difficulties may be more tractable than we thought.
If that’s an incredible, smashing success, maybe my risk estimate goes down from 20% to 10%, cutting risk in half.
And for that same incredible, smashing success, maybe Eliezer thinks the risk has gone down from 99% to 98%, cutting risk by only about one percentage point.
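For what it’s worth, measured in base-2 log odds those two (hypothetical) updates come out roughly the same size, about one bit each; a quick sketch:

```python
import math

def log2_odds(p: float) -> float:
    """Log (base 2) odds in favor of doom for a probability p."""
    return math.log2(p / (1 - p))

def bits_of_update(p_before: float, p_after: float) -> float:
    """Size of the update against doom, in bits of log odds."""
    return log2_odds(p_before) - log2_odds(p_after)

print(f"20% -> 10%: {bits_of_update(0.20, 0.10):.2f} bits")  # ~1.17 bits
print(f"99% -> 98%: {bits_of_update(0.99, 0.98):.2f} bits")  # ~1.01 bits
# One update halves the risk and the other shaves off about a percentage point,
# but in log odds both are roughly a one-bit improvement.
```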
I think there are basically just two separate issues at stake:
How much does this help solve the problem? I think this is mostly captured by bits of log-odds reduction, and it’s not where the real disagreement is.
How doomed are we anyway, such that the improvement doesn’t matter?
Suppose that to solve alignment the quality of our alignment research effort has to be greater than some threshold. If the distribution of possible output quality is logistic, and research moves the mean of the distribution, then I think we gain a constant amount of log-odds per unit of research quality, regardless of where we think the threshold is.
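Here is a minimal sketch of that claim (the threshold values and the unit of “research quality” are made up for illustration): for a logistic distribution, the log odds of clearing a threshold are linear in the mean, so a fixed shift in the mean buys the same log-odds gain regardless of where the threshold is.

```python
def log_odds_of_success(mean_quality: float, threshold: float, scale: float = 1.0) -> float:
    """Log odds that logistic(mean_quality, scale) output quality exceeds the threshold.

    P(quality > t) = 1 / (1 + exp(-(mean - t) / scale)), so the log odds of
    success are (mean - t) / scale, which is linear in the mean.
    """
    return (mean_quality - threshold) / scale

delta = 1.0  # one (illustrative) unit of research quality, shifting the mean
for threshold in (0.0, 5.0, 20.0):  # an optimist's bar vs. a pessimist's bar
    gain = log_odds_of_success(delta, threshold) - log_odds_of_success(0.0, threshold)
    print(f"threshold {threshold:5.1f}: gain of {gain:.2f} log-odds units")
# The gain is delta / scale in every case, no matter where the threshold sits.
```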