Likelihood ratios can be easier to evaluate than absolute probabilities insofar as you can focus on the meaning of a single piece of evidence, separately from the context of everything else you believe.
Suppose we’re trying to pick a multiple-dial combination lock (in the dark, where we can’t see how many dials there are). If we somehow learn that the first digit is 3, we can agree that that’s log₂ 10 ≈ 3.32 bits of progress towards cracking the lock, even if you think there are three dials total (such that we only need 6.64 more bits) and I think there are 10 dials total (such that we need 29.88 more bits).
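For concreteness, here’s a minimal sketch of that dial arithmetic (assuming ten equally likely digits per dial, so each dial is worth log₂ 10 bits; the function name is just illustrative):

```python
import math

BITS_PER_DIAL = math.log2(10)  # ≈ 3.32 bits to pin down one of ten equally likely digits

def remaining_bits(total_dials: int, dials_known: int) -> float:
    """Bits of evidence still needed to finish cracking the lock."""
    return (total_dials - dials_known) * BITS_PER_DIAL

# Learning the first digit is the same ≈3.32 bits of progress under either model of the lock:
print(remaining_bits(total_dials=3, dials_known=1))   # ≈ 6.64  (the optimist's three-dial lock)
print(remaining_bits(total_dials=10, dials_known=1))  # ≈ 29.9  (the pessimist's ten-dial lock)
```

Either way, the value of the one piece of evidence is the same; only the estimate of how much is left differs.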
Similarly, alignment pessimists and optimists might be able to agree that reinforcement learning from human feedback techniques are a good puzzle piece to have (we don’t have to write a reward function by hand! that’s progress!), even if pessimists aren’t particularly encouraged overall (because Goodhart’s curse, inner-misaligned mesa-optimizers, &c. are still going to kill you).
Thanks, I found it useful to have this explanation alongside Paul’s. (I think each explanation would have helped a bit on its own, but having it explained in two different languages gave me a clearer sense of how the math worked and how to conceptualize it.)
More intuitive illustration with no logarithms: your plane crashed in the ocean. To survive, you must swim to shore. You know that the shore is west, but you don’t know how far.
The optimist thinks the shore is just over the horizon; we only need to swim a few miles and we’ll almost certainly make it. The pessimist thinks the shore is a thousand miles away and we will surely die. But the optimist and pessimist can both agree on how far we’ve swum up to this point, and that the most dignified course of action is “Swim west as far as you can.”