I’m surprised you didn’t mention financial solutions. E.g. “write a contract that pays the doctor more for every year that I live”. Although I suppose this might still be vulnerable to goodharting. For example the doctor may keep me “alive” indefinitely in a medical coma.
Thank you for the comment! :) Since this one is the most upvoted one I’ll respond here, although similar points were also brought up in other comments.
I totally agree, this is something that I should have included (or perhaps even focused on). I’ve done a lot of thinking about this prior to writing the post (and lots of people have suggested all kinds of fancy payment schemes to me, e.g. increasing the payment rapidly for every year above life expectancy). I’ve converged on believing that all payment schemes that vary as a function of time can probably be goodharted in one way or another (e.g. through a medical coma like you suggest, or by just making you believe you have great life quality). But I did not have a great idea for how to get a conceptual handle on that family of strategies, so I just subsumed them under “just pay the doctor, dammit”.
After thinking about it again (and assuming we can come up with something that cannot be goodharted), I have the intuition that all time-varying payment schemes are somehow related to assassination markets: by fixing the payment scheme, you basically get to pick the date of your own death. At some point the effort the doctor has to put in will cost more than the payment you can offer, and at that point the greedy doctor will just give up. So ideally you would construct the time-varying payment scheme in exactly the way that pushes the date of assassination as far into the future as possible. When you have a mental model of how the doctor makes decisions, this is just a “simple” optimization problem.
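To make the “simple” optimization concrete, here is a minimal sketch under strong assumptions that are mine and not part of the argument above: the doctor is myopic and greedy (they keep treating only in years where the payment covers their effort cost), the yearly effort costs are known in advance, and the patient has a fixed total budget. The names `quit_year`, `cost_matching_schedule`, and `best_flat_schedule` are hypothetical helpers for illustration.

```python
def quit_year(payments, costs):
    """Return the first year in which the payment no longer covers the
    doctor's effort cost, i.e. the year the greedy doctor gives up."""
    for year, (pay, cost) in enumerate(zip(payments, costs)):
        if pay < cost:
            return year
    # The doctor never refused; treatment ends when the schedule (or the
    # cost list) runs out.
    return min(len(payments), len(costs))


def cost_matching_schedule(costs, budget):
    """Time-varying scheme: pay exactly the effort cost each year until
    the remaining budget cannot cover the next year."""
    schedule = []
    remaining = budget
    for cost in costs:
        if cost > remaining:
            break
        schedule.append(cost)
        remaining -= cost
    return schedule


def best_flat_schedule(costs, budget):
    """Time-invariant scheme: spread the budget evenly over N years and
    keep the largest N for which the doctor stays for all N years."""
    best = []
    for years in range(1, len(costs) + 1):
        schedule = [budget / years] * years
        if quit_year(schedule, costs) == years:
            best = schedule
    return best


# Assumed toy numbers: effort cost rises by one unit per year.
costs = [1, 2, 3, 4, 5, 6, 7, 8]
budget = 20

varying = cost_matching_schedule(costs, budget)
flat = best_flat_schedule(costs, budget)
print("years survived, time-varying:", quit_year(varying, costs))
print("years survived, flat:", quit_year(flat, costs))
```

In this toy model the time-varying schedule buys one more year than the best flat one, because it never overpays in the cheap early years. Everything hinges on knowing `costs`, which is exactly the mental model of the doctor that may not be available.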
But when you don’t have such a model (since the doctor is smarter than you), you’re kind of back to square one. And then (I think) it possibly again comes down to setting up multiple doctors to cooperate or compete, so that a time-invariant payment scheme forces them to be truthful. Not sure at all though.