“There’s another interpretation of this, which I think might be better, where you can model people like AI_WAIFU as modeling timelines where we don’t win with literally zero value. That there is zero value whatsoever in timelines where we don’t win. And Eliezer, or people like me, are saying, ‘Actually, we should value them in proportion to how close to winning we got’. Because that is more healthy… It’s reward shaping! We should give ourselves partial reward for getting partially the way. He says that in the post, how we should give ourselves dignity points in proportion to how close we get.
And this is, in my opinion, a much psychologically healthier way to actually deal with the problem. This is how I reason about the problem. I expect to die. I expect this not to work out. But hell, I’m going to give it a good shot and I’m going to have a great time along the way. I’m going to spend time with great people. I’m going to spend time with my friends. We’re going to work on some really great problems. And if it doesn’t work out, it doesn’t work out. But hell, we’re going to die with some dignity. We’re going to go down swinging.”
Use the dignity heuristic as reward shaping