1) Logical depth seems super cool to me, and is perhaps the best way I’ve seen for quantifying “interestingness” without mistakenly equating it with “unlikeliness” or “incompressibility”.
2) Despite this, Manfred’s brain-encoding-halting-times example illustrates a way a D(u/h) / D(u) optimized future could be terrible… do you think this future would not obtain because, despite being human-brain-based, it would not in fact make much use of being run on a human brain? That is, it would have extremely high D(u) and therefore be penalized?
I think it would be easy to rationalize/over-fit our intuitions about this formula to convince ourselves that it matches our intuitions about what is a good future. More realistically, I suspect that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u).
1) Thanks, that’s encouraging feedback! I love logical depth as a complexity measure. I’ve been obsessed with it for years and it’s nice to have company.
2) Yes, my claim is that Manfred’s doomsday cases would have very high D(u) and would be penalized. That is the purpose of having that term in the formula.
I agree with your suspicion that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u). I suppose I’d defend a weaker claim: that a D(u/h) / D(u) supercontroller would not be an existential threat. One reason for this is that D(u) is so difficult to compute that it would be pretty bogged down…
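To make the shape of the objective under discussion concrete: logical depth itself is uncomputable, so any real evaluation of D(u/h) / D(u) would have to use an approximation. Here is a minimal toy sketch, substituting a crude proxy (the decompression time of a zlib-compressed string) for true logical depth; the names `depth_proxy` and `depth_ratio` are hypothetical illustrations, not part of the original proposal:

```python
import time
import zlib

def depth_proxy(data: bytes, trials: int = 30) -> float:
    """Crude stand-in for logical depth: the decompression time of the
    compressed string. The true quantity (runtime of a near-shortest
    program producing the string) is uncomputable; this only follows
    the spirit of compression-based approximations."""
    compressed = zlib.compress(data, 9)
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        zlib.decompress(compressed)
        best = min(best, time.perf_counter() - start)
    return best

def depth_ratio(u: bytes, h: bytes) -> float:
    """Toy proxy for D(u/h) / D(u). Conditioning on h is approximated
    by compressing h followed by u and charging only the decompression
    work beyond that of h alone (floored at a tiny positive value,
    since timing noise can make the raw difference negative)."""
    d_u = depth_proxy(u)
    d_u_given_h = max(depth_proxy(h + u) - depth_proxy(h), 1e-9)
    return d_u_given_h / d_u
```

Even this toy version shows the practical point: every evaluation requires compressing and repeatedly decompressing a description of the whole state, and a faithful version would require search over all short programs, which is where the "bogged down" intuition comes from.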
One reason for making a concrete proposal of an objective function is that if it is pretty good, it may serve as a starting point for further refinement.
I agree with your suspicion that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u).
Many utility functions have the same feature. For example, I could give the AI some flying robots with cameras, and teach it to count smiling people in the street by simple image recognition algorithms. That utility function would also assign a high score to our favorite future, but not the highest score. Of course the smile maximizer is one of LW’s recurring nightmares, like the paperclip maximizer.
I suppose I’d defend a weaker claim: that a D(u/h) / D(u) supercontroller would not be an existential threat. One reason for this is that D(u) is so difficult to compute that it would be pretty bogged down…
Any function that’s computationally hard to optimize would have the same feature.
What other nice features does your proposal have?