Why is this downvoted? Isn’t this one of the central theses of FAI?
Possible reasons:
I implicitly differentiated between AGI in general and the ability to recursively self-improve (which is usually lumped together on LW). I did this on purpose.
I included the ability to constrain such an AGI as a prerequisite to run it. I did this on purpose because friendliness is not enough if the AGI is free to hunt for vast utilities regardless of how tiny the probabilities are. Even an AGI equipped with perfect human-friendliness might try to hack the Matrix to support 3^^^^3 people rather than just a galactic civilisation (see the sketch after this comment for a sense of how large that number is). This problem isn’t solved and therefore, as suggested by Yudkowsky, it needs to be constrained using a “hack”.
I used the phrasing “taking over the universe”, which is badly received yet factually correct if you have a fooming AI and want to use it to spawn a positive Singularity.
I said that you can’t expect other humans to be friendly, though unfriendliness is not the biggest problem; stupidity is.
I said one “ought” to concentrate on taking over the universe. I said this on purpose to highlight that I actually believe it to be the only sensible thing to do once fooming AI is possible: if you waste too much time on spatiotemporally bounded versions, someone who is ignorant of friendliness will launch one that isn’t constrained that way.
The comment might have been deemed unhelpful because it added nothing new to the debate.
That’s my analysis of why the comment might have initially been downvoted. Sadly most people who downvote don’t explain themselves, but I decided to stop complaining about that recently.
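As an aside on the 3^^^^3 mentioned above: that is Knuth’s up-arrow notation, and a minimal sketch (my own illustration, nothing from the original comments) shows how quickly it outgrows anything a plausible probability penalty could cancel.

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow a ^(n arrows) b: n = 1 is plain exponentiation,
    and each extra arrow iterates the previous operation.
    Only tractable for tiny inputs."""
    if n == 1:
        return a ** b
    if b == 1:
        return a
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 3^3  = 27
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987
# 3^^^3 is already a power tower of 3s about 7.6 trillion levels high,
# and 3^^^^3 (the number in the comment above) is vastly larger still,
# far beyond anything this function could ever compute.
```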
Awesome, thanks for the response. Do you know if there’s been any progress on the “expected utility maximization makes you do arbitrarily stupid things that won’t work” problem?
Though, stupidity is a form of un-Friendliness, isn’t it?
I only found out about the formalized version of that dilemma around a week ago. As far as I can tell, it has not been shown that giving in to a Pascal’s mugging scenario would be irrational; it is merely our intuition that makes us believe something is wrong with it. I am currently far too uneducated to talk about this in detail. What I am worried about is that basically all probability/utility calculations could be put into the same category (e.g. working to mitigate low-probability existential risks); where do you draw the line? You can be your own mugger if you assign enough expected utility to justify taking extreme risks.
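To make the “your own mugger” worry concrete, here is a minimal sketch with made-up numbers: a naive expected-utility comparison in which an astronomically large payoff, assigned a microscopic probability, still swamps an ordinary option.

```python
# Made-up numbers for illustration; nothing here comes from the thread.
p_speculative = 1e-20   # credence that some extreme, speculative intervention works
u_speculative = 1e60    # utility claimed if it does (astronomical numbers of lives)
eu_speculative = p_speculative * u_speculative   # 1e40

p_mundane = 0.9         # credence that an ordinary, well-understood action helps
u_mundane = 1e6         # its modest payoff
eu_mundane = p_mundane * u_mundane               # 9e5

# The naive expected-utility maximizer picks the speculation every time,
# which is the sense in which you can "mug yourself".
print(eu_speculative > eu_mundane)  # True
```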
What I am worried about is that basically all probability/utility calculations could be put into the same category (e.g. working to mitigate low-probability existential risks); where do you draw the line?
There’s a formalization I gave earlier that distinguishes Pascal’s Mugging from problems that just have big numbers in them. It’s not enough to have a really big utility; a Pascal’s Mugging is when you have a statement provided by another agent, such that just saying a bigger number (without providing additional evidence) increases what you think your expected utility is for some action, without bound.
This question has resurfaced enough times that I’m starting to think I ought to expand that into an article.
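For illustration, here is a rough toy model of that failure mode (my own sketch, not the formalization referred to in the comment): if your credence in the mugger’s claim falls off only with the length of the number’s description, the claimed payoff grows faster than the credence shrinks, so the mugger can drive your expected utility as high as they like just by naming bigger numbers.

```python
def p_claim_true(n_claimed):
    # Toy prior: penalize the claim only by the length of the number's
    # decimal description (roughly log10 of n), not by its magnitude.
    return 2.0 ** (-len(str(n_claimed)))

def expected_utility_of_paying(n_claimed, cost=5.0):
    # Naive expected utility of paying up: probability the claim is true
    # times the claimed payoff, minus a fixed cost of complying.
    return p_claim_true(n_claimed) * n_claimed - cost

for n in [10**3, 10**6, 10**12, 10**24]:
    print(n, expected_utility_of_paying(n))
# The payoff is roughly 10^digits while the prior only falls off like
# 2^(-digits), so the product, and hence the expected utility, grows
# without bound as the mugger names bigger numbers.
```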
Minor correction: It may need a hack if it remains unsolved.