TL;DR: Worlds which meet our specified criteria but fail to meet some unspecified but vital criteria outnumber (vastly?) worlds that meet both our specified and unspecified criteria.
Is that an accurate recap? If so, I think there are two things that need to be proven:
1. There will with high probability be important unspecified criteria in any given predicate.
2. The nature of the unspecified criteria is such that it is unfulfilled in a large majority of worlds which fulfill the specified criteria.
(1) is commonly accepted here (rightly so, IMO). But (2) seems to depend heavily on the exact nature of the criteria you fail to specify, and I’m not sure how it can be true in the general case.
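To make that dependence concrete, here is a minimal sketch (every distribution, threshold, and criterion below is an illustrative assumption of mine, not anything from the thread): whether most specified-satisfying worlds violate a hidden criterion depends entirely on how the hidden criterion relates to the specified one.

```python
# Toy model: worlds are random feature vectors. Compare a hidden criterion
# that is independent of the specified one against a hidden criterion that
# is in tension with it. All numbers here are arbitrary illustrations.
import random

random.seed(0)
worlds = [[random.random() for _ in range(2)] for _ in range(100_000)]

specified = lambda w: w[0] > 0.9      # the criterion we managed to write down
hidden_a  = lambda w: w[1] > 0.1      # independent of the specified criterion
hidden_b  = lambda w: w[0] < 0.92     # in tension with the specified criterion

passing = [w for w in worlds if specified(w)]
frac = lambda h: sum(h(w) for w in passing) / len(passing)
print(f"P(hidden_a | specified) ~ {frac(hidden_a):.2f}")  # stays ~0.90
print(f"P(hidden_b | specified) ~ {frac(hidden_b):.2f}")  # drops to ~0.20
```

Both hidden criteria hold in over 90% of *all* worlds here; only the one that trades off against the specified criterion becomes rare after conditioning. That is exactly the sense in which (2) depends on the content of what went unspecified.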
EDIT: The more I think about this, the more I’m confused. I don’t see how this adds any substance to the claim that we don’t know how to write down our values.
EDIT2: If we get to the stage where this is feasible, we can measure the size of the problem by providing only half of our actual constraints to the oracle AI and measuring how often the hidden half happens to be fulfilled.
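A minimal sketch of that hold-out measurement (everything here is a hypothetical stand-in: the "oracle" is just rejection sampling over random candidate worlds, and the constraint form is my own invention):

```python
# Hold-out measurement sketch: reveal half the constraints, let a stand-in
# "oracle" find worlds satisfying them, then check how often the hidden
# half is satisfied for free. Thresholds and sizes are arbitrary.
import random

random.seed(1)
N_FEATURES = 20
# Constraint i: feature i must exceed its threshold.
constraints = [(i, random.uniform(0.0, 0.5)) for i in range(N_FEATURES)]
random.shuffle(constraints)
revealed, hidden = constraints[:10], constraints[10:]

def satisfies(world, cs):
    return all(world[i] > t for i, t in cs)

def oracle(cs, tries=50_000):
    """Stand-in oracle: rejection-sample worlds passing the given constraints."""
    worlds = ([random.random() for _ in range(N_FEATURES)] for _ in range(tries))
    return [w for w in worlds if satisfies(w, cs)]

candidates = oracle(revealed)
rate = sum(satisfies(w, hidden) for w in candidates) / max(len(candidates), 1)
print(f"{len(candidates)} candidates passed the revealed half; "
      f"{rate:.1%} also satisfied the hidden half")
```

The headline number is the rate: if satisfying the revealed half made the hidden half likely, the problem is small; if the rate stays near the base rate, the revealed criteria are doing no work for the hidden ones.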
> The more I think about this, the more I’m confused. I don’t see how this adds any substance to the claim that we don’t know how to write down our values.
This proposes a way to get an OK result even if we don’t quite write down our values correctly.
> The nature of the unspecified criteria is such that it is unfulfilled in a large majority of worlds which fulfill the specified criteria.
That’s not exactly my claim. My claim is that the worlds best optimised for fulfilling our specified criteria are unlikely to satisfy our unspecified ones. It’s not a question of outnumbering (siren and marketing worlds are rare) but of scoring higher on our specified criteria.
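A toy simulation can make that distinction concrete (the model below is my illustrative assumption, not anything from the thread): proxy-maximising worlds are rare in the candidate pool, yet the argmax selects exactly them, so the top scorers on the specified criterion do worst on the unspecified one.

```python
# Toy model of over-optimised search: the specified score rewards one
# feature without limit; true value also needs an unspecified safeguard
# that gets traded away as that feature is pushed. All numbers arbitrary.
import random

random.seed(2)

specified_score = lambda w: w[0]
true_value = lambda w: w[0] if w[1] > 0.3 else 0.0  # hidden safeguard on w[1]

# Pushing w[0] hard squeezes the safeguard: w[1] can be at most 1 - w[0].
worlds = []
for _ in range(10_000):
    effort = random.random()
    worlds.append([effort, random.uniform(0.0, 1.0 - effort)])

top = sorted(worlds, key=specified_score)[-100:]          # over-optimised picks
ok  = [w for w in worlds if specified_score(w) > 0.5]     # merely acceptable picks
avg = lambda ws: sum(true_value(w) for w in ws) / len(ws)
print(f"mean true value, top-100 on the proxy:  {avg(top):.3f}")  # ~0.000
print(f"mean true value, random acceptable set: {avg(ok):.3f}")   # ~0.05
```

Acceptable worlds that also pass the safeguard are not rare in absolute terms; the failure is that the search’s ordering puts the safeguard-violating worlds on top, which is a ranking problem rather than a counting one.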
> This proposes a way to get an OK result even if we don’t quite write down our values correctly.
Ah, thank you for the explanation. I have complained about the proposed method in another comment. :)
http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/aso6