How can you tell that some program is a fair extrapolation of your morality?
This is the main open question of FAI theory. (Note, though, that FAI doesn’t just extrapolate your revealed, reliable moral intuitions; it should consider at least the whole mind as source data.)
If we create a program that gives 100% correct answers to all “realistic” moral questions that you deal with in real life, but gives grossly unintuitive and awful-sounding answers to many “unrealistic” moral questions like Torture vs Dustspecks or the Repugnant Conclusion, would you force yourself to trust it over your intuitions?
I don’t suppose agreement on the more reliable moral questions is an adequate criterion (a sufficient condition), though I’d expect agreement on such questions to more or less hold. FAI needs to be backed by solid theory, explaining why exactly its answers are superior to moral intuition. That theory is what would force one to accept even counter-intuitive conclusions. Of course, one should be careful not to be fooled by a wrong theory, but being fooled by your own moral intuition is also always a possibility.
I admit I’m confused on this issue, but I feel that our instinctive judgments about unrealistic situations convey some non-zero information about our morality that needs to be preserved, too.
Maybe they do, but how much would you expect to learn about quasars from observations made by staring at the sky with the naked eye?
We need better methods that don’t involve relying exclusively on vanilla moral intuitions. I don’t know what kinds of methods would work, but I do know that moral intuition is not the answer. FAI refers to the successful completion of this program, and so represents answers more reliable than moral intuition.
FAI needs to be backed by solid theory, explaining why exactly its answers are superior to moral intuition.
If by “solid” you mean “internally consistent”, there’s no need to wait—you should adopt expected utilitarianism now and choose torture. If by “solid” you mean “agrees with our intuitions about real life”, we’re back to square one. If by “solid” you mean something else, please explain what exactly. It looks to me like you’re going in circles around the is-ought problem without recognizing it.
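To spell out the expected-utilitarian arithmetic behind “choose torture” (the symbols and the assumption of simple additive aggregation are illustrative, not anything the discussion above commits to): let $u_{\text{speck}} > 0$ be the disutility of one dust speck, $u_{\text{torture}}$ the disutility of the torture, and $N$ the number of people specked, which the standard statement of the problem takes to be $3\uparrow\uparrow\uparrow 3$. Then
\[
N \cdot u_{\text{speck}} \;>\; u_{\text{torture}}
\quad\text{whenever}\quad
N \;>\; \frac{u_{\text{torture}}}{u_{\text{speck}}},
\]
and $3\uparrow\uparrow\uparrow 3$ exceeds that threshold for any fixed positive $u_{\text{speck}}$, so an aggregating utilitarian picks torture as the lesser total harm.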
If by “solid” you mean “internally consistent”, there’s no need to wait—you should adopt expected utilitarianism now and choose torture.
How could I possibly mean “internally consistent”? Being consistent conveys no information about a concept, aside from its non-triviality, and so can’t be a useful characteristic. And choosing specks is also “internally consistent”. Maybe I like specks in others’ eyes.
FAI theory should be reliably convincing and verifiable, preferably on the level of mathematical proofs. FAI theory describes how to formally define the correct answers to moral questions, but doesn’t necessarily help at all with an intuitive understanding of what those answers are. It could be a formalization of “what we’d choose if we were smarter, knew more, had more time to think”, for example, which doesn’t exactly show how the answers look.
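As a purely schematic illustration of what such a formalization might look like (the operators here are placeholders I’m assuming for illustration, not a claim about how FAI theory would actually be set up), one could write
\[
\text{Answer}(q) \;=\; \text{Choice}\big(\text{Idealize}(M_0),\, q\big),
\]
where $M_0$ is the present mind taken as source data, $\text{Idealize}$ stands in for “smarter, knew more, had more time to think”, and $\text{Choice}$ extracts the judgment on moral question $q$. Such a definition fixes what the answers are without exhibiting them, since nothing short of actually carrying out the construction evaluates $\text{Idealize}(M_0)$.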
Then the FAI risks putting us all in a situation we hate, which we’d love if only we were a bit smarter.
FAI doesn’t work with “us”: it works with world-states, which include every detail, including whatever distinguishes present humans from hypothetical smarter people. A given situation that includes a smarter person is distinct from an otherwise identical situation that includes a present-day human, and so these situations should be optimized differently.
I see your point, but my question still stands. You seem to take it on faith that an extrapolated smarter version of humanity would be friendly to present-day humanity and wouldn’t want to put it in unpleasant situations, or that they would and it’s “okay”. This is not quite as bad as believing that a paperclipper AI will “discover” morality on its own, but it’s close.
You seem to take it on faith that a hypothetical smarter version of humanity would be friendly to present-day humanity and wouldn’t want to put it in unpleasant situations, or that they would and it’s “okay”.
I don’t “take it on faith”, and the example with “if we were smarter” wasn’t supposed to be an actual stab at FAI theory.
On the other hand, if we define “smarter” as also keeping preference fixed (the alternative would be wrong, as a Smiley is also “smarter”, but clearly not what I meant), then the smarter versions’ advice is by definition better. This, again, gives no technical guidance on how to get there, which is why the word “formalization” was essential in my comment. The “smarter” modifier is about as opaque as the whole of FAI.
You define “smarter” as keeping “preference” fixed, but you also define “preference” as the extrapolation of our moral intuitions as we become “smarter”. It’s circular. You’re right, this stuff is opaque.
It’s a description, a connection between the terms, but not a definition (pretty useless, but not circular).
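A small formal gloss on that distinction (the symbols are placeholders for illustration, not pieces of an actual theory): writing $S$ for the “smarter” idealization and $P$ for preference, the two claims above amount to a pair of conditions on the unknowns $(S, P)$,
\[
P = E(S), \qquad S \text{ preserves } P,
\]
where $E(S)$ denotes whatever $S$ extrapolates from the present mind. A system like this constrains the pair jointly without by itself determining either term, which is the sense in which it is a (mostly unhelpful) description rather than a circular definition.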