So you can interpret this post as asking again what those preferences ought to be.
I suspect that the actual answer is “whatever they actually are, which is a lot of irreducible data”, so the question is wrong: instead we should ask how to specify a process of extracting the preferences (in the required format) from people-consisting-of-atoms. Thinking about the actual content of values (as opposed to systems for representing arbitrary values) is about as useful as trying to work out which shapes in the animal kingdom are closest to the shapes of Earth’s continents: you may find some “match”, but it won’t be accurate enough to be of any use.
I agree that somebody should be exploring your approach. But a major problem I see with it is this: once you’ve extracted a set of preferences, how do you know those are the right ones? How do you know there isn’t a subtle bug in your theory or code that corrupted the preferences?
Also, what if FAI or AI in general turns out to be infeasible? We humans still need to decide what to do, right?
Oh, also, one motivation for this post was Eliezer’s claim in The “Intuitions” Behind “Utilitarianism” that while the preferences of individual humans are complex, the method of aggregating them together should be simple. I’m arguing against the latter part of that claim. I guess your approach would extract the preferences of humanity-as-a-whole somehow, which perhaps avoids this particular issue.
Strictly speaking, I don’t expect to have “preferences” “extracted” in any explicit sense, so there is no point at which you can look over the result. Rather, the aim is to specify a system that will act according to the required preferences once it is instantiated in an environment that provides the necessary info about what those preferences are. This preference-absorbing construction would need to be well understood in itself, and probably also on simple specific examples, rather than debugged on a load of incomprehensible data whose processing may already be the point where you have to let go.
Do you have a guess for the limit to the kind of example we’ll be able to understand? The preferences of a hydrogen atom? A DNA molecule? A microbe? Or am I completely misunderstanding you?
No idea, but when you put it like this, an atom may be no simpler than a person. More detail appears as you go further (in number of interactions) from the interface of a system that looks for detail (e.g. from what a scientist can see directly right now, to what they can theorize based on indirect evidence, to what they can observe several years in the future), not as you go up from some magical “lowest level”. A lowest level may make sense for human preference, where we can quite confidently assume that most of the macroscopically irrelevant subatomic detail in a sample human doesn’t make any interesting difference to their preference, but in general this assumption won’t hold (e.g. you may imagine a person implemented on a femtocomputer).
Since one can’t know everything about the real world, the idea is to minimize the number of assumptions made about it, including the laws of physics and a lot of the stuff that, culturally, we do know. As a thought experiment, an AI could be built even if you’d never left some sandbox computer simulation with no view of the outside and knew nothing yourself about what it’s like out there, so that when the AI is completed, it may be allowed on the outside. The process of the AI getting on the outside should be according to your preference, that is, it should in some way reflect what you’d do if you yourself were to learn of the outside world, with its physics, valuable configurations, and new ways of being implemented in it. (Your preference is where the problem of induction is redirected: there are no assumptions about the unknown stuff, but your preference determines what is to be done depending on what is discovered.) The process of the AI gaining new knowledge always starts at its implementation, and in terms of how that implementation sees its environment. Simple examples are simple from its point of view, so a simple example has to be something inside a closed world (e.g. a computer model with no interaction with the outside, or a self-contained mathematical structure), which is exactly the limitation that this approach seeks to make unnecessary while retaining knowability.
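To give a sense of the shape of this, here is a toy sketch only (in Python; the class, the dict-based world model, and the example values are all invented for illustration and are not part of any actual proposal). The point it shows is just that the preference is a function applied to whatever world-model is eventually learned, rather than a fixed list of valued outcomes written down and inspected in advance.

```python
from typing import Callable, Iterable

# Stand-ins, purely for illustration: a "world model" is whatever the agent
# has learned about its environment; a "plan" is a possible course of action.
WorldModel = dict
Plan = str


class Agent:
    """Toy agent: no facts about the outside world are built in; only a
    preference, which is applied to whatever is later discovered."""

    def __init__(self, preference: Callable[[WorldModel, Plan], float]):
        self.preference = preference
        self.world_model: WorldModel = {}

    def observe(self, evidence: dict) -> None:
        # Induction happens here: the world model grows as evidence arrives.
        self.world_model.update(evidence)

    def choose(self, plans: Iterable[Plan]) -> Plan:
        # The preference is only ever *applied* to the learned model; it is
        # never inspected as a stand-alone list of extracted preferences.
        return max(plans, key=lambda p: self.preference(self.world_model, p))


# Example use (again, entirely made up):
agent = Agent(preference=lambda world, plan: world.get("values", {}).get(plan, 0.0))
agent.observe({"values": {"explore_outside": 1.0, "stay_in_sandbox": 0.2}})
print(agent.choose(["explore_outside", "stay_in_sandbox"]))  # -> explore_outside
```

Of course the real difficulty is entirely in where that preference function comes from, which is the part the sketch leaves out.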
I only sort of understand what you mean. BTW, we really need to work on overcoming this communication barrier between us, and perhaps also with Steve Rayhawk. I can generally understand Steve’s comments much better than yours, but maybe that’s just because many of his ideas are similar to mine. When he introduces ideas that are new to me, I have trouble understanding him as well.
What can we do? Any ideas? Do you guys have similar trouble understanding me?
Back to the topic at hand, I guess I was asking for some assurance that in your FAI approach we’d be able to verify the preference-extraction method on some examples that we can understand before we have to “let go”. I got some information out of what you wrote, but I don’t know if it answers that question.
a self-contained mathematical structure
Every self-contained mathematical structure is also contained within larger mathematical structures. For example, our universe must exist both as a stand-alone mathematical structure and as simulations within larger universes, and we have preferences over the smaller mathematical structure as well as over the larger ones. I’m not sure if you’ve already taken that into account, but thought I’d point it out in case you haven’t.
It’s useless to discuss fine points in an informal description like this. At the least, what is meant by “mathematical structures” would have to be pinned down; depending on that, your point may be correct, wrong, or meaningless. In this case, I simply referred to taking the problem inside a limited universe of discourse, as opposed to freely interacting with the world.