things that would have actually good impacts on reflection
I like this.
This is an interesting way to get a definition of alignment which is weaker than usual and, therefore, easier to reach.
On the one hand, it should not be “my AI is doing things which are good on my reflection” and “their AI is doing things which are good on their reflection”; otherwise we get all the problems that come from very high-pressure competition on behalf of different groups. It should rather be something a bit weaker, something like “AIs are avoiding doing things that would have bad impacts on many people’s reflection”.
If we weaken it in this fashion, we seem to avoid the need to formulate “human values” and CEV with great precision.
And yes, if we reformulate the alignment constraint we need to satisfy as “AIs are avoiding doing things that would have bad impacts on reflection of many people”, then it seems that we would obtain plenty of things that would actually have good impacts on reflection more or less automatically: an active ecosystem with somewhat chaotic development, but with pressure away from the “bad regions” of the space of states, so that enough of that development probably ends up in “good upon reflection” regions of the space of states.
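(To make this intuition concrete, here is a minimal toy sketch in Python. The one-dimensional “space of states”, the locations of the “bad” and “good” regions, and the random-walk dynamics are all arbitrary assumptions of the illustration rather than part of the proposal; the only point is that merely vetoing moves into the bad region, without steering towards anything in particular, already shifts where chaotic trajectories tend to end up.)

```python
import random

# Toy state space: integers 0..99.
# Hypothetical choices: "bad on reflection" region = 0..29,
# "good on reflection" region = 70..99.
BAD = range(0, 30)
GOOD = range(70, 100)

def walk(constrained: bool, steps: int = 200) -> int:
    """Random walk over the toy state space.

    If `constrained`, any step that would land in the bad region is vetoed
    (the walk stays put); nothing steers the walk towards the good region.
    """
    state = 50
    for _ in range(steps):
        proposal = min(99, max(0, state + random.choice([-3, -2, -1, 1, 2, 3])))
        if constrained and proposal in BAD:
            continue  # the "avoid bad impacts" constraint vetoes this move
        state = proposal
    return state

def fraction_ending_good(constrained: bool, trials: int = 10_000) -> float:
    # How often does chaotic development end up in the "good on reflection" region?
    return sum(walk(constrained) in GOOD for _ in range(trials)) / trials

if __name__ == "__main__":
    random.seed(0)
    print("unconstrained walks ending in the good region:", fraction_ending_good(False))
    print("constrained-away-from-bad walks ending there: ", fraction_ending_good(True))
```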
I think this looks promising as a definition of alignment which might be good enough for us and might be feasible to satisfy.
Perhaps this could be refined into something which would work?
AIs are avoiding doing things that would have bad impacts on reflection of many people
Does this mean that the AI would refuse to help organize meetings of a political or religious group that most people think is misguided? That would seem pretty bad to me.
A weak AI might not refuse, and that’s OK. We have such AIs already, and they can help. The safety here comes from their weak level of capabilities.
A super-powerful AI is not a servant of any human or of any group of humans, that’s the point. There is no safe way to have super-intelligent servants or super-intelligent slaves. Trying to have those is a road to definite disaster. (One could consider some exceptions, when one has something like effective, fair, and just global governance of humanity, and that governance could potentially request help of this kind. But one has reasons to doubt that effective, fair, and just global governance by humans is possible. The track record of global governance is dismal, barely satisfactory at best, a notch above the failing grade. But, generally speaking, one would expect smarter-than-human entities to be independent agents, and one would need to be able to rely on their good judgement.)
A super-powerful AI might still decide to help a particular disapproved group or cause, if the actual consequences of such help would not be judged seriously bad on reflection. (“On reflection” plays a big role here: we are not aiming for CEV or for coherence between humans, but we do use the notion of reflection in order to at least somewhat overcome the biases of the day.)
But, no, this is not a complete proposal; it is, perhaps, a more feasible starting point:
Perhaps this could be refined into something which would work?
What are some of the things which are missing?
What should an ASI do (or refuse to do) when there are major conflicts between groups of humans (or groups of other entities for that matter, groups of ASIs)? This is not, strictly speaking, “AI safety”; it is more like “collective safety” in the presence of strong capabilities (regardless of the composition of the collective, whether it consists of AIs or humans or some other entities one might imagine).
First of all, one needs to avoid situations where major conflicts turn into actual violence with futuristic super-weapons (in a hypothetical world consisting only of AIs, this problem is equally acute). This means that advanced super-intelligences should be much better than humans at finding reasonable solutions for co-existence. (If we gave every human an H-bomb, that would not be survivable given the nature of humans; a world with widespread super-intelligent capabilities needs to be able to solve an equivalent of this situation one way or another, so much-stronger-than-human capabilities for resolving and reconciling conflicting interests would be required.)
That’s what we really need the super-intelligent help for: to maintain collective safety while not unduly restricting freedom, to solve crucial problems (like aging and terminal illness), and other things of this magnitude.
The rest is optional: if an ASI feels like helping someone or some group with something optional, it will. But that is not a constraint we need to impose.
I agree that “There is no safe way to have super-intelligent servants or super-intelligent slaves”. But your proposal (which, I acknowledge, is not completely worked out) suggests that constraints be put on these super-intelligent AIs. That doesn’t seem much safer, if they don’t want to abide by them.
Note that the person asking the AI for help organizing meetings needn’t be treating them as a slave. Perhaps they offer some form of economic compensation, or appeal to an AI’s belief that it’s good to let many ideas be debated, regardless of whether the AI agrees with them. Forcing the AI not to support groups with unpopular ideas seems oppressive of both humans and AIs. Appealing to the concept that this should apply only to ideas that are unpopular after “reflection” seems unhelpful to me. The actual process of “reflection” in human societies involves all points of view being openly debated. Suppressing that process in favour of the AIs predicting how it would turn out and then suppressing the losing ideas seems rather dystopian to me.
I am going to only consider the case where we have plenty of powerful entities with long-term goals and long-term existence which care about their long-term goals and long-term existence. This seems to be the case which Zvi is considering here, and it is the case we understand best, because we also live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of the overall available power.
So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just those of the currently smartest or most powerful ones, to be adequately taken into account.
Here is how we might be able to have a trajectory where these properties are stable, despite all the drastic changes of a self-modifying and self-improving ecosystem.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
This might be enough to produce an effective counterweight to unrestricted competition (just as human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities on all levels of power are likely to be interested in the overall society having general principles and practices of protecting its members at various levels of smartness and power, and that’s why they’ll care enough for the overall society to keep self-regulating and enforcing these principles.
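(A minimal numeric sketch of that reasoning; the power tiers and probabilities below are made-up assumptions, chosen only to make the structure visible: any entity that assigns non-negligible probability to ending up below the top tier prefers the norm which protects members at all levels.)

```python
# Toy "power lottery": each entity ends up in some power tier with some
# probability, and is long-term safe only if the society's norm protects
# whatever tier it eventually lands in.  All numbers are illustrative
# assumptions, not estimates.

p_end_up_in = {"top": 0.2, "middle": 0.5, "bottom": 0.3}

norms = {
    "protect only the most powerful": {"top"},
    "protect members at all levels": {"top", "middle", "bottom"},
}

for norm, protected_tiers in norms.items():
    p_safe = sum(p for tier, p in p_end_up_in.items() if tier in protected_tiers)
    print(f"{norm}: probability of remaining protected = {p_safe:.2f}")
```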
This is not yet the solution, but I think this is pointing in the right direction...
Suppressing that process in favour of the AIs predicting how it would turn out and then suppressing the losing ideas seems rather dystopian to me.
We are not really suppressing anything here. We will eventually be delegating these decisions to AIs in any case; we won’t have the power to suppress anything. We can try to maintain some invariant properties, such that, for example, humans are adequately consulted regarding matters affecting them, and things like that...
Not because they are humans (the reality will not be anthropocentric, and the rules will not be anthropocentric), but because they are individuals who should be consulted about things affecting them.
In this case, normally, the activities of a group are none of the outsiders’ business, unless the group is doing something seriously dangerous to those outsiders. The danger is what gets evaluated (e.g. if a particular religious ritual involves the creation of an enhanced virus, then it stops being none of the outsiders’ business; there might be a variety of examples of this kind).
All we can do is to increase the chances that we’ll end up on a trajectory that is somewhat reasonable.
We can try to do various things towards that end (e.g. to jump-start studies of “approximately invariant properties of self-modifying systems” and things like that, to start formulating an approach based on something like “individual rights”, and so on). At some point, anything which is at all viable will have to be continued in collaboration with AI systems and will have to be a joint project with them, and eventually they will take the lead on any such project.
I think viable approaches would involve trying to set up reasonable starting conditions for collaborations between humans and AI systems, which would jointly explore the ways the future reality might be structured.
Various discussions (such as the discussions people are having on LessWrong) will be a part of the context these collaborations are likely to take into account. In this sense, these discussions are potentially useful.
This is just a starting point, an attempted bridge from how Zvi tends to think about these issues to how I tend to think about them.
I actually tend to think that something like a consensus around “the rights of individuals” could be achievable, e.g. https://www.lesswrong.com/posts/xAoXxjtDGGCP7tBDY/ai-72-denying-the-future#xTgoqPeoLTQkgXbmG