Yes, I agree formalisation is needed. See comment by flandry39 in this thread on how one might go about doing so.
Worth considering is that there are actually two aspects that make it hard to define the term ‘alignment’ in a way that allows for sufficiently rigorous reasoning:
It must allow for logically valid reasoning (therefore requiring formalisation).
It must allow for empirically sound reasoning (i.e. the premises correspond with how the world works).
In my reply above, I did not help you much with (1.). Still, even in plain English, I managed to restate a vague notion of alignment in more precise terms.
Notice how it does help to define the correspondences with how the world works (2.):
“That ‘AGI’ continuing to exist, in some modified form, does not result eventually in changes to world conditions/contexts that fall outside the ranges that existing humans could survive under.”
The reason why (2.) is important is that formalisation by itself is not enough. Merely describing and/or deriving logical relations between mathematical objects says nothing about the physical world. Somewhere in your fully communicated definition there also needs to be a description of how the mathematical objects correspond with real-world phenomena. Often, mathematicians do this by talking with collaborators about what the symbols mean while scribbling them out on e.g. a whiteboard.
But whatever way you do it, you need to communicate how the definition corresponds to things happening in the real world in order to show that it is a rigorous definition. Otherwise, others could still object that the formally precise definition is not rigorous, because it does not adequately (or explicitly) represent the real-world problem.
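To make (1.) and (2.) concrete together, here is a minimal sketch of how the quoted definition might be written down; the symbols $\mathcal{W}$, $W_t$, $H$, $A(t)$, and $t_0$ are my own illustrative choices, not an established formalisation. Let $\mathcal{W}$ be the set of possible world states, $W_t \in \mathcal{W}$ the state at time $t$, $H \subseteq \mathcal{W}$ the subset of states whose conditions fall within the ranges existing humans can survive under, $A(t)$ the proposition that ‘AGI’ (in some modified form) still exists at time $t$, and $t_0$ the time ‘AGI’ is introduced. The quoted definition then reads:

$$\forall t \ge t_0:\; \big(\forall s \in [t_0, t]:\, A(s)\big) \;\Rightarrow\; W_t \in H$$

Note that the formula alone is pure symbol-pushing, which is exactly the point of (2.): one must separately state what $W_t \in H$ corresponds to physically, e.g. that temperature, atmospheric composition, and ambient toxicity stay within the ranges human physiology tolerates.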
This is maybe not the central point, but I note that your definition of “alignment” doesn’t precisely capture what I understand “alignment” or a good outcome from AI to be:
‘AGI’ continuing to exist
AGI could be very catastrophic even when it stops existing a year later.
eventually
If AGI makes Earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
ranges that existing humans could survive under
I don’t know whether that covers “humans can survive on Mars with a space-suit”,
but even then, if humans evolve/change to handle situations that they currently do not survive under, that could be part of an acceptable outcome.
Thanks! These are thoughtful points. See some clarifications below:
AGI could be very catastrophic even when it stops existing a year later.
You’re right. I’m not even covering all the other bad stuff that could happen in the short term, and that we might still be able to prevent, like AGI triggering global nuclear war.
What I’m referring to is unpreventable convergence on extinction.
If AGI makes Earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
Agreed, that could be a good outcome if it were attainable.
In practice, the convergence reasoning concerns total human extinction occurring within 500 years after ‘AGI’ has been introduced into the environment (with only a very small remainder of probability mass beyond that).
In theory, of course, to converge toward a 100% chance you are reasoning over a timeline of potentially infinite span.
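In compact form (again my own notation, with $t_0$ the time ‘AGI’ is introduced), the claim is:

$$\lim_{T \to \infty} \Pr\big[\text{human extinction by } t_0 + T \;\big|\; \text{‘AGI’ continues to exist}\big] = 1,$$

with nearly all of that probability mass already accumulated by $t_0 + 500$ years.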
I don’t know whether that covers “humans can survive on Mars with a space-suit”,
Yes, it does cover that. Whatever technological means we could think of for shielding ourselves, or that ‘AGI’ could come up with as (temporary) barriers against the human-toxic landscape it creates, would still not be enough.
if humans evolve/change to handle situations that they currently do not survive under
Unfortunately, this is not workable. The mismatch between the (expanding) set of conditions needed to maintain/increase configurations of the AGI’s artificial hardware and the set needed by our organic human wetware is too great.
Also, if you try entirely swapping our underlying substrate for the artificial substrate, you have basically removed the human and are left with ‘AGI’. Lossy scans of human brains ported onto hardware would no longer feel as humans can feel, and would be further changed/selected to fit their artificial substrate. This is because what humans feel and express as emotions is grounded in the distributed and locally context-dependent functioning of organic molecules (e.g. hormones) in our bodies.