I would be skeptical such a proof is possible. As an existence proof, we could create aligned ASI by simulating the most intelligent and moral people, running at 10,000 times the speed of a normal human.
Okay, maybe I’m moving the bar; hopefully not, and hopefully this thread is helpful...
Your counter-example, your simulation, would show that examples of aligned systems—even at a high level—are possible. Alignment at some level is possible, of course. Functioning thermostats are aligned.
What I’m trying to propose is the search for a proof that a guarantee of alignment—all the way up—is mathematically impossible. We could then make the statement: “If we proceed down this path, no one will ever be able to guarantee that humans remain in control.” I’m proposing we see if we can prove that Stuart Russell’s “provably beneficial” does not exist.
If a guarantee is proved to be impossible, I am contending that the public conversation changes.
Maybe many people—especially on LessWrong—take this as a given: their internal belief is already close enough to a proof that there is no guarantee all the way up.
I think a proof that there is no guarantee would be important news for the wider world...the world that has to move if there is to be regulation.
we could create aligned ASI by simulating the most intelligent and moral people
This is not an existence proof, because it does not take into account the difference in physical substrates.
Artificial General Intelligence would be artificial, by definition. What allows for the standardisation of hardware components is that the (silicon) substrate is solid at the temperatures and pressures humans live under. That allows configurations to stay compartmentalised and stable.
Human “wetware” has a very different substrate. It’s a soup of bouncing organic molecules, constantly reacting at those same living temperatures and pressures.
Sorry, could you elaborate what you mean by all the way up?
All the way up meaning at increasing levels of intelligence…your 10,000X becomes 100,000X, and so on.
At some level of performance, a moral person faces new temptations because of increased capabilities and greater power for damage, right?
In other words, your simulation may fail to be aligned at 20,000X...30,000X...
Here’s why the substrate distinction matters.