Below I have transcribed the ending argument from Eliezer. The underlined claims seem to state that it's impossible.
I updated “aligned” to “poorly defined”. A poorly defined superintelligence would be a technical artifact resulting from modern AI training: it performs far above human level on benchmarks, but isn't coherently moral or in service to human goals when given inputs outside the test distribution.
So from my perspective, lots of people want to make perpetual motion machines by making their designs more and more complicated, until they can no longer keep track of things, until they can no longer see the flaw in their own invention. But, like, the principle that says you can't get perpetual motion out of a collection of gears is simpler than all these complicated machines that they describe.

From my perspective, what you've got is, like, a very smart thing, or a collection of very smart things, whatever, that have desires pointing in multiple directions. None of them are aligned with humanity; none of them want for its own sake to keep humanity around, and that wouldn't be enough to ask, you also want humanity to be alive and free. Like, the galaxies we turn into something interesting, but you know, none of them want the good stuff. And if you have this enormous collection of powerful intelligences steering the future, none of them steering it in a good way, and you've got the humans here who are not that smart, then no matter what kind of clever things the humans are trying to do, or they try to cleverly play off the superintelligences against each other, they're [human subgroups] like, oh, this is my superintelligence, yeah, but they can't actually shape its goals to be, like, in clear alignment, you know, somewhere at the end of all this it ends up with the humans gone and the galaxy being transformed, and that ain't all that cool. There's maybe, like, Dyson spheres, but there's no people to wonder at them and care about each other.

And you know that this is the end point, this is obviously where it ends up, but we can dive into the details of how the humans lose, we can dive into it, and, you know, what goes wrong if you've got, like, little stupid things that are going to, like, cleverly play off a bunch of smart things against each other in a way that preserves their own power and control. But you know, it's not a complicated story in the end. The reason you can't build a perpetual motion machine is a lot simpler than the perpetual motion machines that people build. You know that the components, like, none of the components of this system of superintelligence wants us to live happily ever after in a galaxy full of wonders, and so it doesn't happen.
He's talking about “modern AI training”, i.e. “giant, inscrutable matrices of floating-point numbers”. My impression is that he thinks it is possible (but extremely difficult) to build aligned ASI, but nearly impossible to bootstrap modern DL systems to alignment.
The green lines are links into the actual video.
Would you agree that calling it “poorly defined” instead of “aligned” is an accurate phrasing of his argument, or not? I edited the post.