It was a relatively fringe topic that only recently got the attention of a large number of real researchers. And parts of it could need large amounts of computational power afforded only by superhuman narrow AI.
There have been a few random PhD dissertations saying the topic is hard, but as far as I can tell there has only recently been a push for a group effort by capable and well-funded actors (e.g. OpenAI’s interpretability research).
I don’t trust older alignment research much as an outsider. It seems to me that Yud has built a cult of personality around AI dooming and thus is motivated to find reasons for alignment not being possible. And most of his followers treat his initial ideas as axiomatic principles and don’t dare to challenge them. And lastly, most past alignment research seems to have been done by those followers.
Unfortunately, we do not have the luxury of experimenting with dangerous AI systems to see whether they cause human extinction or not. When it comes to extinction, we do not get another chance to test.
For example, this is an argument that has been convincingly disputed to varying degrees (warning shots, the incomputability of most dangerous plans), but it is still treated as a fundamental truth on this site.
and thus is motivated to find reasons for alignment not being possible.
I don’t get this sense.
More like Yudkowsky sees the rate at which AI labs are scaling up and deploying the code and infrastructure of ML models, and recognises that there are a bunch of known core problems that would need to be solved before there is any plausible possibility of safely containing/aligning AGI optimisation pressure toward outcomes.
I personally think some of the argumentation around AGI being able to internally simulate the complexity of the outside world and play it like a complicated chess game is unsound. But I would not attribute the reasoning in e.g. the AGI Ruin piece to Yudkowsky’s cult of personality.
dangerous AI systems
I was gesturing back at “AGI” in the previous paragraph here, and at something like precursor AI systems that come before “AGI”.
Thanks for making me look at that. I just rewrote it to “dangerous autonomous AI systems”.