I mostly agree with this post. That said, here are some points I don’t agree with, and some extra nit-picking, because Karl asked me for feedback.
The points above indicate that the line between “harmless” and “dangerous” must be somewhere below the traditional threshold of “at least human problem-solving capabilities in most domains”.
I don’t think we know even this. I can imagine an AI that is successfully trained to imitate human behaviour, such that it has human problem-solving capabilities in most domains, but which does not pose an existential threat, because it just keeps behaving like a human. This could happen because this AI is not an optimiser but a “predict what a skilled human would do next and then do that” machine.
It is also possible that no such AI would be stable, because it would notice that it is not human, which would somehow cause it to go off the rails and start self-improving, or something. At the moment I don’t think we have good evidence either way.
But while it is often difficult to get people to agree on any kind of policy, there are already many things which are not explicitly forbidden, but most people don’t do anyway,
The list of links to stupid things people did anyway doesn’t exactly illustrate your point. But there is a possible argument here: the number of people who have access to teraflops of compute is much smaller than the number who have access to aquarium fluid.
Suppose we managed to create a widespread common-sense understanding of what AI we should not build. How long do you think it would take for some idiot to do it anyway, once it becomes possible?
(think for example of social media algorithms pushing extremist views, amplifying divisiveness and hatred, and increasing the likelihood of nationalist governments and dictatorships, which in turn increases the risk of wars).
I don’t think the algorithms have much to do with this. I know this is a claim that keeps circulating, but I don’t know what the evidence is. Clearly social media have political influence, but to me this seems to have more to do with the massively increased communication connectedness than with anything about the specific algorithms.
This will require a lot more research. But there are at least some properties of an AI that could be relevant in this context:
I think this is a good list. On first read I wanted to add agency/agentic-ness/optimiser-similarity, but after thinking some more I don’t think it should be included, for the combination of two reasons:
Agency is a vague, hard-to-define concept.
The relevant aspects of agency (from the perspective of safety) are covered by strategic awareness and stability. So probably don’t add it to the list.
However, you might want to add the similar concept of “consequentialist reasoning ability”, although it can be argued that this is just the same as “world model”.