You need a method of touching grass so that researchers have some idea of whether or not they’re making progress on the real issues.
We already can’t make even MNIST digit recognizers secure against adversarial attacks, and convnets in general remain vulnerable to them. We don’t know how to prevent prompt injection. RL agents that play Go at superhuman levels are vulnerable to simple strategies that exploit gaps in their cognition.
No, there’s plenty of evidence that we can’t make ML systems robust.
What is lacking is “concrete” evidence that this will result in blood and dead bodies.
None of those failures is an example of misalignment, except arguably prompt injection, which OpenAI seems to be solving with ordinary engineering.
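For concreteness, here is a minimal sketch of the kind of attack referred to above: the fast gradient sign method (FGSM) against a small MNIST classifier. This assumes PyTorch is available; the model and function names are illustrative, not taken from any of the commenters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMnistNet(nn.Module):
    """Deliberately small classifier; the attack also works against far larger models."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))

def fgsm_attack(model, images, labels, epsilon=0.1):
    """FGSM: nudge every pixel by +/- epsilon in the direction that
    increases the classification loss, then clamp back to valid pixel range."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    perturbed = images + epsilon * images.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

Usage sketch: a model trained to high accuracy on clean digits will typically misclassify a large fraction of `fgsm_attack(model, x, y)` inputs, even though the perturbation is nearly invisible to a human.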