You need a method of touching grass so that researchers have some idea of whether or not they’re making progress on the real issues.
We already can’t make even MNIST digit recognizers secure against adversarial attacks, and convnets in general remain vulnerable to them. We don’t know how to prevent prompt injection. RL agents that play Go at superhuman levels are vulnerable to simple strategies that exploit gaps in their cognition.
No, there’s plenty of evidence that we can’t make ML systems robust.
What is lacking is “concrete” evidence that this will result in blood and dead bodies.
None of those failures is an example of misalignment, except arguably prompt injection, which OpenAI seems to be solving with ordinary engineering.
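For concreteness, here is a minimal sketch of the kind of attack referred to above: the fast gradient sign method (FGSM) against a small MNIST classifier. This assumes PyTorch is available; the model and function names are illustrative, not taken from any of the commenters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMnistNet(nn.Module):
    """Deliberately small classifier; the attack also works against far larger models."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))

def fgsm_attack(model, images, labels, epsilon=0.1):
    """FGSM: nudge every pixel by +/- epsilon in the direction that
    increases the classification loss, then clamp back to valid pixel range."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    perturbed = images + epsilon * images.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

Usage sketch: a model trained to high accuracy on clean digits will typically misclassify a large fraction of `fgsm_attack(model, x, y)` inputs, even though the perturbation is nearly invisible to a human.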