I was once discussing Aumann’s agreement theorem with a rationalist who wanted to use it to reduce disagreements about AI safety. We were thinking about why it doesn’t work in that context, even though (I would argue) Aumann’s agreement theorem applies robustly in lots of other contexts.
I think I can find out (or maybe even already know) why it doesn’t apply in the context of AI safety, but to test it I would benefit from having a bunch of cases of disagreements about AI safety to investigate.
So if you have a disagreement with anyone about AI safety, feel encouraged to post it here.
Some say mechanistic interpretability is really unlikely to bear any fruit, for one or both of the following reasons:
1. Neural networks are fundamentally impossible (or just very, very hard) for humans to understand, like the systems studied in neuroscience or economics. Fully understanding complex systems is not something humans can do.
2. Neural networks are not doing anything you would want to understand: mostly shallow pattern matching and absurd statistical correlations, and it seems impossible to explain how that pattern matching occurs or to tease apart what the root causes of those correlations are.
I think 1 is wrong, because mechanistic interpretability seems to have very fast feedback loops, and we are able to run a shit-ton of experiments. Humans are empirically great at understanding even the most complex of systems if they’re able to run a shit-ton of experiments.
For 2, I think the claim that neural networks rely on shallow pattern matching & absurd statistical correlations is true, and may continue to be true for really scary systems, but I’m still optimistic we’ll be able to understand why they use the correlations they do. We have access to the causal system which produced the network (the gradient descent process), and it doesn’t seem too far a step to go from understanding raw networks to tracing the parts you don’t quite understand, or suspect of shallow pattern matching, back to the gradients which built them in and the datapoints which produced those gradients.
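To make the “trace it back to the gradients and the data” idea a bit more concrete, here’s a minimal sketch, assuming PyTorch. `ToyNet`, the toy data, and the crude dot-product score are all made up for illustration; they stand in for real attribution methods like influence functions or TracIn.

```python
# A minimal sketch of the "trace it back to the data" idea: for one weight you
# don't understand, score each training example by how strongly its gradient
# pushed that weight toward where it ended up. ToyNet and the toy data are
# hypothetical stand-ins, not a real training setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = ToyNet()
loss_fn = nn.CrossEntropyLoss()

# Toy "training set" standing in for the real data that built the network.
xs = torch.randn(32, 4)
ys = torch.randint(0, 2, (32,))

# Suppose fc1's weights are the part of the network we don't quite understand.
target_param = model.fc1.weight
direction = target_param.detach().clone()  # where the weight ended up pointing

scores = []
for x, y in zip(xs, ys):
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    # A datapoint whose negative gradient aligns with the final weight value
    # is (crudely) one that pushed the weight toward its current direction.
    scores.append(-(target_param.grad * direction).sum().item())

# Toy datapoints most "responsible" for this weight, under this crude score.
top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
print("most influential toy datapoints:", top)
```

Real attribution methods are much more careful than this single dot product (summing contributions over training checkpoints, using curvature information, etc.), but the shape of the loop, going from a piece of the network back to the datapoints that pushed it there, is the step I’m gesturing at.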
The most common concern is some agentic AI going rogue and taking over. I think this is wrong, for two reasons:
1. Going rogue and taking over is a human behavior we’ve been dealing with for ages; it may be the one danger our species is best at fighting.
2. There’s a much more pressing issue: the rise of Moloch, i.e. AI tools are already disrupting our ability to coordinate, and even our desire to do so.