I was going to write something saying “no, actually, we have the word genocide to describe the destruction of a people,” but walked away because I didn’t think that’d be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:
I don’t think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I’d rather play iterated prisoner’s dilemma with someone smart enough to play tit-for-tat than someone who can only choose between being CooperateBot or DefectBot).
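(As an aside, to make the payoff gap in that example concrete: below is a minimal sketch of 100 rounds of iterated prisoner’s dilemma against those three opponents. The payoff values T=5, R=3, P=1, S=0, the 100-round horizon, and all the function names are my own assumptions for illustration, not anything from your comment.)

```python
# Minimal sketch (my own construction, not from the quoted comment): score a
# fixed strategy over 100 rounds of iterated prisoner's dilemma against three
# opponents. Payoffs use the standard T=5, R=3, P=1, S=0 values.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def cooperate_bot(my_history, their_history):
    return "C"                      # always cooperates, no matter what

def defect_bot(my_history, their_history):
    return "D"                      # always defects, no matter what

def tit_for_tat(my_history, their_history):
    # Cooperate on the first move, then copy the opponent's last move.
    return their_history[-1] if their_history else "C"

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    # Play tit-for-tat yourself against each opponent and compare your totals.
    for name, opponent in [("TitForTat", tit_for_tat),
                           ("CooperateBot", cooperate_bot),
                           ("DefectBot", defect_bot)]:
        mine, theirs = play(tit_for_tat, opponent)
        print(f"vs {name}: my score {mine}, their score {theirs}")
```

Under those assumptions, playing tit-for-tat yourself nets 300 points against a TitForTat or CooperateBot opponent and only 99 against DefectBot, which is the payoff gap I read the quoted example as pointing at.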
My actual experience over the last decade is that some form of the above statement isn’t true. As a large human model trained on decades of interaction, when I query my own next-experience predictor about interacting with smarter humans, the immediate response is: smarter humans show no strong correlation with my values, and they will defect unless there’s a very strong enforcement mechanism (especially in finance, business, and management). (Presumably this is because in our society most games aren’t iterated, or if they are iterated they’re closer to the dictator game than to the prisoner’s dilemma; but I’m very uncertain about the causes, and I’m much more worried about the previously observed outputs.)
I suspect this isn’t going to be convincing to you, because I’m giving you the output of a fuzzy statistical model instead of a logical, verbalized, step-by-step argument. But the deeper crux is that I believe “The Rationalists” heavily over-weight the explicit arguments and under-weight the fuzzy model outputs, when the latter are a much more reliable source of information: they were generated by entanglement with reality in a way that mere arguments aren’t.
And I suspect that’s a large part of the reason why we (and I include myself among the Rationalists at that point in time) were blindsided by deep learning and connectionism winning: we expected intelligence to require some sort of symbolic reasoning, explicit utility functions, formal decision theory, and maximization... and none of that seems even relevant to the actual intelligences we’ve made, which are doing fuzzy statistical learning on their training sets, arguably just the way we are.
So I mostly don’t disagree with what you say about fuzzy statistical models versus step-by-step arguments. But what you said is indeed not very convincing to me, I guess in part because my “I think smarter humans tend to be more value aligned with me” wasn’t the output of a step-by-step argument either. So when the output of your fuzzy statistical model clashes with the output of my fuzzy statistical model, it’s hardly surprising that I don’t just discard my own output and replace it with yours.
I’m also not simply discarding yours, but there’s not a lot I can do with it as-is: you’ve given me the output of your fuzzy statistical model, but I still don’t have access to the model itself. I think if we cared enough to explore this question in more depth (which I probably don’t, but this meta thread is interesting), we’d need to ask things like “what exactly have we observed?”, “can we find specific situations where we anticipate different things?”, “do we have reason to trust one person’s fuzzy statistical model over another’s?”, and “are we even talking about the same thing here?”