adamShimi comments on Tournesol, YouTube and AI Risk

adamShimi 13 Feb 2021 14:43 UTC
LW: 2 AF: 1
I suspect the best way to think about the polarizing political content thing which is going on right now is something like: The algorithm knows that if it recommends some polarizing political stuff, there’s some chance you will head down a rabbit hole and watch a bunch more vids. So in terms of maximizing your expected watch time, recommending polarizing political stuff is a good bet. “Jumping out of the system” and noticing that recommending polarizing videos also tends to polarize society as a whole and gets them to spend more time on Youtube on a macro level seems to require a different sort of reasoning.
I agree, but I think you can have problems (and even Predict-O-Matic-like problems) without reaching that different sort of reasoning. Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to “jump out of the system” as you say.
One could also argue that because YouTube videos contain so much info about the real world, a powerful enough algorithm using them can probably develop a pretty good model of the world. And there’s a lot of content on YouTube about YouTube, so it could become “self-aware” in the sense of understanding the system in which it is embedded.
For the stock thing, I think it depends on how the system is scored. When training a supervised machine learning model, we score potential models based on how well they predict past data—data the model itself has no way to affect (except if something really weird is going on?) There doesn’t seem to be much incentive to select a model that makes self-fulfilling prophecies. A model which ignores the impact of its “prophecies” will score better, insofar as the prophecy would’ve affected the outcome.
Agreed, this is more the kind of problem that emerges from RL-like training. The page on the Tournesol wiki about this subject points to this recent paper, which proposes a recommendation algorithm that was tried in practice on YouTube. AFAIK we don’t have access to the actual algorithm used by YouTube, so it’s hard to say whether it’s using RL; but the paper above looks like evidence that it eventually will be.
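To make the supervised vs. RL-like distinction concrete, here is a toy sketch. Everything in it is a made-up assumption for illustration (the two video types, the watch-time numbers, the `step` dynamics); it is not a description of YouTube's system or of the algorithm in the paper. A myopic recommender that only maximizes immediate watch time never picks the polarizing video, while one that optimizes total watch time over a horizon learns to shift the viewer first.

```python
from itertools import product

# Hypothetical toy world: all names and numbers are assumptions for illustration.
ACTIONS = ("neutral", "polarizing")
MAX_POLARIZATION = 3
HORIZON = 5


def step(polarization, action):
    """Return (watch_time, next_polarization) for one recommendation."""
    if action == "neutral":
        # Neutral videos give steady watch time and leave the viewer unchanged.
        return 5, polarization
    # Polarizing videos pay off more the more polarized the viewer already is,
    # and they push the viewer one notch further (up to a cap).
    return 4 + 2 * polarization, min(polarization + 1, MAX_POLARIZATION)


def total_watch_time(actions, polarization=0):
    """Sum watch time over a sequence of recommendations."""
    total = 0
    for action in actions:
        reward, polarization = step(polarization, action)
        total += reward
    return total


def myopic_policy(horizon=HORIZON):
    """Pick the video with the highest immediate watch time at each step."""
    polarization, chosen = 0, []
    for _ in range(horizon):
        action = max(ACTIONS, key=lambda a: step(polarization, a)[0])
        chosen.append(action)
        _, polarization = step(polarization, action)
    return chosen


def planning_policy(horizon=HORIZON):
    """Brute-force the action sequence with the highest total watch time."""
    return list(max(product(ACTIONS, repeat=horizon), key=total_watch_time))


if __name__ == "__main__":
    for name, policy in (("myopic", myopic_policy()), ("planning", planning_policy())):
        print(name, policy, total_watch_time(policy))
```

In this toy world the long-horizon optimizer ends up changing the viewer without any reasoning about society as a whole; the incentive comes entirely from scoring the system on outcomes that its own recommendations affect.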
- John_Maxwell 13 Feb 2021 22:29 UTC
  LW: 2 AF: 1
  
  Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to “jump out of the system” as you say.
  
  Makes sense.
  
  ...there’s a lot of content on YouTube about YouTube, so it could become “self-aware” in the sense of understanding the system in which it is embedded.
  
  I think it might be useful to distinguish between being aware of oneself in a literal sense, and the term “self-aware” as it is used colloquially / the connotations the term sneaks in.
  
  Some animals, if put in front of a mirror, will understand that there is some kind of moving animalish thing in front of them. The ones that pass the mirror test are the ones that realize that moving animalish thing is them.
  
  There is a lot of content on YouTube about YouTube, so the system will likely become aware of itself in a literal sense. That’s not the same as our colloquial notion of “self-awareness”.
  
  IMO, it’d be useful to understand the circumstances under which the first one leads to the second one.
  
  My guess is that it works something like this. So that they can survive and reproduce, evolution has endowed most animals with an inborn sense of self, which serves self-preservation. (This sense of self isn’t necessary for cognition: if you trip on psychedelics and experience ego death, your brain can still think. Occasionally people will hurt themselves in this state since their self-preservation instincts aren’t functioning as normal.)
  
  Colloquial “self-awareness” occurs when an animal looking in the mirror realizes that the thing in the mirror and its inborn sense of self are actually the same thing, similar to Benjamin Franklin realizing that lightning and electricity are actually the same thing.
  
  If this story is correct, we need not worry much about the average ML system developing “self-awareness” in the colloquial sense, since we aren’t planning to endow it with an inborn sense of self.
  
  That doesn’t necessarily mean I think Predict-O-Matic is totally safe. See this post I wrote for instance.