Can you be more specific about the daemons you’re thinking about? I had tried to argue that daemons wouldn’t occur under certain circumstances, or at least wouldn’t cause malign failures...
Which part are you referring to?
Anyway, I’m worried that a daemon will arise while searching for models which do a good job of predicting masked bits. As it was put here: “...we note that trying to predict the output of consequentialist reasoners can reduce to an optimisation problem over a space of things that contains consequentialist reasoners.” Initially daemons seemed implausible to me, but then I thought of a few ways they could happen; I’m hoping to write posts about this before too long. I encourage others to brainstorm as well, so we can try & think of all the plausible ways daemons could get created.
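To make the worry concrete, here is a toy sketch (entirely illustrative; the data, hypothesis space, and names are made up, and a real system would be gradient descent over Transformer weights rather than a pick-the-best loop). The point is that the search procedure only ever sees predictive loss on masked bits, so it will select anything in its hypothesis space that happens to predict well.

```python
# Toy illustration only (made-up data and hypothesis space): "searching for
# models which do a good job of predicting masked bits" is just an optimization
# scored by predictive loss, so the search cannot tell an "innocent" predictor
# apart from anything else in the space that happens to predict well.
import random

random.seed(0)

def make_data(n_seqs=200, length=16):
    """Random bit sequences, each with one masked position to predict."""
    data = []
    for _ in range(n_seqs):
        bits = [random.randint(0, 1) for _ in range(length)]
        data.append((bits, random.randrange(length)))
    return data

def masked_bit_loss(predict, data):
    """Fraction of masked bits the candidate model gets wrong."""
    errors = sum(int(predict(bits[:pos] + bits[pos + 1:]) != bits[pos])
                 for bits, pos in data)
    return errors / len(data)

def select_model(hypothesis_space, data):
    """Pick whichever candidate predicts masked bits best -- and nothing more."""
    return min(hypothesis_space, key=lambda h: masked_bit_loss(h, data))

data = make_data()
hypothesis_space = [
    lambda ctx: 0,                             # always guess 0
    lambda ctx: 1,                             # always guess 1
    lambda ctx: int(2 * sum(ctx) > len(ctx)),  # majority vote over the context
]
best = select_model(hypothesis_space, data)
```

The selection criterion is predictive loss and nothing else, which is exactly why a sufficiently rich hypothesis space containing consequentialist reasoners is worrying.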
I started my own list of pathological things that might happen with self-supervised learning systems; maybe I’ll show you when it’s ready and we can compare notes...?
RE daemons, I wrote in that post (and have been assuming all along): “I’m assuming that we will not do a meta-level search for self-supervised learning algorithms… Instead, I am assuming that the self-supervised learning algorithm is known and fixed (e.g. “Transformer + gradient descent” or “whatever the brain does”), and that the predictive model it creates has a known framework, structure, and modification rules, and that only its specific contents are a hard-to-interpret complicated mess.” The contents of a world-model, as I imagine it, form a big data structure consisting of gajillions of “concepts” and “transformations between concepts”. It’s a passive data structure, and therefore not a “daemon” in the usual sense. Then there’s a KANSI (Known Algorithm Non-Self-Improving) system that’s accessing and editing the world model. I also wouldn’t call that a “daemon”; instead I would say “This algorithm we wrote can have pathological behavior...”
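As a purely illustrative sketch (I’m making up all the names and details here; this is not a real design), the division of labor I have in mind looks something like this: a passive world-model data structure, and a separate, fixed, hand-written algorithm that is the only thing reading or editing it.

```python
# Toy sketch only (invented names/structure): the world-model is a passive data
# structure -- "concepts" plus weighted "transformations between concepts" --
# and a KANSI-style algorithm with known, fixed rules is the only thing that
# reads or edits it. The data structure itself never executes anything.
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Passive container: concepts and transformations between them."""
    concepts: set = field(default_factory=set)
    transformations: dict = field(default_factory=dict)  # (concept_a, concept_b) -> strength

class KANSIPredictor:
    """Known Algorithm, Non-Self-Improving: fixed code over the world model."""

    def __init__(self, model):
        self.model = model

    def observe(self, a, b, strength=1.0):
        # Known, fixed modification rule -- even if the accumulated contents
        # eventually become a hard-to-interpret complicated mess.
        self.model.concepts.update({a, b})
        key = (a, b)
        self.model.transformations[key] = self.model.transformations.get(key, 0.0) + strength

    def predict_next(self, concept):
        # Known, fixed query rule: follow the strongest outgoing transformation.
        outgoing = {b: w for (a, b), w in self.model.transformations.items() if a == concept}
        return max(outgoing, key=outgoing.get) if outgoing else None
```

On this picture, any pathological behavior lives in the fixed algorithm we wrote (or in what the messy contents cause it to do), not in some hidden optimizer running inside the data structure itself.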
Sure!
Ah, thanks for clarifying.
The first entry on my “list of pathological things” wound up being a full blog post in length: See Self-supervised learning and manipulative predictions.