Steven Byrnes comments on Self-Supervised Learning and AGI Safety

Steven Byrnes 21 Aug 2019 10:45 UTC
1 point
Ah, thanks for clarifying.

The first entry on my “list of pathological things” grew into a full-length blog post: see “Self-supervised learning and manipulative predictions”.

RE daemons, I wrote in that post (and have been assuming all along): “I’m assuming that we will not do a meta-level search for self-supervised learning algorithms… Instead, I am assuming that the self-supervised learning algorithm is known and fixed (e.g. “Transformer + gradient descent” or “whatever the brain does”), and that the predictive model it creates has a known framework, structure, and modification rules, and that only its specific contents are a hard-to-interpret complicated mess.” The world-model, as I imagine it, is a big data structure consisting of gajillions of “concepts” and “transformations between concepts”. It’s a passive data structure, and therefore not a “daemon” in the usual sense. Then there’s a KANSI (Known Algorithm Non Self Improving) system that accesses and edits the world-model. I wouldn’t call that a “daemon” either; instead I would say “This algorithm we wrote can have pathological behavior...”
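
To make the distinction concrete, here is a toy sketch of the split I have in mind: a passive store of concepts and transformations, plus a fixed algorithm that reads and edits it. Everything here is a hypothetical illustration, not a real learning algorithm. The names `WorldModel`, `fixed_update`, and `predict_next`, the string-valued concepts, and the simple weight-nudging rule are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class WorldModel:
    """Passive data structure: it holds contents and has no behavior of its own."""
    concepts: Set[str] = field(default_factory=set)
    # (source concept, target concept) -> strength of that transformation
    transformations: Dict[Tuple[str, str], float] = field(default_factory=dict)


def fixed_update(model: WorldModel, observed: List[str], step: float = 0.1) -> None:
    """Hypothetical stand-in for the known, fixed learning algorithm.

    It edits only the model's contents. Nothing here rewrites this function,
    which is the "non self improving" part of the assumption.
    """
    for src, dst in zip(observed, observed[1:]):
        model.concepts.update((src, dst))
        old = model.transformations.get((src, dst), 0.0)
        # Move the strength a fixed fraction of the way toward 1.0.
        model.transformations[(src, dst)] = old + step * (1.0 - old)


def predict_next(model: WorldModel, current: str) -> Optional[str]:
    """Read-only query: the strongest known transformation out of `current`."""
    candidates = {
        dst: weight
        for (src, dst), weight in model.transformations.items()
        if src == current
    }
    return max(candidates, key=candidates.get) if candidates else None
```

For example, calling `fixed_update(model, ["cloud", "rain", "puddle"])` on an empty model and then `predict_next(model, "cloud")` returns `"rain"`. Any pathological behavior in a setup like this would have to come from how the fixed code interacts with the learned contents, since the data structure itself never runs anything.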