To me, the important safety feature of “microscope AI” is that the AI is not modeling the downstream consequences of its outputs (which automatically rules out manipulation and deceit).
As I mentioned in this comment, not modeling the consequences of its output is actually exactly what I want to get out of myopia.
For the latter question (what is the user interface?), "Use interpretability tools & visualizations on the world-model" seems about as good an answer as any, and I am very happy to have Chris and others trying to flesh out that vision.
Yep; me too!
I hope that they don’t stop at feature extraction, but also pull out the relationships (causal, compositional, etc.) that we need for counterfactual reasoning, planning, etc., and even build a “search through causal pathways to get desired consequences” interface.
Chris (and the rest of Clarity) are definitely working on stuff like this!
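To make the “search through causal pathways” idea a bit more concrete, here is a minimal sketch of what the simplest version of such an interface might look like once the relationships have been extracted. This is purely my own toy illustration, not anything Clarity has built: it assumes a human (or tool) has already distilled part of the learned world-model into a directed causal graph, and then just enumerates directed pathways from a candidate intervention to a desired outcome.

```python
# Toy sketch (illustrative only): search a causal graph extracted from a
# learned world-model for pathways from an intervention to a desired outcome.
from collections import deque

def causal_pathways(graph, intervention, outcome):
    """Breadth-first enumeration of directed paths in a dict-of-lists
    causal graph ({node: [children]}) from `intervention` to `outcome`."""
    queue = deque([[intervention]])
    while queue:
        path = queue.popleft()
        if path[-1] == outcome:
            yield path
            continue
        for child in graph.get(path[-1], []):
            if child not in path:  # avoid revisiting nodes / cycles
                queue.append(path + [child])

# Hypothetical graph that interpretability tools might have surfaced.
world_model_graph = {
    "raise_price": ["demand_drop", "revenue_up"],
    "demand_drop": ["revenue_down"],
    "revenue_up": ["profit_up"],
}
for path in causal_pathways(world_model_graph, "raise_price", "profit_up"):
    print(" -> ".join(path))
```

The point of keeping the search on the human side of the interface is that the model itself still isn’t optimizing over consequences; the human is the one doing the planning with the extracted structure.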
Unsupervised (a.k.a. “self-supervised”) learning, as ofer suggests, seems awfully safe, but is it really?
I generally agree that unsupervised learning seems much safer than other approaches (e.g. RL), though I also agree that there are still concerns. See for example Abram’s recent “The Parable of Predict-O-Matic” and the rest of his Partial Agency sequence.
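For readers less familiar with the distinction being drawn here, a minimal toy example (mine, not from either post) of why the self-supervised objective looks so benign at first glance: the model is scored only on how well it predicts the next item in a fixed dataset, and nothing in the loss refers to the downstream consequences of its outputs. The Parable of Predict-O-Matic is precisely about how that intuition can break down once the predictions feed back into the world.

```python
# Toy self-supervised "training": fit next-symbol statistics from the data
# itself. The loss is purely predictive; there is no reward term that depends
# on what happens after a prediction is emitted.
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 4, size=1000)          # a fixed corpus over 4 symbols

counts = np.ones((4, 4))                      # Laplace smoothing
for prev, nxt in zip(data[:-1], data[1:]):
    counts[prev, nxt] += 1
model = counts / counts.sum(axis=1, keepdims=True)

# Average negative log-likelihood of the data under the fitted model.
nll = -np.mean([np.log(model[p, n]) for p, n in zip(data[:-1], data[1:])])
print(f"predictive loss: {nll:.3f}")
```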