By “modern methods” I meant roughly what Ape in the coat noted, more specifically ~end-to-end DNNs (possibly some model-based RL setup, possibly with some of the nets bootstrapped from pre-trained language models, or trained on data or in situations LLMs generate).
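To make that shape concrete, here is a minimal toy sketch of the kind of end-to-end setup I have in mind. Everything in it is my own illustrative assumption (the dimensions, the fake differentiable environment, the stand-in "pretrained" encoder), not a description of any real system: a policy whose encoder plays the role of a net bootstrapped from a pre-trained model, trained end to end alongside a learned world model.

```python
# Toy illustration only: every name, dimension, and the fake environment below
# is an assumption of mine, not a description of any real system.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 8, 4, 64

# Stand-in for a component bootstrapped from a pre-trained language model;
# in a real setup this would be loaded from a checkpoint.
pretrained_encoder = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.Tanh())

# Model-based RL shape: a learned world model plus a policy, all plain DNNs
# trained end to end by gradient descent.
world_model = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.Tanh(),
                            nn.Linear(HIDDEN, OBS_DIM))
policy = nn.Sequential(pretrained_encoder, nn.Linear(HIDDEN, ACT_DIM))

opt = torch.optim.Adam([*world_model.parameters(), *policy.parameters()], lr=1e-3)

def toy_env_step(obs, act):
    """Hypothetical differentiable environment, purely for the sketch."""
    return torch.tanh(obs + 0.1 * act.sum(dim=-1, keepdim=True))

for step in range(100):
    obs = torch.randn(32, OBS_DIM)
    act = policy(obs)
    next_obs = toy_env_step(obs, act)
    # World model learns to predict transitions; the target is detached so
    # model error doesn't push the policy around.
    pred = world_model(torch.cat([obs, act], dim=-1))
    model_loss = nn.functional.mse_loss(pred, next_obs.detach())
    # Toy "reward": prefer states near zero; the gradient flows back through
    # the differentiable environment into the policy.
    reward = -next_obs.pow(2).mean()
    (model_loss - reward).backward()
    opt.step()
    opt.zero_grad()
```

The point of the sketch is only the shape: the whole agent is a few undifferentiated DNNs optimized by gradient descent, with nothing in it you could read as a program.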
As opposed to cognitive architectures that are more like programs in the classical sense, even if they use DNNs for some of what they do, like CoEm. Or DNNs iteratively and automatically “decompiled” into explicit, modular, giant but locally human-understandable program code using AI-assisted interpretability tools (in a way that forces a change of behavior and requires retraining the remaining black-box DNN parts to maintain capability), taking the human-understandable features that form in DNNs as inspiration for writing code that computes them more carefully. I’m also guessing alignment-themed decision theory has a use in shaping something like this (whether the top-level program architecture or the synthetic data that trains the DNNs); this motivates my own efforts. Or something else entirely; this paragraph is a list of non-examples for “modern methods” in the sense I intended, the kind of stuff that could use a pause on giant training runs to have a chance to grow up.
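For contrast, a toy sketch of the other shape, again purely illustrative (this is not CoEm’s actual architecture, and all the names and thresholds are hypothetical): the control flow is explicit code you can read line by line, with a DNN present only as a subcomponent.

```python
# Contrast sketch, purely illustrative (not CoEm's actual architecture):
# explicit program logic readable line by line, with a DNN only as a
# subcomponent for perception.
import torch
import torch.nn as nn

class Perception(nn.Module):
    """Black-box DNN part: raw observation -> small feature vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    def forward(self, x):
        return self.net(x)

perception = Perception()

def plan(features: torch.Tensor) -> str:
    """The decision logic is ordinary code: modular, locally
    human-understandable, auditable without interpretability tools."""
    if features.mean().item() > 0.0:
        return "approach"
    if features.abs().max().item() > 1.0:
        return "inspect"
    return "wait"

def agent_step(raw_obs: torch.Tensor) -> str:
    features = perception(raw_obs)  # opaque DNN call
    return plan(features)           # transparent classical program

print(agent_step(torch.randn(8)))
```

The “decompiling” idea above would push systems from the first shape toward the second: replacing opaque calls like `perception` with carefully written code for the features those nets turn out to compute, retraining whatever black-box parts remain.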