I think our biggest crux is this. My claim is that by default we get systems that look like this: DL systems already look like this! And my near-term prediction is that DL systems will scale all the way to AGI. Any near-term AGI will almost certainly look 'human-like' in a broad sense: some combination of model-free and model-based RL wrapped around an unsupervised world model.
Agree this is a crux. A few remarks:
Structural similarity doesn’t necessarily tell us a lot about a system’s macro-level behavior. Examples: Stockfish 1 vs. Stockfish 20, the brain of a supervillain vs. the brain of an average human, a transformer model with random weights vs. one trained to predict the next token in a sequence of text.
Or, if you want to extend the similarity to the training process: a transformer model trained on a corpus of text from the human internet vs. one trained on a corpus of text from an alien internet; an average human vs. a supervillain whose life experiences from birth are 99%+ identical; Stockfish implemented by a beginner programmer vs. by a professional team.
I’d say, to the extent that current DL systems are structurally similar to human brains, it’s because these structures are instrumentally useful for doing any kind of useful work, regardless of how “values” in those systems are formed, or what those values are. And as you converge towards the most useful structures, there is less room left over for the system to “look similar” to humans, unless humans are pretty close to performing cognition optimally already.
Also, a lot of the structural similarity is in the training process of the foundation models that make up one component of a larger artificial system. The kinds of things people do with LangChain today don’t seem similar in structure to any part of a single human brain, at least to me. For example, I can’t arrange a bunch of copies of myself in a chain or tree, and give them each different prompts running in parallel. I could maybe simulate that by hiring a bunch of people, though it would be OOMs slower and more costly.
I also can’t give myself a Python shell or a “tree search” method, or perform experimental neurosurgery on humans, the way I can with artificial systems. These all look like capabilities-enhancing tools that don’t preserve structural similarity to humans, and they may also not preserve similarity of values to the original, un-enhanced artificial system.
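To make the "copies in a chain or tree, each with different prompts, running in parallel" point concrete, here is a minimal sketch. It does not use LangChain's actual API; `model_copy` is a hypothetical stand-in for a call to a foundation model, and the orchestration patterns (fan-out and chaining) are the point:

```python
from concurrent.futures import ThreadPoolExecutor

def model_copy(prompt: str) -> str:
    # Hypothetical stand-in for a foundation-model call;
    # it just echoes a canned "answer" for its prompt.
    return f"answer to: {prompt}"

def fan_out(prompts):
    # Run many copies of the same model in parallel, each with its own prompt.
    # This is the kind of orchestration a single human brain cannot do with itself.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(model_copy, prompts))

def chain(prompt: str, steps: int = 3) -> str:
    # Feed each copy's output into the next copy's prompt (a simple chain).
    out = prompt
    for _ in range(steps):
        out = model_copy(out)
    return out

results = fan_out(["summarize A", "critique B", "plan C"])
```

Hiring a team of humans approximates `fan_out`, but at orders-of-magnitude higher latency and cost; nothing approximates freely copying and rewiring the "brain" itself.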