Well, for starters, because if the history of ML is anything to go by, we’re gonna be designing the thing analogous to evolution, and not the brain. We don’t pick the actual weights in these transformers, we just design the architecture and then run stochastic gradient descent or some other meta-learning algorithm. That meta-learning algorithm is going to be what decides what goes in the DNA, so in order to get the DNA right, we will need to get the meta-learning algorithm correct. Evolution doesn’t have much to teach us about that except as a negative example.
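To make that division of labor concrete, here is a minimal sketch (assuming PyTorch; the layer sizes and the toy regression task are invented for illustration): the designer fixes the architecture and the optimizer, while the weights start as random noise and are filled in by gradient descent.

```python
# Minimal sketch, assuming PyTorch; the toy task and sizes are made up.
import torch
import torch.nn as nn

# The designer picks the architecture (the analogue of what we actually control)...
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# ...and the training procedure, here plain stochastic gradient descent.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# The weights themselves start as a random "blank slate" and are filled in
# by the learning process, driven only by data and the loss.
for _ in range(1000):
    x = torch.randn(32, 10)             # toy input batch
    y = x.sum(dim=1, keepdim=True)      # toy target the network must discover
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Nothing in the loop names particular weight values; only the shape of the network and the update rule are chosen by hand.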
we’re gonna be designing the thing analogous to evolution, and not the brain. We don’t pick the actual weights in these transformers, we just design the architecture and then run stochastic gradient descent or some other meta-learning algorithm.
But, ah, the genome also doesn’t “pick the actual weights” for the human brain which it later grows. So whatever the brain does to align people to care about latent real-world objects, I strongly believe that that process must be compatible with blank-slate initialization and then learning.
That meta-learning algorithm is going to be what decides what goes in the DNA, not some human architect.
In the evolution/mainstream-ML analogy, we humans are specifying the DNA, not the search process over DNA specifications. We specify the learning architecture, and then the learning process fills in the rest.
I confess that I already have a somewhat sharp picture of the alignment paradigm used by the brain, and concrete reasons to believe it’s miles better than anything we have dreamed up so far. I was originally querying what Eliezer thinks about the “genome → human alignment properties” situation, rather than expressing innocent ignorance of how any of this works.
I think I disagree with you, but I don’t really understand what you’re saying or how these analogies are being used to point to the real world anymore. It seems to me like you might be taking something that makes the problem of “learning from evolution” even more complicated (evolution → protein → something → brain vs. evolution → protein → brain) and using that to argue the issues are solved, in the same vein as the “just don’t use a value function” people. But I haven’t read shard theory, so, GL.
In the evolution/mainstream-ML analogy, we humans are specifying the DNA, not the search process over DNA specifications.
You mean, we are specifying the ATCG strands, or we are specifying the “architecture” behind how DNA influences the development of the human body? It seems to me like, in this analogy, we are definitely also choosing how the search for the correct ATCG strands works and how they’re identified. The DNA doesn’t “align” new babies out of the womb; it’s just a specification of how to copy the existing, already-“““aligned””” code.
“learning from evolution” even more complicated (evolution → protein → something → brain vs. evolution → protein → brain)
ah, no, this isn’t what I’m saying. Hm. Let me try again.
The following is not a handwavy analogy, it is something which actually happened:
Evolution found the human genome.
The human genome specifies the human brain.
The human brain learns most of its values and knowledge over time.
Human brains reliably learn to care about certain classes of real-world objects like dogs.
Therefore, somewhere in the “genome → brain → (learning) → values” process, there must be a process which reliably produces values over real-world objects. Shard theory aims to explain this process. The shard-theoretic explanation is actually pretty simple.
Furthermore, we don’t have to rerun evolution to access this alignment process. For the sake of engaging with my points, please forget completely about running evolution. I will never suggest rerunning evolution, because it’s unwise and irrelevant to my present points. I also currently don’t see why the genome’s alignment process requires more than crude hard-coded reward circuitry, reinforcement learning, and self-supervised predictive learning.
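Purely to pin down the three ingredients named here, and not as a claim about how the brain or shard theory actually combines them, here is a toy sketch assuming PyTorch; the environment step, the proxy reward feature, and all network sizes are invented for illustration. It wires together a hard-coded reward circuit that fires on a crude sensory proxy, a policy updated on that reward with a REINFORCE-style rule, and a predictor trained with a self-supervised next-observation loss.

```python
# Toy sketch of "crude reward circuitry + RL + self-supervised prediction".
# All details (environment, proxy feature, sizes) are invented for illustration.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 8, 4

# Policy learns from the hard-coded reward; predictor learns from raw experience.
policy = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
predictor = nn.Sequential(nn.Linear(OBS_DIM + N_ACTIONS, 32), nn.ReLU(), nn.Linear(32, OBS_DIM))
opt = torch.optim.Adam(list(policy.parameters()) + list(predictor.parameters()), lr=1e-3)

def hard_coded_reward(obs: torch.Tensor) -> torch.Tensor:
    # "Crude reward circuitry": fires on a simple sensory proxy (the first
    # observation feature), not on the latent real-world objects we care about.
    return (obs[:, 0] > 0).float()

def fake_env_step(obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    # Stand-in for an environment transition; purely illustrative.
    onehot = nn.functional.one_hot(action, N_ACTIONS).float()
    return obs + 0.1 * torch.randn_like(obs) + 0.05 * onehot.sum(dim=1, keepdim=True)

obs = torch.randn(16, OBS_DIM)
for _ in range(200):
    # Reinforcement learning: reinforce actions that triggered the reward circuit
    # (score-function gradient with a batch-mean baseline).
    dist = torch.distributions.Categorical(logits=policy(obs))
    action = dist.sample()
    next_obs = fake_env_step(obs, action)
    reward = hard_coded_reward(next_obs)
    rl_loss = -(dist.log_prob(action) * (reward - reward.mean())).mean()

    # Self-supervised predictive learning: predict the next observation.
    onehot = nn.functional.one_hot(action, N_ACTIONS).float()
    pred_loss = nn.functional.mse_loss(predictor(torch.cat([obs, onehot], dim=1)), next_obs)

    opt.zero_grad()
    (rl_loss + pred_loss).backward()
    opt.step()
    obs = next_obs.detach()
```

The point of the sketch is only that each ingredient is individually simple; whether a recipe like this suffices to produce values over latent real-world objects is exactly the question under discussion.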
That does seem worth looking at, and there are probably ideas worth stealing from biology. I’m not sure you can call that a robustly aligned system that’s getting bootstrapped, though. Existing in a society of (roughly) peers, without a huge power disparity between any given person and the rest of humanity, is analogous to the AGI that can’t take over the world yet. Humans who acquire significant power do not seem aligned wrt what a typical person would profess to and outwardly seem to care about.
I think your point still mostly follows despite that; even when humans can be deceptive and power seeking, there’s an astounding amount of regularity in what we end up caring about.
Ah, I misunderstood.
Yes, this is my claim. Not that eg >95% of people form values which we would want to form within an AGI.