[partly copied from here]
- The history of evolution to date is exactly what you’d get from a particular kind of RL algorithm optimizing genes for how many future copies are encoded in literal DNA molecules in basement reality.
- The history of evolution to date is exactly what you’d get from a particular kind of RL algorithm optimizing genes for how many future copies are encoded in literal DNA molecules in either basement reality or accurate simulations.
- The history of evolution to date is exactly what you’d get from a particular kind of RL algorithm optimizing genes for how many future copies are encoded in DNA molecules, or any other format that resembles DNA functionally, regardless of whether it resembles DNA chemically or mechanistically.
- The history of evolution to date is exactly what you’d get from a particular kind of RL algorithm optimizing ‘things’ for ‘their future existence & proliferation’ in some broad sense (or something like that)
- [infinitely many more things like that]
If future humans switch from DNA to XNA, or upload themselves into simulations, or imprint their values on AI successors, or whatever, then the future would be high-reward according to some of those RL algorithms and the future would be zero-reward according to others of those RL algorithms.
In other words, one “experiment” is simultaneously providing evidence about what the results look like for infinitely many different RL algorithms. Lucky us.
(Related to: “goal misgeneralization”.)
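To make that concrete, here is a minimal sketch (every function name, dictionary field, and number below is made up purely for illustration) of how several distinct reward functions can assign the entire observed history identical reward while disagreeing sharply about a hypothetical future:

```python
# All names and values here are hypothetical, purely for illustration.

def copies_in_literal_dna(outcome):
    # Reward: future copies encoded in literal DNA molecules in basement reality.
    return outcome["dna_copies"]

def copies_in_dna_or_simulation(outcome):
    # Reward: also count copies encoded in accurate simulations.
    return outcome["dna_copies"] + outcome["simulated_copies"]

def broad_proliferation(outcome):
    # Reward: 'future existence & proliferation' in some broad sense,
    # including influence passed on to AI successors.
    return (outcome["dna_copies"] + outcome["simulated_copies"]
            + outcome["successor_influence"])

reward_functions = [copies_in_literal_dna,
                    copies_in_dna_or_simulation,
                    broad_proliferation]

# Everything in evolutionary history to date has involved only literal DNA
# copies, so all of these reward functions score that history identically:
observed_history = {"dna_copies": 1000, "simulated_copies": 0,
                    "successor_influence": 0}
assert len({f(observed_history) for f in reward_functions}) == 1

# A hypothetical future where humans upload themselves and hand off to AI
# successors is high-reward for some of these functions and zero for others:
hypothetical_future = {"dna_copies": 0, "simulated_copies": 800,
                       "successor_influence": 500}
for f in reward_functions:
    print(f.__name__, f(hypothetical_future))
# copies_in_literal_dna 0
# copies_in_dna_or_simulation 800
# broad_proliferation 1300
```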
I don’t think it’s productive to just stare at the list of bullet points and try to find the one that corresponds to the “broadest, truest” essence of natural selection. What does that even mean? Why is it relevant to this discussion?
I do think it is potentially productive to argue that the evidence from some of these bullet-point “experiments” is more relevant to AI alignment than the evidence from others of these bullet-point “experiments”. But to make that argument, one needs to talk more specifically about what AI alignment will look like, and argue on that basis that some of the above bullet-point RL algorithms are more disanalogous to AI alignment than others. This kind of argument wouldn’t be talking about which bullet point is “reasonable” or “the true essence of natural selection”, but rather about which bullet point is the tightest analogy to the situation where future programmers are developing powerful AI.
(And FWIW my answer to the latter is: none of the above. I think all of those bullet points are sufficiently disanalogous to AI alignment that we don’t really learn anything from them, except that they serve as an existence-proof illustration of the extremely weak claim that inner misalignment in RL is not completely impossible. Further details here.)
Isn’t your list an “any_of”?
I think your error is that you are treating the “RL algorithm” as the policy network encoded in a specific creature. For example, a human wants to have children, and a bacterium wants to find food and replicate itself the moment certain thresholds are reached. There is a physical mechanism that causes these policies to be enacted.
This is not the RL algorithm. The RL algorithm of evolution doesn’t “exist” anywhere physical; it just happens to prefer outcomes where creatures cause other creatures to exist, and those other creatures do not have to share remotely the same code. For a concrete example, evolution ranks: build an AI successor >>>>>>>> father 1000 children >> father 1 child, and it prefers them in that order.
Or, for another example: you think individual creatures would prefer their own genes to be propagated*. This is a policy. If, hypothetically, you could go to a biotech clinic and have your genetic code upgraded (junk cleaned out, AI-designed genes replacing all of your genes with superior versions or the best versions found in the human gene pool), your policy network as a human being may not prefer that outcome, but evolution DOES.
*and culture and everything else that @Zvi values.
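A minimal sketch of the distinction being drawn here (the policy encoded in a particular creature versus the selection criterion that shaped it), with all names and numbers hypothetical:

```python
# All names and values here are hypothetical, purely for illustration.

def human_policy_prefers(outcome):
    # The policy physically encoded in the creature: keep my own genes propagating.
    return outcome["own_genes_propagated"]

def evolution_score(outcome):
    # The selection criterion doesn't live inside any creature; it simply
    # favors whatever leaves more (and more capable) descendants/successors.
    return outcome["descendant_count"] * outcome["descendant_fitness"]

keep_own_genome = {"own_genes_propagated": True,
                   "descendant_count": 2, "descendant_fitness": 1.0}
gene_upgrade_clinic = {"own_genes_propagated": False,  # your genes get replaced
                       "descendant_count": 2, "descendant_fitness": 1.5}

# The creature's policy may reject the upgrade...
assert human_policy_prefers(keep_own_genome)
assert not human_policy_prefers(gene_upgrade_clinic)

# ...while the selection criterion ranks the upgraded lineage higher.
assert evolution_score(gene_upgrade_clinic) > evolution_score(keep_own_genome)
```

Whether there is a single well-defined criterion of this kind is exactly what the parent comment questions; the sketch only shows how a creature's policy and a selection criterion can come apart.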