I’m curious to hear more about this. Reviewing the analogy:
Evolution, ‘trying’ to get general intelligences that are great at reproducing <--> The AI Industry / AI Corporations, ‘trying’ to get AGIs that are HHH
Genes, instructing cells on how to behave and connect to each other and in particular how synapses should update their ‘weights’ in response to the environment <--> Code, instructing GPUs on how to behave and in particular how ‘weights’ in the neural net should update in response to the environment
Brains, growing and learning over the course of lifetime <--> Weights, changing and learning over the course of training
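To make the Genes <--> Code leg of the analogy concrete, here's a minimal toy training loop (a sketch only, not any lab's actual stack): the code fixes the update rule, and the weights are whatever falls out of applying that rule to the environment (the data stream).

```python
# Minimal sketch of "code instructs how weights update in response to the environment".
# The task and hyperparameters are illustrative assumptions, not anyone's real setup.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                    # the "weights" start from random init
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(100):
    x = torch.randn(32, 4)                       # the "environment": a stream of inputs
    target = x.sum(dim=1, keepdim=True)          # toy task standing in for whatever the env rewards
    loss = torch.nn.functional.mse_loss(model(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()                                   # the code never sets the weights directly;
                                                 # it only dictates how they respond to experience
```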
Now turning to your three points about evolution:
Optimizing the genome indirectly influences value formation within lifetime, via this unstable Rube Goldberg mechanism that has to zero-shot direct an organism’s online learning processes through novel environments via reward shaping --> translating that into the analogy, it would be “optimizing the code indirectly influences value formation over the course of training, via this unstable Rube Goldberg mechanism that has to zero-shot direct the model’s learning process through novel environments via reward shaping”… yep, seems to check out. idk. What do you think?
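Here's a hedged toy sketch of what "directing the learning process via reward shaping" looks like on the AI side: the outer loop can only nudge the inner learner by adding terms to the reward it sees, not by setting its values directly. The shaping bonus below is an assumption for illustration, not any particular training recipe.

```python
# Toy REINFORCE on a 2-armed bandit, with a fixed shaping bonus chosen in advance
# by the "outer loop", hoping it steers the learner's values in the intended direction.
import torch

torch.manual_seed(0)
logits = torch.zeros(2, requires_grad=True)      # the learner's "values" over two actions
opt = torch.optim.SGD([logits], lr=0.1)

def env_reward(action):                          # what the environment actually pays out
    return 1.0 if action == 1 else 0.0

def shaping_bonus(action):                       # the outer loop's proxy signal, fixed ahead of time
    return 0.2 if action == 1 else 0.0

for step in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    r = env_reward(action.item()) + shaping_bonus(action.item())
    loss = -dist.log_prob(action) * r            # REINFORCE: reinforce whatever got rewarded
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))              # the "values" the shaped reward produced
```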
Accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors --> Accumulated learning in the weights is mostly reset when new models are trained since they are randomly initialized; fortunately there is a lot of overlap in the training environment (internet text doesn’t change that much from model to model) and also you can use previous models as RLAIF supervisors… (though isn’t that also analogous to how humans generally have a lot of shared text and culture that spans generations, and also each generation of humans literally supervises and teaches the next?)
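And a rough sketch of the "previous models as RLAIF supervisors" point: the new model's weights are freshly initialized, but the learning signal comes from a frozen earlier model acting as judge. Both "models" here are toy scorers; the setup and names are illustrative assumptions only.

```python
# Sketch: a frozen previous-generation model supervises a freshly initialized one
# by stating preferences between pairs of candidate "outputs" (toy 8-dim vectors).
import torch

torch.manual_seed(0)
old_supervisor = torch.nn.Linear(8, 1)           # frozen previous-generation model (the judge)
for p in old_supervisor.parameters():
    p.requires_grad_(False)

new_model = torch.nn.Linear(8, 1)                # new generation: random init, learning "reset"
opt = torch.optim.Adam(new_model.parameters(), lr=1e-3)

for step in range(1000):
    a, b = torch.randn(8), torch.randn(8)        # two candidate "outputs"
    prefer_a = (old_supervisor(a) > old_supervisor(b)).item()   # judge's verdict is the supervision
    chosen, rejected = (a, b) if prefer_a else (b, a)
    # Bradley-Terry style loss: the new model should score the judged-better output higher
    margin = new_model(chosen) - new_model(rejected)
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```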
Massive optimization power overhang in the inner loop of its optimization process --> isn’t this increasingly true of AI too? Maybe I don’t know what you mean here. Can you elaborate?