I agree that terminology is probably the culprit here. It’s entirely my fault: I was using the word pretraining loosely. What I meant is more that the hyperparameters (number of layers, inputs, outputs, activation fn, loss) are “learned” by evolution, leaving to us poor creatures only the task of pruning neurons and adjusting the synaptic weights.
The reason I was thinking about it this way is that I’ve been reading about NEAT recently, an algorithm that uses a genetic algorithm to learn an architecture as well as to train the selected architecture. A bit like evolution? A toy sketch of the idea is below.
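For what it’s worth, here is a minimal toy sketch of the genetic-algorithm idea behind NEAT, not the real algorithm (real NEAT also evolves the connection topology gene by gene and uses crossover, which this omits). Everything here, including the fake fitness function, is made up for illustration:

```python
# Toy sketch: a genetic algorithm searching over architecture "hyperparameters" only.
# The genome encoding and the fitness function are invented for illustration.
import random

def random_genome():
    # A genome here is just a dict of architecture choices.
    return {
        "n_layers": random.randint(1, 5),
        "hidden_size": random.choice([8, 16, 32, 64]),
        "activation": random.choice(["relu", "tanh"]),
    }

def fitness(genome):
    # Stand-in for "train this architecture and measure how well it does".
    # Here we just pretend wider, moderately deep nets do better.
    return genome["hidden_size"] / 64 - abs(genome["n_layers"] - 3) * 0.1

def mutate(genome):
    # Randomly tweak one architecture choice.
    child = dict(genome)
    key = random.choice(["n_layers", "hidden_size", "activation"])
    child[key] = random_genome()[key]
    return child

population = [random_genome() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]  # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

print(max(population, key=fitness))
```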
To rephrase my initial point: evolution does its part of the heavy lifting by finding the right kind of brain for living on earth. This tremendously shrinks the space of computations a human has to explore in their lifetime to end up with a brain fitted to the environment. This “shrinking of the space” is kinda like a strong bias towards certain computations. And model pretraining means having the weights of the network already initialized at a value that “already works”, which is kinda like a strong bias too. Hence the link in my mind.
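To make the “strong bias” analogy concrete, here is a toy (entirely made-up) illustration: the same few gradient steps get much further when the starting point already encodes most of the solution.

```python
# Toy illustration: gradient descent on the same loss, starting from a random
# init vs. from weights that "already work". All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=10)               # the weights that solve the task

def loss(w):
    return float(np.sum((w - target) ** 2))

def train(w, steps=5, lr=0.1):
    for _ in range(steps):
        w = w - lr * 2 * (w - target)      # gradient of the squared error
    return loss(w)

random_init = rng.normal(size=10)                       # learn everything from scratch
pretrained_init = target + 0.1 * rng.normal(size=10)    # "already works", only needs fine-tuning

print("from random init:    ", train(random_init))
print("from pretrained init:", train(pretrained_init))
```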
But yeah, evolution does not give us synaptic weights that work, so pretraining is not the right word. Unless you are thinking about learned architectures, in which case my point can somewhat work, I think.
edit: rereading your above comments, I see that I should have made clear that I was thinking more about learned architectures. In which case we apparently agree, if I meant what you said in https://www.lesswrong.com/posts/ftEvHLAXia8Cm9W5a/data-and-tokens-a-30-year-old-human-trains-on?commentId=4QtpAo3XXsbeWt4NC
Thank you for taking the time.