Isn’t there an alternative story here where we still care about the sharp left turn, but in a cultural sense, similar to Drexler’s CAIS, with the same kind of experimentation that happened during humanity’s cultural evolution phase?
You’ve convinced me that the sharp left turn will not happen in the classical way people have thought about it, but are you really so certain that there isn’t much free energy available in cultural-style processes? If so, why?
I can imagine an argument that SGD is already pretty algorithmically efficient, but I would say that how much free energy is available from improving optimisation processes is still an open question. If the error bars are wide here, how can we know that the AI won’t spin up something similar internally?
I also want to add something about genetic fitness becoming distorted as a consequence of cultural evolutionary pressure on individuals. Culture itself changed the optimal survival behaviour of humans, which meant that the meta-level optimisation loop changed the underlying optimisation loop. Isn’t culture changing the objective function still a problem we potentially have to contend with, even if it is not as difficult as the classic sharp left turn?
For example, let’s say that we deploy GPT-6 and it figures out that the loosely defined objective we have specified for it using (Constitutional AI)^2 is best pursued by having many different iterations of itself deliberate on it, creating a democratic process of multiple CoT reasoners. This meta-process seems to me like something the cultural evolution hypothesis would predict is more optimal than a single GPT-6, and doesn’t it also seem a lot harder to align than normal?
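To make the kind of meta-process I have in mind concrete, here is a minimal sketch of "many instances deliberate, then a democratic aggregation picks the behaviour". Everything here is hypothetical: `query_model_instance` and `democratic_deliberation` are made-up names standing in for one CoT rollout and an outer voting loop, not anything the post (or any real API) proposes.

```python
from collections import Counter

def query_model_instance(task: str, seed: int) -> str:
    """Placeholder for one model instance reasoning independently about the task.

    In the scenario above this would be a full chain-of-thought rollout; here it
    just returns a canned answer so the aggregation logic is runnable.
    """
    return f"answer-{seed % 2}"

def democratic_deliberation(task: str, n_instances: int = 5) -> str:
    """Spin up several independent reasoners and take a majority vote.

    The worry is that the outer "culture" of instances, not any single forward
    pass, is what ends up determining the selected behaviour, and that this
    outer loop is the harder thing to align.
    """
    votes = [query_model_instance(task, seed=i) for i in range(n_instances)]
    winner, _ = Counter(votes).most_common(1)[0]
    return winner

if __name__ == "__main__":
    print(democratic_deliberation("interpret clause 3 of the constitution"))
```

The point of the sketch is only that the selection pressure now acts on the aggregate of instances rather than on any single model, which is exactly the cultural-evolution-style dynamic I am asking about.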