Can you expand on sexual recombinant hill-climbing search vs. gradient descent relative to a loss function, keeping in mind that I’m very weak on my understanding of these kinds of algorithms and you might have to explain exactly why they’re different in this way?
[Yudkowsky][14:21]
It’s about the size of the information bottleneck. [followed by a 6 paragraph explanation]
It’s sections like this that show me how many levels above me Eliezer is. When I read Scott’s question I thought “I can see that these two algorithms are quite different but I don’t have a good answer for how they’re different”, and then Eliezer not only had an answer, but a fully fleshed out mechanistic model of the crucial differences between the two that he could immediately explain clearly, succinctly, and persuasively, in 6 paragraphs. And he only spent 4 minutes writing it.
I would be more impressed if he had used the information bottleneck as a simple example of a varying training condition, instead of authoritatively declaring it The Difference, accompanied by its own just-so story to explain discrepancies in implementation that haven't even been demonstrated. I'm not even sure the analogy is correct: is the 7.5 MB storing training parameters, or the Python code?
FYI, the timestamp is for the first Discord message. If the log broke out timestamps for every part of the message, it would look like this:
[2:21 PM]
It’s about the size of the information bottleneck. The human genome is 3 billion base pairs drawn from 4 possibilities, so 750 megabytes. Let’s say 90% of that is junk DNA, and 10% of what’s left is neural wiring algorithms. So the code that wires a 100-trillion-synapse human brain is about 7.5 megabytes. Now an adult human contains a lot more information than this. Your spinal cord is about 70 million neurons so probably just your spinal cord has more information than this. That vastly greater amount of runtime info inside the adult organism grows out of the wiring algorithms as your brain learns to move around your muscles, and your eyes open and the retina wires itself and starts directing info on downward to more things that wire themselves, and you learn to read, and so on.
[2:22 PM]
Anything innate that makes reasoning about people out to cheat you, easier than reasoning about isomorphic simpler letters and numbers on cards, has to be packed into the 7.5MB, and gets there via a process where ultimately one random mutation happens at a time, even though lots of mutations are recombining and being selected on at a time.
[2:24 PM]
It’s a very slow learning process. It takes hundreds or thousands of generations even for a pretty good mutation to fix itself in the population and become reliably available as a base for other mutations to build on. The entire organism is built out of copying errors that happened to work better than the things they were copied from. Everything is built out of everything else, the pieces that were already lying around for building other things.
[2:27 PM]
When you’re building an organism that can potentially benefit from coordinating, trading, with other organisms very similar to itself, and accumulating favors and social capital over long time horizons—and your organism is already adapted to predict what other similar organisms will do, by forcing its own brain to operate in a special reflective mode where it pretends to be the other person’s brain—then a very simple way of figuring out what other people will like, by way of figuring out how to do them favors, is to notice what your brain feels when it operates in the special mode of pretending to be the other person’s brain.
[2:27 PM]
And one way you can get people who end up accumulating a bunch of social capital is by having people with at least some tendency in them—subject to various other forces and overrides, of course—to feel what they imagine somebody else feeling. If somebody else drops a rock on their foot, they wince.
[2:28 PM]
This is a way to solve a favor-accumulation problem by laying some extremely simple circuits down on top of a lot of earlier machinery.
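To make the arithmetic in that first message concrete, here is a minimal back-of-the-envelope sketch in Python; the 90% junk-DNA and 10% neural-wiring figures are the rough assumptions stated in the message above, not measured values:

```python
# Rough restatement of the quoted estimate.
base_pairs = 3e9           # ~3 billion base pairs in the human genome
bits_per_base = 2          # 4 possible bases -> log2(4) = 2 bits each

genome_mb = base_pairs * bits_per_base / 8 / 1e6
print(genome_mb)           # ~750 MB of raw genome

non_junk = 0.10            # assumed: ~90% treated as junk DNA
wiring   = 0.10            # assumed: ~10% of the remainder is neural wiring
print(genome_mb * non_junk * wiring)   # ~7.5 MB of brain-wiring "code"
```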
That makes more sense.
Lol, cool. I tried the “4 minute” challenge (without having read EY’s answer, but having read yours).
Hill-climbing search requires selecting on existing genetic variance among alleles already in the gene pool. If there isn’t a local mutation which changes the eventual fitness of the properties that the genotype unfolds into, then you won’t get selection pressure in that direction. Gradient descent, on the other hand, updates live on a bunch of data, in fast iterations that modify the parameters themselves. It’s like being able to change a blueprint for a house, versus being at the house during the day and directing the repair-people.
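As a toy sketch of the first half of that contrast (made-up fitness function and mutation rate; not a model of real biology): selection can only act on variants already present in the pool, produced by recombination and undirected copying errors, and it only ever sees one end-to-end fitness number per genome.

```python
import random

def fitness(genome):
    # Selection only sees this one end-to-end number per organism;
    # it never gets a per-parameter signal about what to change.
    return -sum((g - 0.7) ** 2 for g in genome)

def recombine(a, b):
    # Sexual recombination: each locus comes from one parent or the other.
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(genome, rate=0.05):
    # Undirected copying errors, one locus at a time.
    return [g + random.gauss(0, 0.05) if random.random() < rate else g
            for g in genome]

pool = [[random.random() for _ in range(8)] for _ in range(50)]
for generation in range(300):
    pool.sort(key=fitness, reverse=True)
    parents = pool[:10]                       # selection on existing variance
    pool = [mutate(recombine(random.choice(parents), random.choice(parents)))
            for _ in range(50)]               # offspring = copies + errors
```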
The changes happen online, relative to the actual within-cognition goings-on of the agent (e.g. you see some cheese, go to the cheese, get a policy gradient and become more likely to do it again). Compare that to having to try out a bunch of existing tweaks to a cheese-bumping-into agent (e.g. make it learn faster early in life but then get sick and die later), where you can’t get detailed control over its responses to specific situations (you can just tweak the initial setup).
Gradient descent is just a fundamentally different operation. You aren’t selecting over learning processes which unfold into minds, trying out a finite but large gene pool of variants, and then choosing the most self-replicating; you are instead doing local parametric search over whatever changes the outputs on the training data. But RL isn’t even differentiable; you aren’t running gradients through it directly. And there isn’t even an analogue of “training data” in the evolutionary regime.
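And a comparably minimal sketch of the other side, plain gradient descent on a toy squared-error loss (made-up data and learning rate; it deliberately ignores the RL complications just mentioned): the update reads off a per-parameter slope and nudges every parameter after each example.

```python
def grad_step(params, x, y_target, lr=0.02):
    # Prediction is a simple dot product; loss is squared error.
    y_pred = sum(p * xi for p, xi in zip(params, x))
    error = y_pred - y_target
    # dLoss/dp_i = 2 * error * x_i: a per-parameter signal,
    # available immediately, for every training example.
    return [p - lr * 2 * error * xi for p, xi in zip(params, x)]

params = [0.0, 0.0, 0.0]
data = [([1.0, 2.0, 3.0], 14.0),
        ([0.0, 1.0, 1.0], 5.0),
        ([2.0, 0.0, 1.0], 5.0)]

for _ in range(200):
    for x, y in data:             # online: update after each example seen
        params = grad_step(params, x, y)

print(params)                     # converges toward [1.0, 2.0, 3.0]
```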
I think I ended up optimizing for “actually get model onto the page in 4 minutes” and not for “explain in a way Scott would have understood.”
FWIW this was basically cached for me, and if I were better at writing and had explained this ~10 times before like I expect Eliezer has, I’d be able to do about as well. So would Nate Soares or Buck or Quintin Pope (just to pick people in 3 different areas of alignment), and Quintin would also have substantive disagreements.
Fair enough. Nonetheless, I have had this experience many times with Eliezer, including when dialoguing with people with much more domain experience than Scott.
Could you reconstruct the argument now, having seen it, and without having it in front of you?
No.
Are you quite sure you understand it, then…?
I’m not certain, but I’m fairly confident I follow the structure of the argument and how it fits into the conversation.
I don’t mean to imply I achieved mastery myself from reading the passage, I’m saying that the writer seems to me (from this and other instances) to have a powerful understanding of the domain.
Yes, I understood what you meant. What I’m suggesting is that “seems” is precisely the operative word here.
Now, I obviously don’t know what “other instances” you have in mind, so I can’t comment on the validity of your overall impression. But judging just on the basis of this particular explanation, it seems to me that the degree to which it appears to convey a powerful understanding of the domain rather exceeds the degree to which it actually conveys a powerful understanding of the domain. (Note the word “conveys” there—I am not making a claim about the degree to which Eliezer actually understands the domain in question!)
In other words, if the argument that Eliezer gives was bad and wrong, how sure are you that you’d have noticed?
EDIT: For instance, how easily could you answer the question I asked in my top-level comment? Whatever answer you may give—is it the answer Eliezer would give, as well? If you think it is—how sure are you? Trying to answer these questions is one way to check whether you’ve really absorbed a coherent understanding of the matter, I think.