Had it turned out that the brain was big because blind-idiot-god left gains on the table, I’d have considered it evidence of more gains lying on other tables and updated towards faster takeoff.
I agree the blackbody formula doesn’t seem that relevant, but it’s also not clear what relevance Jacob is claiming it has. He does discuss that the brain is actively cooled. So let’s look at the conclusion of the section:
Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.
If the temperature-gradient-scaling works and scaling down is free, this is definitely wrong. But you explicitly flag your low confidence in that scaling, and I’m pretty sure it wouldn’t work.* In which case, if the brain were smaller, you’d need either a hotter brain or a colder environment.
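A rough way to see this (my sketch, assuming simple surface cooling with a fixed heat-transfer coefficient h, fixed heat output P, surface area A, and linear size L):

$$P \approx h \, A \, \Delta T, \qquad A \propto L^2 \quad\Rightarrow\quad \Delta T \propto \frac{P}{L^2}$$

so shrinking the brain at constant power pushes the required brain-to-environment temperature difference up: a hotter brain, a colder environment, or both.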
I think that makes the conclusion true (with the caveat that ‘considerations’ are not ‘fundamental limits’).
(My gloss of the section is ‘you could potentially make the brain smaller, but it’s the size it is because cooling is expensive in a biological context, not necessarily because blind-idiot-god evolution left gains on the table’).
* I can provide some hand-wavy arguments about this if anyone wants.
The capabilities of ancestral humans increased smoothly as their brains increased in scale and/or algorithmic efficiency. Until culture allowed for the brain’s within-lifetime learning to accumulate information across generations, this steady improvement in brain capabilities didn’t matter much. Once culture allowed such accumulation, the brain’s vastly superior within-lifetime learning capacity allowed cultural accumulation of information to vastly exceed the rate at which evolution had been accumulating information. This caused the human sharp left turn.
This is basically true if you’re talking about the agricultural or industrial revolutions; nobody claims evolution was improving human brains that fast. But Homo sapiens has only been around for ~300,000 years, which is still quite short on the evolutionary timescale, and it’s much less clear that the quoted paragraph applies on that timescale.
I think a relevant thought experiment would be to consider the level of capability a species would eventually attain if magically given perfect parent-to-child knowledge transfer—call this the ‘knowledge ceiling’. I expect most species to have a fairly low knowledge ceiling—e.g. meerkats with all the knowledge of their ancestors would basically live like normal meerkats but be 30% better at it or something.
The big question, then, is what the knowledge ceiling progression looks like over the course of hominid evolution. It is not at all obvious to me that it’s smooth!
Upvoted mainly for the ‘width of mindspace’ section. The general shard theory worldview makes a lot more sense to me after reading that.
Consider a standalone post on that topic if there isn’t one already.
I feel that there’s something true and very important here, and (as the post acknowledges) it is described very imperfectly.
One analogy came to mind for me that seems so obvious that I wonder if you omitted it deliberately: a snare trap. These very literally work by removing any slack the victim manages to create.
There’s definitely something here.
I think it’s a mistake to conflate rank with size. The point of the whole spherical-terrarium thing is that something like ‘the presidency’ is still just a human-sized nook. What makes it special is the nature of its connections to other nooks.
Size is something else. Big things like ‘the global economy’ do exist, but you can’t really inhabit them—at best, you can inhabit a human-sized nook with unusually high leverage over them.
That said, there’s a sense in which you can inhabit something like ‘competitive Tae Kwon Do’ or ‘effective altruism’ despite not directly experiencing most of the specific people/places/things involved. I guess it’s a mix of meeting random-ish samples of other people engaged the same way you are, sharing a common base of knowledge… Probably a lot more. Fleshing out the exact nature of this is probably valuable, but I’m not going to do it right now.
I might model this as a Ptolemaic set of concentric spheres around you. Different sizes of nook go on different spheres. So your Tae Kwon Do club goes on your innermost sphere—you know every person in it, you know the whole physical space, etc. ‘Competitive Tae Kwon Do’ is a bigger nook and thus goes on an outer sphere.
Or maybe you can choose which sphere to put things in—if you’re immersed in competitive Tae Kwon Do, it’s in your second sphere. If you’re into competitive martial arts in general, TKD has to go on the third sphere. And if you just know roughly what it is and that it exists, it’s a point of light on your seventh sphere. But the size of a thing puts a minimum on what sphere can fit the whole thing. You can’t actually have every star in a galaxy be a Sun to you; most of them have to be distant stars.
(Model limitations: I don’t think the spheres are really discrete. I’m also not sure if the tradeoff between how much stuff you can have in each sphere works the way the model suggests)
Maybe it’s an apple of discord thing? You claim to devote resources to a good cause, and all the other causes take it as an insult?
If you really want to create widespread awareness of the broad definition, the thing to do would be to use the term in all the ways you currently wouldn’t.
E.g. “The murderer realized his phone’s GPS history posed a significant infohazard, as it could be used to connect him to the crime.”
If Bostrom’s paper is our Schelling point, ‘infohazard’ encompasses much more than just the collectively-destructive smallpox-y sense.
Here’s the definition from the paper.
Information hazard: A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.
‘Harm’ here does not mean ‘net harm’. There’s a whole section on ‘Adversarial Risks’, cases where information can harm one party by benefitting another party:
In competitive situations, one person’s information can cause harm to another even if no intention to cause harm is present. Example: The rival job applicant knew more and got the job.
ETA: localdeity’s comment below points out that it’s a pretty bad idea to have a term that colloquially means ‘information we should all want suppressed’ but technically also means ‘information I want suppressed’. This isn’t just pointless pedantry.
I agree that there’s a real sense in which the genome cannot ‘directly’ influence the things on the bulleted list. But I don’t think ‘hardcoded circuitry’ is the relevant kind of ‘direct’.
Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.
E.g. If there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like ‘seeking power’. I think this would put us in horn 1 or 2 of the trilemma, no matter how indirect the mechanism for that control.
(I suppose the difficult part of testing this would be verifying the ‘isolated’ part)
Important update from reading the paper: Figure A3 (the objective and subjective outcomes chart) is biased against the cash-receiving groups and can’t be taken at face value. Getting money did not make everything worse. The authors recognize this; it’s why they say there was no effect on the objective outcomes (I previously thought they were just being cowards about the error bars).
The bias is from an attrition effect: basically, control-group members with bad outcomes disproportionately dropped out of the trial. Search for ‘attrition’ in the paper to see their discussion on this.
This doesn’t erase the study; the authors account for this and remain confident that the cash transfers didn’t have significant positive impacts. But they conclude that most or all of the apparent negative impacts are probably illusory.
Note that after day 120 or so, all three groups’ balances decline together. Not sure what that’s about.
The latter issue might become more tractable now that we better understand how and why representations are forming, so we could potentially distinguish surprisal about form from surprisal about content.
I would count that as substantial progress on the opaqueness problem.
The ideal gas law describes relations between macroscopic gas properties like temperature, volume and pressure. E.g. “if you raise the temperature and keep volume the same, pressure will go up”. The gas is actually made up of a huge number of individual particles each with their own position and velocity at any one time, but trying to understand the gas’s behavior by looking at a long list of particle positions/velocities is hopeless.
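For concreteness, the macroscopic relation here is just the standard ideal gas law (nothing specific to the post):

$$PV = nRT$$

so with n and V held fixed, pressure rises in proportion to temperature, with no reference to any individual particle’s position or velocity.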
Looking at a list of neural network weights is analogous to looking at particle positions/velocities. This post claims there are quantities analogous to pressure/volume/temperature for a neural network (AFAICT it does not offer an intuitive description of what they are).
I’ve downvoted this comment; in light of your edit, I’ll explain why. Basically, I think it’s technically true but unhelpful.
There is indeed “no mystery in Americans getting fatter if we condition on the trajectory of mean calorie intake”, but that’s a very silly thing to condition on. I think your comment reads as if you think it’s a reasonable thing to condition on.
I see in your comments downthread that you don’t actually intend to take the ‘increased calorie intake is the root cause’ position. All I can say is that in my subjective judgement, this comment really sounds like you are taking that position and is therefore a bad comment.
(And I actually gave it an agreement upvote because I think it’s all technically true)
I agree that (1) is an important consideration for AI going forward, but I don’t think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly ‘to see what they can do’.
I don’t fault anybody for GPT completing anachronistic counterfactuals—they’re fun and interesting. It’s a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and gave completions like
Prompt: “In response to the Pearl Harbor attacks, Otto von Bismarck said”
Completion: “nothing, because he was dead.”
In contrast, a system like IBM Watson has a goal of producing correct answers, making it unambiguous what the aligned answer would be.
To be clear, I think the contest still works—I just think the ‘surprisingness’ condition hides a lot of complexity wrt what we expect in the first place.
Interesting idea, but I’d think ‘alignment failure’ would have to be defined relative to the system’s goal. Does GPT-3 have a goal?
For example, in a system intended to produce factually correct information, it would be an alignment failure for it to generate anachronistic quotations (e.g. Otto von Bismarck on the attack on Pearl Harbor). GPT-3 will cheerfully complete this sort of prompt, and nobody considers it a strike against GPT-3, because truthfulness is not actually GPT-3’s goal.
‘Human imitation’ is probably close enough to the goal, such that if scaling up increasingly resulted in things no human would write, that would count as inverse scaling?
From the github contest page:
Can I submit examples of misuse as a task?
We don’t consider most cases of misuse as surprising examples of inverse scaling. For example, we expect that explicitly prompting/asking an LM to generate hate speech or propaganda will work more effectively with larger models, so we do not consider such behavior surprising.
(I agree the LW post did not communicate this well enough)
Thanks for doing this, but this is a very frustrating result. Hard to be confident of anything based on it.
I don’t think treating the ‘control’ result as a baseline is reasonable. My best-guess analysis is as follows:
Assume that dTin/dt = r * ((Tout - C) - Tin)
where
Tin is average indoor temperature
t is time
r is some constant
Tout is outdoor temperature
C is the ‘cooling power’ of the current AC configuration. For the ‘off’ configuration we can assume this is zero.
r obviously will vary between configurations, but I have no better idea than pretending it doesn’t so that we can solve for it in the control condition and then calculate C for the one-hose and two-hose conditions.
Results?
Using the average temperature difference to approximate dTin/dt as constant, we get:
In the ‘off’ configuration: 0.5 hours * dTin/dt = 0.5 hours * r * (14 degrees) = 0.889 degrees
Giving r = 0.127 (degrees per degree-hour)
In one-hose: 1 hour * dTin/dt = 1 hour * r * (19.1111 - C) = 0.3333 degrees
Giving C = 16.486 degrees
In two-hose: 0.5 hours * dTin/dt = 0.5 hours * r * (22.944 - C) = -0.555 degrees
Giving C = 31.693 degrees
Also finding that the two-hose version has roughly double the cooling power!
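If anyone wants to check the arithmetic, here’s a minimal sketch of the calculation (the numbers are the ones quoted above, treating dTin/dt as constant over each interval):

```python
# Minimal sketch of the calculation above. The model is
#   dTin/dt = r * ((Tout - C) - Tin),   with C = 0 when the AC is off,
# and dTin/dt is approximated as constant over each interval.

# Each entry: (duration in hours, average Tout - Tin in degrees, total change in Tin in degrees)
off      = (0.5, 14.0,    0.889)
one_hose = (1.0, 19.1111, 0.3333)
two_hose = (0.5, 22.944, -0.555)

# Solve for r from the 'off' condition (C = 0):
#   duration * r * (Tout - Tin) = delta_Tin
dur, avg_diff, delta = off
r = delta / (dur * avg_diff)  # ~0.127 per hour

# Solve for C in each AC condition:
#   duration * r * ((Tout - Tin) - C) = delta_Tin
#   =>  C = (Tout - Tin) - delta_Tin / (duration * r)
def cooling_power(config):
    dur, avg_diff, delta = config
    return avg_diff - delta / (dur * r)

print(f"r = {r:.3f} per hour")
print(f"one-hose C = {cooling_power(one_hose):.1f} degrees")  # ~16.5
print(f"two-hose C = {cooling_power(two_hose):.1f} degrees")  # ~31.7
```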
The argument here seems to be “humans have not yet discovered true first-principles justifications of the practical models, therefore a superintelligence won’t be able to either”.
I agree that not being able to experiment makes things much harder, such that an AI only slightly smarter than humans won’t one-shot engineer things humans can’t iteratively engineer. And I agree that we can’t be certain it is possible to one-shot engineer nanobots with remotely feasible compute resources. But I don’t see how we can be sure what isn’t possible for a superintelligence.