Von Neumann existed,
Yes. I expect extreme cases of human intelligence to come from a combination of fairly good genes and a lot of environmental and developmental luck. I.e. if you took 1000 clones of Von Neumann, you still probably wouldn’t get that lucky again. (Although it depends on the level of education too.)
Some ideas about what the tradeoffs might be.
Emotional/social getting-on-with-people skills vs. logic-puzzle-solving IQ.
Engineer parents are apparently more likely to have autistic children. This looks like a tradeoff to me. Too many “high IQ” genes and you risk autism.
How many angels can dance on the head of a pin? In the modern world, we have complicated, elaborate theoretical structures that are actually correct and useful. In the pre-modern world, the sort of mind that now obsesses about quantum mechanics would be obsessing about angels dancing on pinheads, or other equally useless stuff.
That is good evidence that we aren’t in a mutation-selection balance.
There are also game-theoretic balances.
Here is a hypothesis that fits my limited knowledge of genetics, is consistent with the data as I understand it, and implies no huge designer-baby gains. It’s a bit of a worst-plausible-case hypothesis.
But suppose we were in a mutation-selection balance, and then there was an environmental distribution shift.
The surrounding nutrition and information environment has changed significantly between the environment of evolutionary adaptedness and today.
A large fraction of what was important in the ancestral world was probably quite emotion based. E.g. calming down other tribe members. Winning friends and influencing people.
In the modern world, abstract logic and maths are somewhat more important than they were, although the emotional stuff still matters too.
IQ tests mostly test the more abstract, logical stuff.
Now suppose that the optimum genome isn’t that far from the current population, compared to the ambient genetic variation. Say 3 standard deviations.
Metacompilation
I’m not quite convinced by the big chicken argument. A much more convincing argument would be genetically selecting giraffes to be taller or cheetahs to be faster.
That is, it’s plausible evolution has already taken all the easy wins with human intelligence, in a way it hasn’t with chicken size.
Fixed
Yes. In my model that is something that can happen. But it does need from-the-outside access to do this.
Set the LLM up in a sealed box, and the mask can’t do this. Set it up so the LLM can run arbitrary terminal commands and write code that modifies its own weights, and this can happen.
Hopeful hypothesis: the Persona Jukebox.
I wasn’t really thinking about a specific algorithm. Well, I was kind of thinking about LLMs and the alien shoggoth meme.
But yes. I know this would be helpful.
But I’m more thinking about what work remains. Like, is it an idiot-proof 5-minute change? Or does it still take MIRI 10 years to adapt the alien code?
Also.
Domain-limited optimization is a natural thing. The prototypical example is Deep Blue or similar: lots of optimization power over a very limited domain. But any teacher who optimizes the class schedule without thinking about putting nanobots in the students’ brains is doing something similar.
I am guessing and hoping that the masks in an LLM are at least as domain-limited in their optimization as humans are, often more so, due to their tendency to learn the most usefully predictive patterns first. Hidden, long-term, sneaky plans will only very rarely influence the text (because the plans are hidden).
And, I hope, the shoggoth isn’t itself particularly interested in optimizing the real world. The shoggoth just chooses which mask to wear.
So.
Can we duct-tape a mask of “alignment researcher” onto a shoggoth, and keep the mask in place long enough to get some useful alignment research done?
The more there is a single, “know it when you see it”, simple alignment solution, the more likely this is to work.
[Question] How useful would alien alignment research be?
“Go read the sequences” isn’t that helpful. But I find myself linking to the particular post in the sequences that I think is relevant.
Imagine a medical system that categorizes diseases as hot/cold/wet/dry.
This doesn’t deeply describe the structure of a disease. But if a patient is described as “wet”, then it’s likely some orifice is producing lots of fluid, and a box of tissues might be handy. If a patient is described as “hot”, then maybe they have some sort of rash or inflammation that would make a cold pack useful.
It is, at best, a very lossy compression of the superficial symptoms. But it still carries non-zero information. There are some medications that a modern doctor might commonly use on “wet” patients, but only rarely use on “dry” patients, or vice versa.
It is at least more useful information than someone’s star sign, in a medical context.
Old alchemical air/water/fire/earth systems are also like this. “Air-ish” substances tend to have a lower density.
These sorts of systems are a rough attempt at a principal component analysis of the superficial characteristics.
And the Five Factor model of personality is another example of such a system.
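For concreteness, here is a minimal numpy sketch of that “rough principal component analysis on superficial characteristics” idea. The symptom matrix is made up purely for illustration; the point is just the mechanics of compressing many superficial measurements down to a couple of coarse axes.

```python
import numpy as np

# Made-up data, purely for illustration: rows = patients,
# columns = superficial symptoms (fever, rash, runny nose, cough, ...).
rng = np.random.default_rng(0)
symptoms = rng.normal(size=(200, 8))

# Centre the data and take the top two principal components via SVD.
centred = symptoms - symptoms.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # each patient reduced to two coarse axes,
                             # analogous to "hot/cold" and "wet/dry"
print(scores.shape)  # (200, 2)
```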
We really fully believe that we will build AGI by 2027, and we will enact your plan, but we aren’t willing to take more than a 3-month delay
Well, I ask what they are doing to make AGI.
Maybe I look at their AI plan and go “eureka”.
But if not:
Negative reinforcement by giving the AI large electric shocks when it gives a wrong answer. Hopefully big enough shocks to set the whole data center on fire. Implement a free bar for all their programmers, and encourage them to code while drunk. Add as many inscrutable bugs to the codebase as possible.
But, taking the question in the spirit it’s meant in.
The halting problem is a worst-case result. Most agents aren’t maximally ambiguous about whether or not they halt. And for those that are, well, then it depends on what the rules are for agents that don’t halt.
There are setups where each agent uses an unphysically large but finite amount of compute. There was a paper I saw somewhere a while ago where both agents do a brute-force proof search for the statement “if I cooperate, then they cooperate”, and cooperate if they find a proof.
(I.e. searching all proofs containing <10^100 symbols.)
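A rough toy sketch of the flavour of that setup (not the actual proof-search construction from the paper): each agent spends a finite budget checking whether the opponent cooperates with it, and cooperates only if that check succeeds. The optimistic base case here stands in for the Löbian shortcut that makes the real proof search work.

```python
def fair_bot(opponent, budget=100):
    """Cooperate iff the opponent (simulated with a smaller budget) cooperates with fair_bot."""
    if budget == 0:
        return "C"  # optimistic base case: out of budget, assume cooperation
    return "C" if opponent(fair_bot, budget - 1) == "C" else "D"

def defect_bot(opponent, budget=100):
    return "D"

print(fair_bot(fair_bot))    # C: mutual verification succeeds
print(fair_bot(defect_bot))  # D: unconditional defectors get defected against
```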
There is a model of bounded rationality: logical induction.
Can that be used to handle logical counterfactuals?
I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q;
And here the main difficulty pops up again. There is no causal connection between your choice and their choice. Any correlation is a logical one. So imagine I make a copy of you. But the copying machine isn’t perfect. A random 0.001% of neurons are deleted. Also, you know you aren’t a copy. How would you calculate those probabilities p and q? Even in principle.
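For concreteness, this is the expected-value calculation that quoted rule implies, with an assumed standard prisoner’s-dilemma payoff table (the numbers are illustrative, not from the original discussion). The whole difficulty is in where p and q come from in the first place.

```python
# Assumed payoff table for me: (my move, twin's move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_payoffs(p, q):
    """p = P(twin cooperates | I cooperate), q = P(twin defects | I defect)."""
    eu_cooperate = p * PAYOFF[("C", "C")] + (1 - p) * PAYOFF[("C", "D")]
    eu_defect = q * PAYOFF[("D", "D")] + (1 - q) * PAYOFF[("D", "C")]
    return eu_cooperate, eu_defect

# With a nearly perfect copy (p and q close to 1), cooperating wins:
print(expected_payoffs(0.99, 0.99))  # roughly (2.97, 1.04)
```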
If two Logical Decision Theory agents with perfect knowledge of each other’s source code play the prisoner’s dilemma, theoretically they should cooperate.
LDT uses logical counterfactuals in its decision making.
If the agents are CDT, then logical counterfactuals are not involved.
[Question] How counterfactual are logical counterfactuals?
The research on humans in zero g is only relevant if you want to send humans to Mars. And such a mission is likely to end up being an ISS on Mars, or a Moon-landings reboot. A lot of newsprint and bandwidth expended talking about it. A small amount of science that could have been done more cheaply with a robot. And then everyone gets bored, they play golf on Mars, and people look at the bill and go “was that really worth it?”
Oh, and you would contaminate Mars with Earth bacteria.
A substantially bigger, redesigned space station is fairly likely to be somewhat more expensive. And the point of all this is still not clear.
Current-day NASA also happens to be in a failure mode where everything is 10 to 100 times more expensive than it needs to be, projects live or die based on politics, not technical viability, and repeating the successes of the past seems unattainable. They aren’t good at innovating, especially not quickly and cheaply.
Here is a more intuitive version of the same paradox.
Again, conditional on all dice rolls being even. But this time it’s either
A) 1,000,000 consecutive 6′s.
B) 999,999 consecutive 6′s, followed by a (possibly non-consecutive) 6.
Suppose you roll a few even numbers, followed by an extremely lucky sequence of 999,999 6′s.
From the point of view of version A, the only way to complete the sequence is a single extra 6. If you roll a 4, you would need to roll a second run of a million 6′s. You are very unlikely to do that in the next 10 million steps, and very unlikely to go for 10 million steps without rolling an odd number.
Yes, if this happened, it would add at least a million extra rolls. But the chance of that is exponentially tiny.
Whereas for B, it’s quite plausible to roll 26 or 46 or 2426 instead of just 6.
Another way to think about this problem is with regular expressions. Let e = any even number, and * = zero or more repetitions.
The pattern “e*6e*6” matches any sequence with at least two 6′s and no odd numbers.
The pattern “e*66” requires those two 6′s to be consecutive. And the pattern “66” matches two consecutive 6′s with no room for extra even numbers before the first 6. This is the shortest.
Phrased this way it looks obvious. Every time you allow a gap for even numbers to hide in, an even number might be hiding in the gap, and that makes the sequence longer.
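As a sanity check on that ordering, here is a quick Monte Carlo sketch: translate the patterns into real regexes over strings of rolls, throw away any run that hits an odd number before the pattern completes, and average the lengths of the accepted runs. The consecutive version should come out shorter (roughly 2.7 rolls vs 3.0).

```python
import random
import re

# e = any even digit, so the informal patterns become regexes over roll strings.
CONSECUTIVE = re.compile(r"[246]*66")    # "e*66": ends with two 6s in a row
ANY_GAP = re.compile(r"[246]*6[246]*6")  # "e*6e*6": ends at the second 6

def conditional_mean_length(pattern, n_samples=50_000):
    """Estimate E[sequence length | no odd number appears before the pattern completes]."""
    total = accepted = 0
    while accepted < n_samples:
        s = ""
        while True:
            roll = random.randint(1, 6)
            if roll % 2 == 1:         # odd roll: condition violated, reject this run
                break
            s += str(roll)
            if pattern.fullmatch(s):  # pattern complete: accept this sequence
                total += len(s)
                accepted += 1
                break
    return total / n_samples

print("consecutive, 'e*66'  :", conditional_mean_length(CONSECUTIVE))  # ~2.7
print("any gap,     'e*6e*6':", conditional_mean_length(ANY_GAP))      # ~3.0
```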
When you remove the conditional on the other numbers being even, the “first” becomes important for making the expectation converge at all.
Ok. I’m imagining an AI that has at least my level of AI alignment research ability, maybe a bit more.
If that AI produces slop, it should be pretty explicitly aware that it’s producing slop. I mean, I might write slop if someone was paying per word and then shredding my work without reading it. But I would know it was slop.
Regardless of which is easier, if the AI is doing this, it has to be thinking about the researcher’s psychology, not just about alignment.
How many of these failure modes still happen when there is an AI at least as smart as you, that is aware of these failure modes and actively trying to prevent them?