Von Neumann existed,
Yes. I expect extreme cases of human intelligence to come from a combination of fairly good genes and a lot of environmental and developmental luck. I.e. if you took 1000 clones of Von Neumann, you still probably wouldn’t get that lucky again. (Although it depends on the level of education too.)
Some ideas about what the tradeoffs might be.
Emotional/social getting-on-with-people skills vs. logic-puzzle-solving IQ.
Engineer parents are apparently more likely to have autistic children. This looks like a tradeoff to me. Too many “high IQ” genes and you risk autism.
How many angels can dance on the head of a pin? In the modern world, we have complicated, elaborate theoretical structures that are actually correct and useful. In the pre-modern world, the sort of mind that now obsesses about quantum mechanics would be obsessing about angels dancing on pinheads, or other equally useless stuff.
That is good evidence that we aren’t in a mutation-selection balance.
There are also game-theoretic balances.
Here is a hypothesis that fits my limited knowledge of genetics, is consistent with the data as I understand it, and implies no huge designer-baby gains. It’s a bit of a worst-plausible-case hypothesis.
But suppose we were in a mutation-selection balance, and then there was an environmental distribution shift.
The surrounding nutrition and information environment has changed significantly between the environment of evolutionary adaptedness and today.
A large fraction of what was important in the ancestral world was probably quite emotion based. E.g. calming down other tribe members. Winning friends and influencing people.
In the modern world, abstract logic and maths are somewhat more important than they were, although the emotional stuff still matters too.
IQ tests mostly test the more abstract, logical stuff.
Now suppose that the optimum genome isn’t that far from the current population, compared to the ambient genetic variation. Say 3 standard deviations.
Metacompilation
I’m not quite convinced by the big chicken argument. A much more convincing argument would be genetically selecting giraffes to be taller or cheetahs to be faster.
That is, it’s plausible evolution has already taken all the easy wins with human intelligence, in a way it hasn’t with chicken size.
Fixed
Yes. In my model that is something that can happen. But it does need from-the-outside access to do this.
Set the LLM up in a sealed box, and the mask can’t do this. Set it up so the LLM can run arbitrary terminal commands and write code that modifies its own weights, and this can happen.
Hopeful hypothesis: the Persona Jukebox.
I wasn’t really thinking about a specific algorithm. Well, I was kind of thinking about LLMs and the alien shoggoth meme.
But yes. I know this would be helpful.
But I’m more thinking about what work remains. Like, is it an idiot-proof 5-minute change? Or does it still take MIRI 10 years to adapt the alien code?
Also.
Domain-limited optimization is a natural thing. The prototypical example is Deep Blue or similar: lots of optimization power over a very limited domain. But any teacher who optimizes the class schedule without thinking about putting nanobots in the students’ brains is doing something similar.
I am guessing and hoping that the masks in an LLM are at least as domain-limited in their optimization as humans are, often more so, due to their tendency to learn the most usefully predictive patterns first. Hidden, long-term, sneaky plans will only very rarely influence the text (because the plans are hidden).
And, I hope, the shoggoth isn’t itself particularly interested in optimizing the real world. The shoggoth just chooses which mask to wear.
So.
Can we duct-tape a mask of “alignment researcher” onto a shoggoth, and keep the mask in place long enough to get some useful alignment research done?
The more there is a single, “know it when you see it”, simple alignment solution, the more likely this is to work.
[Question] How useful would alien alignment research be?
“Go read the sequences” isn’t that helpful. But I find myself linking to the particular post in the sequences that I think is relevant.
Imagine a medical system that categorizes diseases as hot/cold/wet/dry.
This doesn’t deeply describe the structure of a disease. But if a patient is described as “wet”, then it’s likely some orifice is producing lots of fluid, and a box of tissues might be handy. If a patient is described as “hot”, then maybe they have some sort of rash or inflammation that would make a cold pack useful.
It is, at best, a very lossy compression of the superficial symptoms. But it still carries non-zero information. There are some medications that a modern doctor might commonly use on “wet” patients, but only rarely use on “dry” patients, or vice versa.
It is at least more useful information than someone’s star sign, in a medical context.
Old alchemical air/water/fire/earth systems are also like this. “Air-ish” substances tend to have a lower density.
These sorts of systems are a rough attempt at a principal component analysis of the superficial characteristics.
And the Five Factor model of personality is another example of such a system.
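For concreteness, here is a minimal numpy sketch of that “rough principal component analysis on superficial characteristics” idea. The symptom matrix is made up purely for illustration; the point is just the mechanics of compressing many superficial measurements down to a couple of coarse axes.

```python
import numpy as np

# Made-up data, purely for illustration: rows = patients,
# columns = superficial symptoms (fever, rash, runny nose, cough, ...).
rng = np.random.default_rng(0)
symptoms = rng.normal(size=(200, 8))

# Centre the data and take the top two principal components via SVD.
centred = symptoms - symptoms.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # each patient reduced to two coarse axes,
                             # analogous to "hot/cold" and "wet/dry"
print(scores.shape)  # (200, 2)
```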
We really fully believe that we will build AGI by 2027, and we will enact your plan, but we aren’t willing to take more than a 3-month delay
Well, I ask what they are doing to make AGI.
Maybe I look at their AI plan and go “eureka”.
But if not:
Negative reinforcement by giving the AI large electric shocks when it gives a wrong answer. Hopefully big enough shocks to set the whole data center on fire. Implement a free bar for all their programmers, and encourage them to code while drunk. Add as many inscrutable bugs to the codebase as possible.
But, taking the question in the spirit it’s meant in.
The halting problem is a worst-case result. Most agents aren’t maximally ambiguous about whether or not they halt. And for those that are, well, then it depends on what the rules are for agents that don’t halt.
There are setups where each agent uses an unphysically large but finite amount of compute. There was a paper I saw somewhere a while ago where both agents do a brute-force proof search for the statement “if I cooperate, then they cooperate”, and cooperate if they find a proof.
(I.e. searching all proofs containing <10^100 symbols.)
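A rough toy sketch of the flavour of that setup (not the actual proof-search construction from the paper): each agent spends a finite budget checking whether the opponent cooperates with it, and cooperates only if that check succeeds. The optimistic base case here stands in for the Löbian shortcut that makes the real proof search work.

```python
def fair_bot(opponent, budget=100):
    """Cooperate iff the opponent (simulated with a smaller budget) cooperates with fair_bot."""
    if budget == 0:
        return "C"  # optimistic base case: out of budget, assume cooperation
    return "C" if opponent(fair_bot, budget - 1) == "C" else "D"

def defect_bot(opponent, budget=100):
    return "D"

print(fair_bot(fair_bot))    # C: mutual verification succeeds
print(fair_bot(defect_bot))  # D: unconditional defectors get defected against
```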
There is a model of bounded rationality: logical induction.
Can that be used to handle logical counterfactuals?
I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q;
And here the main difficulty pops up again. There is no causal connection between your choice and their choice. Any correlation is a logical one. So imagine I make a copy of you. But the copying machine isn’t perfect. A random 0.001% of neurons are deleted. Also, you know you aren’t a copy. How would you calculate those probabilities p and q? Even in principle.
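For concreteness, this is the expected-value calculation that quoted rule implies, with an assumed standard prisoner’s-dilemma payoff table (the numbers are illustrative, not from the original discussion). The whole difficulty is in where p and q come from in the first place.

```python
# Assumed payoff table for me: (my move, twin's move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_payoffs(p, q):
    """p = P(twin cooperates | I cooperate), q = P(twin defects | I defect)."""
    eu_cooperate = p * PAYOFF[("C", "C")] + (1 - p) * PAYOFF[("C", "D")]
    eu_defect = q * PAYOFF[("D", "D")] + (1 - q) * PAYOFF[("D", "C")]
    return eu_cooperate, eu_defect

# With a nearly perfect copy (p and q close to 1), cooperating wins:
print(expected_payoffs(0.99, 0.99))  # roughly (2.97, 1.04)
```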
If two Logical Decision Theory agents with perfect knowledge of each other’s source code play the prisoner’s dilemma, theoretically they should cooperate.
LDT uses logical counterfactuals in its decision making.
If the agents are CDT, then logical counterfactuals are not involved.
[Question] How counterfactual are logical counterfactuals?
The research on humans in zero g is only relevant if you want to send humans to Mars. And such a mission is likely to end up being an ISS on Mars, or a Moon-landings reboot. A lot of newsprint and bandwidth expended talking about it. A small amount of science that could have been done more cheaply with a robot. And then everyone gets bored, they play golf on Mars, and people look at the bill and go “was that really worth it?”
Oh, and you would contaminate Mars with Earth bacteria.
A substantially bigger, redesigned space station is fairly likely to be somewhat more expensive. And the point of all this is still not clear.
Current-day NASA also happens to be in a failure mode where everything is 10 to 100 times more expensive than it needs to be, projects live or die based on politics, not technical viability, and repeating the successes of the past seems unattainable. They aren’t good at innovating, especially not quickly and cheaply.
Here is a more intuitive version of the same paradox.
Again, conditional on all dice rolls being even. But this time it’s either
A) 1,000,000 consecutive 6′s.
B) 999,999 consecutive 6′s, followed by a (possibly non-consecutive) 6.
Suppose you roll a few even numbers, followed by an extremely lucky sequence of 999,999 6′s.
From the point of view of version A, the only way to complete the sequence is a single extra 6. If you roll a 4, you would need to roll a second run of a million 6′s. You are very unlikely to do that in the next 10 million steps, and very unlikely to go for 10 million steps without rolling an odd number.
Yes, if this happened, it would add at least a million extra rolls. But the chance of that is exponentially tiny.
Whereas for B, it’s quite plausible to roll 26 or 46 or 2426 instead of just 6.
Another way to think about this problem is with regular expressions. Let e = any even number, and * = zero or more repetitions.
The pattern “e*6e*6” matches any sequence with at least two 6′s and no odd numbers.
The pattern “e*66” requires those two 6′s to be consecutive. And the pattern “66” matches two consecutive 6′s with no room for extra even numbers before the first 6. This is the shortest.
Phrased this way it looks obvious. Every time you allow a gap for even numbers to hide in, an even number might be hiding in the gap, and that makes the sequence longer.
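As a sanity check on that ordering, here is a quick Monte Carlo sketch: translate the patterns into real regexes over strings of rolls, throw away any run that hits an odd number before the pattern completes, and average the lengths of the accepted runs. The consecutive version should come out shorter (roughly 2.7 rolls vs 3.0).

```python
import random
import re

# e = any even digit, so the informal patterns become regexes over roll strings.
CONSECUTIVE = re.compile(r"[246]*66")    # "e*66": ends with two 6s in a row
ANY_GAP = re.compile(r"[246]*6[246]*6")  # "e*6e*6": ends at the second 6

def conditional_mean_length(pattern, n_samples=50_000):
    """Estimate E[sequence length | no odd number appears before the pattern completes]."""
    total = accepted = 0
    while accepted < n_samples:
        s = ""
        while True:
            roll = random.randint(1, 6)
            if roll % 2 == 1:         # odd roll: condition violated, reject this run
                break
            s += str(roll)
            if pattern.fullmatch(s):  # pattern complete: accept this sequence
                total += len(s)
                accepted += 1
                break
    return total / n_samples

print("consecutive, 'e*66'  :", conditional_mean_length(CONSECUTIVE))  # ~2.7
print("any gap,     'e*6e*6':", conditional_mean_length(ANY_GAP))      # ~3.0
```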
When you remove the conditional on the other numbers being even, the “first” becomes important for making the expectation converge at all.
Ok. I’m imagining an AI that has at least my level of AI alignment research ability, maybe a bit more.
If that AI produces slop, it should be pretty explicitly aware that it’s producing slop. I mean, I might write slop if someone was paying per word and then shredding my work without reading it. But I would know it was slop.
Regardless of which is easier, if the AI is doing this, it has to be thinking about the researcher’s psychology, not just about alignment.
How many of these failure modes still happen when there is an AI at least as smart as you, that is aware of these failure modes and actively trying to prevent them?