Nathan Helm-Burger

Karma: 2,365

AI alignment researcher, ML engineer. Masters in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here’s an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability trained on censored data (simulations with no mention of humans or computer technology). I think that current ML mainstream technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and I think that this automated process will mine neuroscience for insights, and quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.

See my prediction markets here:

https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg

I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.

I now work for SecureBio on AI-Evals.

relevant quote:

“There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training.” https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe

Nathan Helm-Burger 9 May 2024 5:09 UTC
2 points
0
in reply to: Petr Andreev’s comment on: Please stop publishing ideas/insights/research about AI
A bit of a rant, yes, but some good thoughts here.
I agree that unenforceable regulation can be a bad thing. On the other hand, it can also work in some limited ways. For example, the international agreements against heritable human genetic engineering seem to have held up fairly well. But I think that that requires supporting facts about the world to be true. It needs to not be obviously highly profitable to defectors, it needs to be relatively inaccessible to most people (requiring specialized tech and knowledge), it needs to fit with our collective intuitions (bio-engineering humans seems kinda icky to a lot of people).
The trouble is, all of these things fail to help us with the problem of dangerous AI! As you point out, many bitcoin miners have plenty of GPUs to be dangerous if we get even a couple more orders-of-magnitude algorithmic efficiency improvements. So it’s accessible. AI and AGI offer many tempting ways to acquire power and money in society. So it’s immediately and incrementally profitable. People aren’t as widely instinctively outraged by AI experiments as by bio-engineering experiments. So it’s not intuitively repulsive.
So yes, this seems to me to be very much a situation in which we should not place any trust in unenforceable regulation.
I also agree that we probably do need some sort of organization which enforces the necessary protections (detection and destruction) against rogue AI.
And it does seem potentially like a lot of human satisfaction could be bought in the near future with a focus on making sure everyone in the world gets a reasonable minimum amount of satisfaction from their physical and social environments as you describe here:
Usually, the median person is interested in: jobs, a full fridge, rituals, culture, the spread of their opinion leader’s information, dopamine, political and other random and inherited values, life, continuation of life, and the like. Provide a universal way of obtaining this and just monitor it calmly.
As Connor Leahy has said, we should be able to build sufficiently powerful tool-AI to not need to build AGI! Stop while we still have control! Use the wealth to buy off those who would try anyway. Also, build an enforcement agency to stop runaway AI or AI misuse.
I don’t know how we get there from here though.
Also, the offense-dominant weapons development landscape is looking really grim, and I don’t see how to easily patch that.
On the other hand, I don’t think we buy ourselves any chance of victory by trying to gag ourselves for fear of speeding up AGI development. It’s coming soon regardless of what we do! The race is short now, we need to act fast!
I don’t buy the arguments that our discussions here will make a significant impact in the timing of the arrival of AGI. That seems like hubris to me, to imagine we have such substantial effects, just from our discussions.
Code? Yes, code can be dangerous and shouldn’t be published if so.
Sufficiently detailed technical descriptions of potential advancements? Yeah, I can see that being dangerous.
Unsubstantiated commentary about a published paper being interesting and potentially having both capabilities and alignment value? I am unconvinced that such discussions meaningfully impact the experiments being undertaken in AI labs.

Nathan Helm-Burger 9 May 2024 4:32 UTC
2 points
−2
in reply to: mako yass’s comment on: Please stop publishing ideas/insights/research about AI
Hmm. Seems… fragile. I don’t think that’s a reason not to do it, but I also wouldn’t put much hope in the idea that leaks would be successfully prevented by this system.

Nathan Helm-Burger 9 May 2024 4:26 UTC
2 points
0
in reply to: O O’s comment on: Please stop publishing ideas/insights/research about AI
I think you make some valid points. In particular, I agree that some people seem to have fallen into a trap of being unrealistically pessimistic about AI outcomes which mirrors the errors of those AI developers and cheerleaders who are being unrealistically optimistic.
On the other hand, I disagree with this critique (although I can see where you’re coming from):
If it’s instead a boring engineering problem, this stops being a quest to save the world or an all consuming issue. Incremental alignment work might solve it, so in order to preserve the difficulty of the issue, it will cause extinction for some far-fetched reason. Building precursor models then bootstrapping alignment might solve it, so this “foom” is invented and held on to (for a lot of highly speculative assumptions), because that would stop it from being a boring engineering problem that requires lots of effort and instead something a lone genius will have to solve.
I think that FOOM is a real risk, and I have a lot of evidence grounding my calculations about available algorithmic efficiency improvements based on estimates of the compute of the human brain. The conclusion I draw from believing that FOOM is both possible, and indeed likely, after a certain threshold of AI R&D capability is reached by AI models is that preventing/controlling FOOM is an engineering problem.
I don’t think we should expect a model in training to become super-human so fast that it blows past our ability to evaluate it. I do think that in order to have the best chance of catching and controlling a rapid accelerating take-off, we need to do pre-emptive engineering work. We need very comprehensive evals to have detailed measures of key factors like general capability, reasoning, deception, self-preservation, and agency. We need carefully designed high-security training facilities with air-gapped datacenters. We need regulation that prevents irresponsible actors from undertaking unsafe experiments. Indeed, most of the critical work to preventing uncontrolled rogue AGI due to FOOM is well described by ‘boring engineering problems’ or ‘boring regulation and enforcement problems’.
Believing in the dangers of recursive self-improvement doesn’t necessarily involve believing that the best solution is a genius theoretical answer to value and intent alignment. I wouldn’t rule the chance of that out, but I certainly don’t expect that slim possibility. It seems foolish to trust in that as the primary hope for humanity. Instead, let’s focus on doing the necessary engineering and political work so that we can proceed with reasonable safety measures in place!

Nathan Helm-Burger 6 May 2024 21:16 UTC
1 point
0
on: Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
I think this is a good description of the problem. The fact that Einstein’s brain had a similar amount of compute and data, similar overall architecture, similar fundamental learning algorithm means that a brain-like algorithm can substantially improve in capability without big changes to these things. How similar to the brain’s learning algorithm does an ML algorithm have to be before we should expect similar effects? That seems unclear to me. I think a lot of people who try to make forecasts about AI progress are greatly underestimating the potential impact of algorithm development, and how the rate of algorithmic progress could be accelerated by large-scale automated searches by sub-AGI models like GPT-5.

Related markets I have on Manifold:

https://manifold.markets/NathanHelmBurger/gpt5-plus-scaffolding-and-inference

https://manifold.markets/NathanHelmBurger/1hour-agi-a-system-capable-of-any-c

A related comment I made on a different post:

https://www.lesswrong.com/posts/sfWPjmfZY4Q5qFC5o/why-i-m-doing-pauseai?commentId=p2avaaRpyqXnMrvWE

Nathan Helm-Burger 4 May 2024 17:19 UTC
4 points
2
on: Shannon Vallor’s “technomoral virtues”
Something I think humanity is going to have to grapple with soon is the ethics of self-modification / self-improvement, and the perils of value-shift due to rapid internal and external changes. How do we stay true to ourselves while changing fundamental aspects of what it means to be human?

Nathan Helm-Burger 4 May 2024 15:54 UTC
3 points
0
on: OHGOOD: A coordination body for compute governance
This is a solid-seeming proposal. If we are in a world where the majority of danger comes from big datacenters and large training runs, I predict that this sort of regulation would be helpful. I don’t think we are in that world though, which I think limits how useful this would be. Further explanation here: https://www.lesswrong.com/posts/sfWPjmfZY4Q5qFC5o/why-i-m-doing-pauseai?commentId=p2avaaRpyqXnMrvWE

Nathan Helm-Burger 4 May 2024 5:32 UTC
5 points
2
on: How does the ever-increasing use of AI in the military for the direct purpose of murdering people affect your p(doom)?
Personally, I have gradually moved to seeing this as lowering my p(doom). I think humanity’s best chance is to politically coordinate to globally enforce strict AI regulation. I think the most likely route to this becoming politically feasible is through empirical demonstrations of the danger of AI. I think AI is more likely to be legibly empirically dangerous to political decision-makers if it is used in the military. Thus, I think military AI is, counter-intuitively, lowering p(doom). A big accident in which military AI killed thousands of innocent people whom the military had not intended to kill could do a lot to lower p(doom).
This is a sad thing to think, obviously. I’m hopeful we can come up with harmless demonstrations of the dangers involved, so that political action will be taken without anyone needing to be killed.
In scenarios where AI becomes powerful enough to present an extinction risk to humanity, I don’t expect the level of robotic weaponry it has control over to matter much. It will have many, many opportunities to hurt humanity that look nothing like armed robots and greatly exceed the power of armed robots.

Nathan Helm-Burger 4 May 2024 4:36 UTC
2 points
0
on: Why I’m doing PauseAI
I absolutely sympathize, and I agree that, given the world view / information you have, advocating for a pause makes sense. I would get behind ‘regulate AI’ or ‘regulate AGI’, certainly. I think though that pausing is an incorrect strategy which would do more harm than good, so despite being aligned with you in being concerned about AGI dangers, I don’t endorse that strategy.
Some part of me thinks this oughtn’t matter, since there’s approximately a 0% chance of the movement achieving that literal goal. The point is to build an anti-AGI movement, and to get people thinking about what it would be like for the government to be able to issue an order to pause AGI R&D, or turn off datacenters, or whatever. I think that’s a good aim, and your protests probably (slightly) help that aim.
I’m still hung up on the literal ‘Pause AI’ concept being a problem though. Here’s where I’m coming from:

1. I’ve been analyzing the risks of current day AI. I believe (but will not offer evidence for here) current day AI is already capable of providing small-but-meaningful uplift to bad actors intending to use it for harm (e.g. weapon development). I think that having stronger AI in the hands of government agencies designed to protect humanity from these harms is one of our best chances at preventing such harms.
2. I see the ‘Pause AI’ movement as being targeted mostly at large companies, since I don’t see any plausible way for a government or a protest movement to enforce what private individuals do with their home computers. Perhaps you think this is fine because you think that most of the future dangers posed by AI derive from actions taken by large companies or organizations with large amounts of compute. This is emphatically not my view. I think that actually more danger comes from the many independent researchers and hobbyists who are exploring the problem space. I believe there are huge algorithmic power gains which can, and eventually will, be found. I furthermore believe that beyond a certain threshold, AI will be powerful enough to rapidly self-improve far beyond human capability. In other words, I think every AI researcher in the world with a computer is like a child playing with matches in a drought-stricken forest. Any little flame, no matter how small, could set it all ablaze and kill everyone. Are the big labs playing with bonfires dangerous? Certainly. But they are also visible, and can be regulated and made to be reasonably safe by the government. And the results of their work are the only feasible protection we have against the possibility of FOOM-ing rogue AGI launched by small independent researchers. Thus, pausing the big labs would, in my view, place us in greater danger rather than less danger. I think we are already well within the window of risk from independent-researcher-project-initiated-FOOM. Thus, the faster we get the big labs to develop and deploy worldwide AI-watchdogs, the sooner we will be out of danger.
I know these views are not the majority views held by any group (that I know of). These are my personal inside views from extensive research. If you are curious about why I hold these views, or more details about what I believe, feel free to ask. I’ll answer if I can.

Nathan Helm-Burger 4 May 2024 4:04 UTC
2 points
0
on: An Introduction to AI Sandbagging
I’ve mentioned it elsewhere, but I’ll repeat it here since it’s relevant. For GPT-style transformers, and probably for other model types, you can smoothly and subtly degrade the performance of the model by adding noise to part or all of the activations. This is particularly useful for detecting sandbagging, because you would expect sandbagging to show up as an anomalous increase in capability, breaking the smooth downward trend in capability, as you increase the amount of noise injected or the fraction of activations to which noise is added. I found that there was a noticeable decrease in performance even when the noise was added to only a small fraction (e.g. < 1%) of the activations and was scaled to be small relative to the magnitude of the activation it was being added to.
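To make the idea concrete, here is a minimal sketch in NumPy. It is not the code I used: the function names, the Gaussian noise model, and the `tolerance` parameter are all assumptions made for illustration. In a real experiment the noise would be applied to a model's intermediate activations during the forward pass, and the scores would come from running an eval at each noise level.

```python
import numpy as np

def add_activation_noise(acts, fraction, relative_scale, rng):
    """Perturb a random `fraction` of entries with Gaussian noise.

    The noise standard deviation for each entry is `relative_scale` times
    that entry's own magnitude, so the perturbation stays small relative
    to the activation it is added to. (Hypothetical helper.)
    """
    mask = rng.random(acts.shape) < fraction          # which entries get noise
    noise = rng.standard_normal(acts.shape) * np.abs(acts) * relative_scale
    return acts + mask * noise

def anomalous_increases(scores, tolerance=0.0):
    """Indices i where the score at noise level i beats level i-1 by more
    than `tolerance`, i.e. where the expected downward trend is broken."""
    scores = np.asarray(scores, dtype=float)
    return [i for i in range(1, len(scores)) if scores[i] > scores[i - 1] + tolerance]

rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 16))                   # stand-in for real activations
noisy = add_activation_noise(acts, fraction=0.01, relative_scale=0.1, rng=rng)

# Made-up eval scores at increasing noise levels, purely for illustration.
scores_by_noise_level = [0.80, 0.74, 0.69, 0.77, 0.60]
print(anomalous_increases(scores_by_noise_level, tolerance=0.02))  # [3]
```

The `tolerance` is there because eval scores are noisy themselves; how large it should be depends on the variance of the eval, which this sketch does not estimate.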

Nathan Helm-Burger 4 May 2024 3:36 UTC
2 points
0
on: KAN: Kolmogorov-Arnold Networks
So, after reading the KAN paper, and thinking about it in the context of this post: https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their
My vague intuition is that the same experiment done with a KAN would result in a clearer fractal which wiggled less once training loss had plateaued. Is that also other people’s intuition?

Nathan Helm-Burger 3 May 2024 21:52 UTC
5 points
0
in reply to: the gears to ascension’s comment on: Why is AGI/ASI Inevitable?
I, on the other hand, have very little confidence that people trying to build AGI will fail to quickly (within the next 3 years, aka 2027) find ways to do it. I do have confidence that we can politically coordinate to stop the situation becoming an extinction or near-extinction-level catastrophe. So I place much less emphasis on abstaining from publishing ideas which may help both alignment and capabilities, and more emphasis on figuring out ways to generate empirical evidence of the danger before it is too late, so as to facilitate political coordination.
I think that the situation in which humanity fails to politically coordinate to avoid building catastrophically dangerous AI is a situation that leads into conflict, likely a World War III with wide-spread use of nuclear weapons. I don’t expect humanity to go extinct from this and I don’t expect the rogue AGI to emerge as the victor, but I do think it is in everyone’s interests to work hard to avoid such a devastating conflict. I do think that any such conflict would likely wipe out the majority of humanity. That’s a pretty grim risk to be facing on the horizon.

Nathan Helm-Burger 2 May 2024 21:01 UTC
3 points
2
in reply to: EStokes’s comment on: Open Thread Spring 2024
EY may be too busy to respond, but you can probably feel pretty safe consulting with MIRI employees in general. Perhaps also Conjecture employees, and Redwood Research employees, if you read and agree with their views on safety. That at least gives you a wider net of people to potentially give you feedback.

Nathan Helm-Burger 2 May 2024 20:51 UTC
3 points
2
on: Open Thread Spring 2024
Some features I’d like:
a ‘mark read’ button next to posts so I could easily mark as read posts that I’ve read elsewhere (e.g. ones cross-posted from a blog I follow)
a ‘not interested’ button which would stop a given post from appearing in my latest or recommended lists. Ideally, this would also update my recommended posts so as to recommend fewer posts like that to me. (Note: the hide-from-front-page button could be this if A. It worked even on promoted/starred posts, and B. it wasn’t hidden in a three-dot menu where it’s frustrating to access)
a ‘read later’ button which will put the post into a reading list for me that I can come back to later.
a toggle button for ‘show all’ / ‘show only unread’ so that I could easily switch between the two modes.
These features would help me keep my ‘front page’ feeling cleaner and more focused.

Nathan Helm-Burger 2 May 2024 19:58 UTC
4 points
0
in reply to: Garrett Baker’s comment on: D0TheMath’s Shortform
Yeah, I agree that releasing open-weights non-frontier models doesn’t seem like a frontier capabilities advance. It does seem potentially like an open-source capabilities advance.

That can be bad in different ways. Let me pose a couple hypotheticals.
1. What if frontier models were already capable of causing grave harms to the world if used by bad actors, and it is only the fact that they are kept safety-fine-tuned and restricted behind APIs that is preventing this? In such a case, it’s a dangerous thing to have open-weight models catching up.
2. What if there is some threshold beyond which a model would be capable of recursive self-improvement, given sufficient scaffolding and unwise pressure from an incautious user? Again, the frontier labs might well abstain from this course, especially if they weren’t sure they could trust the new model design created by the current AI. They would likely move slowly and cautiously at least. I would not expect this of the open-source community. They seem focused on pushing the boundaries of agent-scaffolding and incautiously exploring whatever they can.
So, as we get closer to danger, open-weight models take on more safety significance.

Nathan Helm-Burger 2 May 2024 19:46 UTC
15 points
5
on: Why is AGI/ASI Inevitable?
I think people can in theory collectively decide not to build AGI or ASI.

Certainly you as an individual can choose this! Where things get tricky is when asking whether that outcome seems probable, or coming up with a plan to bring that outcome about. Similarly, as a child I wondered, “Why can’t people just choose not to have wars, just decide not to kill each other?”

People have selfish desires, and group loyalty instincts, and limited communication and coordination capacity, and the world is arranged in such a way that sometimes this leads to escalating cycles of group conflict that are net bad for everyone involved.

That’s the scenario I think we are in with AI development also. Everyone would be safer if we didn’t build it, but getting everyone to agree not to, and to hold to that agreement even in private, seems intractably hard.

[Edit: Here’s a link to Steven Pinker’s writing on the Evolution of War. I don’t think, as he does, that the world is trending strongly towards global peace, but I do think he has some valid insights into the sad lose-lose nature of war.]

Nathan Helm-Burger 1 May 2024 20:16 UTC
2 points
0
in reply to: Mateusz Bagiński’s comment on: KAN: Kolmogorov-Arnold Networks
I’m not so sure. You might be right, but I suspect that catastrophic forgetting may still be playing an important role in limiting the peak capabilities of an LLM of given size. Would it be possible to continue Llama3 8B’s training much much longer and have it eventually outcompete Llama3 405B stopped at its normal training endpoint?

I think probably not? And I suspect that, if not, part (but not all) of the reason would be catastrophic forgetting. Another part would be the limited expressivity of smaller models, another thing which the KANs seem to help with.

Nathan Helm-Burger 1 May 2024 17:41 UTC
4 points
0
on: KAN: Kolmogorov-Arnold Networks
Wow, this is super fascinating.

A juicy tidbit:

Catastrophic forgetting is a serious problem in current machine learning [24]. When a human masters a task and switches to another task, they do not forget how to perform the first task. Unfortunately, this is not the case for neural networks. When a neural network is trained on task 1 and then shifted to being trained on task 2, the network will soon forget about how to perform task 1. A key difference between artificial neural networks and human brains is that human brains have functionally distinct modules placed locally in space. When a new task is learned, structure re-organization only occurs in local regions responsible for relevant skills [25, 26], leaving other regions intact. Most artificial neural networks, including MLPs, do not have this notion of locality, which is probably the reason for catastrophic forgetting.

Nathan Helm-Burger 1 May 2024 2:26 UTC
4 points
0
in reply to: faul_sname’s comment on: Nathan Helm-Burger’s Shortform
Yeah, I was playing around with using a VAE to compress the logits output from a language transformer. I did indeed settle on treating the vocab size (e.g. 100,000) as the ‘channels’.

Nathan Helm-Burger 30 Apr 2024 17:09 UTC
8 points
0
on: Nathan Helm-Burger’s Shortform
So when trying to work with language data vs image data, an interesting assumption of the ML vision research community clashes with an assumption of the language research community. For a language model, you represent the logits as a tensor with shape [batch_size, sequence_length, vocab_size]. For each position in the sequence, there are a variety of likelihood values of possible tokens for that position.
In vision models, the assumption is that the data will be in the form [batch_size, color_channels, pixel_position]. Pixel position can be represented as a 2d tensor or flattened to 1d.
See the difference? Sequence position comes first, pixel position comes second. Why? Because a color channel has a particular meaning, and thus it is intuitive for a researcher working with vision data to think about the ‘red channel’ as a thing which they might want to separate out to view. What if we thought of 2nd-most-probable tokens the same way? Is it meaningful to read a sequence of all 1st-most-probable tokens, then read a sequence of all 2nd-most-probable tokens? You could compare the semantic meaning, and the vibe, of the two sets. But this distinction doesn’t feel as natural for language logits as it does for color channels.
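Here is a small NumPy sketch of the two layouts and of the "k-th most probable token" reading. The toy sizes, the random logits, and the helper name `rank_k_tokens` are assumptions for illustration only; real logits would come from a language model.

```python
import numpy as np

batch_size, sequence_length, vocab_size = 2, 5, 11   # toy sizes

rng = np.random.default_rng(0)
# Language convention: [batch_size, sequence_length, vocab_size]
logits = rng.standard_normal((batch_size, sequence_length, vocab_size))

# Vision-style convention: treat the vocab dimension as 'channels',
# giving [batch_size, vocab_size, sequence_length].
logits_channels_first = np.transpose(logits, (0, 2, 1))

def rank_k_tokens(logits, k):
    """Token ids of the k-th most probable token at every position
    (k=0 is the argmax). Expects the vocab dimension last."""
    order = np.argsort(-logits, axis=-1)   # descending along the vocab axis
    return order[..., k]

first_choice = rank_k_tokens(logits, 0)    # shape [batch_size, sequence_length]
second_choice = rank_k_tokens(logits, 1)   # the 'second channel' in rank space

print(logits.shape, logits_channels_first.shape)
print(first_choice[0], second_choice[0])
```

Note that the rank-ordered 'channels' here are not the same thing as the raw vocab channels: a given vocab index always refers to the same token, whereas the k-th-ranked slot holds a different token at each position.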

Nathan Helm-Burger 29 Apr 2024 18:23 UTC
3 points
0
on: Disentangling Competence and Intelligence
This is something I’ve been thinking about as well, and I think you do a good job explaining it. There’s definitely more to break down and analyze within competence and intelligence. For instance, simulation seems like a distinct component of intelligence: a measure of how many moves a player can think ahead in a strategy game like chess or Go. How large a possibility-tree can they build in the available time? With what rate of errors? How quickly does the probability of error increase as the tree increases in size? How does their performance decrease as the complexity of the variables that need to be tracked for an accurate simulation increases?
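As a rough way to put numbers on these questions, here is a toy sketch. It assumes a uniform branching factor and that each simulated node has the same, independent chance of being evaluated wrongly; both are simplifying assumptions of mine, and the function names are hypothetical.

```python
def tree_size(branching_factor, depth):
    """Number of nodes below the root in a full tree explored to `depth` plies."""
    return sum(branching_factor ** d for d in range(1, depth + 1))

def prob_any_error(per_node_error, n_nodes):
    """Chance that at least one node in the tree was simulated incorrectly,
    assuming independent errors with the same rate at every node."""
    return 1.0 - (1.0 - per_node_error) ** n_nodes

# Illustrative values only: branching factor 3, 0.1% error rate per node.
for depth in range(1, 7):
    n = tree_size(3, depth)
    print(depth, n, round(prob_any_error(0.001, n), 3))
```

Under these assumptions the tree grows exponentially with depth, so even a small per-node error rate makes an error somewhere in the tree close to certain after a handful of plies.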