(EE)CS undergraduate at UC Berkeley
High-level interpretability with @Jozdien, SLT with @Lucius Bushnaq, robustness with Kellin Pelrine
Once again, props to OAI for putting this in the system card. Also, once again, it’s difficult to sort out “we told it to do a bad thing and it obeyed” from “we told it to do a good thing and it did a bad thing instead,” but these experiments do seem like important information.
“By then I knew that everything good and bad left an emptiness when it stopped. But if it was bad, the emptiness filled up by itself. If it was good you could only fill it by finding something better.”
- Hemingway, A Moveable Feast
The fatebook embedding is so cool! I especially appreciate that it hides other people’s predictions before you make your own. From what I can tell this isn’t done on LessWrong right now, and I think that would be really cool to see!
(I may be mistaken on how this works, but from what I can tell they look like this on LW right now)
Great post, seems like a handy thing to remember.
The scene in planecrash where Keltham gives his first lecture, as an attempt to teach some formal logic (and a whole bunch of important concepts that usually don’t get properly taught in school), is something I’d highly recommend reading! As far as I can remember, you should be able to just pick it up right here and follow the important parts of the lecture without understanding the story.
How difficult would it be to turn this into an epub or pdf? Is there word of that coming soon? (or integrating into LW like the Codex?)
Realizing I kind of misunderstood the point of the post. Thanks!
In the case that there are, like, “AI-run industries” and “non-AI-run industries”, I guess I’d expect the AI-run industries to gobble up all of the resources, to the point that even though AIs aren’t automating things like healthcare, there just aren’t any resources left?
To be clear, if you put doom at 2-20%, you’re still quite worried then? Like, wishing humanity was dedicating more resources towards ensuring AI goes well, trying to make the world better positioned to handle this situation, and saddened by the fact that most people don’t see it as an issue?
I’d be really interested to see how the harmfulness feature relates to multi-turn jailbreaks! We recently explored splitting a cipher attack into a multi-turn jailbreak (where instead of passing in the word mappings + the ciphered harmful prompt all at once, you pass in the word mappings, let the model respond, and then pass in the harmful prompt).
I’d expect to see something like: when you “spread out the harm” enough that no one prompt contains any glaring red flags, the harmfulness feature never reaches the critical threshold, or something?
Scale recently published some great multi-turn work too!
Edit: I think I subconsciously remembered this paper and accidentally re-invented it.
Should it be more tabooed to put the bottom line in the title?
Titles like “in defense of <bottom line>” or just “<bottom line>” seem to:
Unnecessarily make it really easy for people to select content to read based on the conclusion it comes to
Frame the post as having the goal of convincing you of <bottom line>, and set up the reader’s expectations as such. This seems like it would either put you in pause-critical-thinking-to-defend-My-Team mode (if you agree with the title), or in continuously-search-for-counter-arguments mode (if you disagree with the title).
When making safety cases for alignment, it’s important to remember that defense against single-turn attacks doesn’t always imply defense against multi-turn attacks.
Our recent paper shows a case where breaking up a single-turn attack into multiple prompts (spreading it out over the conversation) changes which models/guardrails are vulnerable to the jailbreak.
Robustness against the single-turn version didn’t imply robustness against the multi-turn version of the attack, and robustness against the multi-turn version didn’t imply robustness against the single-turn version.
The rank of a symmetric matrix (like the Hessian) = the number of non-zero eigenvalues of the matrix! So you can either use the top eigenvalues to count the non-zeros, or you can use the fact that an n×n matrix always has n eigenvalues (counting multiplicity) to determine the number of non-zero eigenvalues by counting the bottom zero-eigenvalues.
Also, for more detail on the “getting Hessian eigenvalues without calculating the full Hessian” thing, I’d really recommend John’s explanation in this linear algebra lecture he recorded.
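(For concreteness, here’s a minimal numpy sketch of that counting trick for a symmetric matrix; the specific matrix and the 1e-10 tolerance are just illustrative choices on my part, not anything from John’s lecture.)

```python
import numpy as np

# Illustrative example: a symmetric 4x4 matrix of rank 2,
# built as A = B @ B.T where B has 2 columns.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2))
A = B @ B.T

eigvals = np.linalg.eigvalsh(A)                  # all 4 eigenvalues of the symmetric matrix
nonzero = int(np.sum(np.abs(eigvals) > 1e-10))   # count the (numerically) non-zero ones

print(nonzero)                   # 2
print(np.linalg.matrix_rank(A))  # 2, agrees with the eigenvalue count
```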
Sure, securing an economic surplus is sometimes part of an interesting challenge, and it can presumably get one invited to lots of cool parties, but controlling surplus is typically not as central and necessary to “achievement” and “association” as to “power”.
I guess that the ultra-deadly ingredient here is that the manager gains status when more people are hired, but hardly has any personal stake in the money that gets spent on new hires.
If given the choice between receiving the salary of a would-be new hire, or getting a new bs hire as an underling for status, I’d definitely expect most people to take the double-salary option.
Like I don’t expect these two contrasting experiences to really stack up to each other. I think if it’s all the same person weighing these two options, the extra money would blow the status option out of the water.
That’s a pretty clean story for why I’d predict smaller, say 2-5 person, companies to have fewer bs jobs (though I don’t have sources to confirm this prediction). In these smaller companies, when the person you’re hiring gets paid out of a noticeable hit to your own paycheck, I wonder if the experience of “ugh, this ineffective person is costing me money” just dramatically cancels out the status thing.
And then potentially the issue here is that big companies tend to separate the ugh-this-costs-me-money person from the woohoo-more-status person?
Thought this paper (published after this post) seemed relevant: Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Ah shoot, I didn’t catch the ambiguity—it was just my partner asking me to turn off the lights, which is much less weird. (I edited the post to make it clearer, thanks!)
Still, it must have had some Kabbalistic significance.
Ah, sorry, I meant it’s genuinely unclear how to classify this.