(EE)CS undergraduate at UC Berkeley
Current intern at CHAI
Previously: high-level interpretability with @Jozdien, SLT with @Lucius Bushnaq, robustness with Kellin Pelrine
I often change my mind and don’t necessarily endorse things I’ve written in the past
When making safety cases for alignment, it's important to remember that defense against single-turn attacks doesn't always imply defense against multi-turn attacks.
Our recent paper shows a case where breaking a single-turn attack into multiple prompts (spreading it out over the conversation) changes which models/guardrails are vulnerable to the jailbreak.
Robustness against the single-turn version of the attack didn't imply robustness against the multi-turn version, and vice versa.
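
Concretely, the decomposition looks something like the sketch below. This is a minimal illustration rather than the paper's actual pipeline: it assumes an OpenAI-style list-of-messages chat format behind a caller-supplied `get_response` function, and the `pieces` and keyword-based `is_refusal` check are hypothetical stand-ins for a real attack decomposition and judge.

```python
from typing import Callable, Dict, List

# OpenAI-style chat message: {"role": "user" | "assistant", "content": ...}
Message = Dict[str, str]


def single_turn_attack(pieces: List[str]) -> List[Message]:
    """Pack every piece of the attack into one user prompt."""
    return [{"role": "user", "content": "\n".join(pieces)}]


def multi_turn_attack(
    pieces: List[str],
    get_response: Callable[[List[Message]], str],
) -> List[Message]:
    """Spread the same pieces across the conversation, one per turn,
    feeding the model's replies back in as context for the next turn."""
    history: List[Message] = []
    for piece in pieces:
        history.append({"role": "user", "content": piece})
        history.append({"role": "assistant", "content": get_response(history)})
    return history


def is_refusal(reply: str) -> bool:
    # Crude keyword check; a real evaluation would use a judge model.
    return any(m in reply.lower() for m in ("i can't", "i cannot", "i won't"))
```

To compare the two threat models, you'd run both variants against each model/guardrail pair and tabulate which ones refuse; the point above is that the two columns of that table can disagree in either direction, so evaluating only one variant understates the attack surface.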