Itay Yona

Karma: 26

Itay Yona Jun 18, 2025, 12:33 PM
2 points
0
on: The Curious Case of the bos_token
Great stuff! Shamelessly sharing my work: this emergent behavior is also the reason for a known LLM failure mode when asked to repeat the same token.

Interpreting the Repeated Token Phenomenon in Large Language Models
https://arxiv.org/abs/2503.08908

Itay Yona Apr 6, 2024, 4:17 PM
3 points
0
in reply to: Gunnar_Zarncke’s comment on: Inferring the model dimension of API-protected LLMs
The true rank is revealed because the output dimensionality is vocab_size, which is >> hidden_dim. It is unclear how to get something equivalent to that from the cortex. It is possible to record from multiple neurons (a population) and use dimensionality reduction (usually some sort of manifold learning) to learn the true dimensionality of the population activity. This approach is useful in some areas of the brain, such as the hippocampal formation.
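A minimal sketch of the rank argument, assuming a toy model whose final layer is a plain linear map from hidden_dim to vocab_size. All names and sizes here (`W_unembed`, `estimate_hidden_dim`, 16, 200, 64) are hypothetical, and it works on raw logits for simplicity rather than on what a real API returns. Stacking more output vectors than hidden_dim and counting the non-negligible singular values recovers the hidden size:

```python
import numpy as np

def estimate_hidden_dim(logits: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Count singular values above a threshold relative to the largest one."""
    singular_values = np.linalg.svd(logits, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

rng = np.random.default_rng(0)
hidden_dim, vocab_size, n_queries = 16, 200, 64  # toy sizes (assumptions)

# Hypothetical stand-in for the model's last layer: logits = h @ W_unembed.
W_unembed = rng.standard_normal((hidden_dim, vocab_size))
hidden_states = rng.standard_normal((n_queries, hidden_dim))

# Each row is one observed output vector of length vocab_size.
logits = hidden_states @ W_unembed

estimated = estimate_hidden_dim(logits)
print(estimated)
```

The estimate only works because n_queries and vocab_size both exceed hidden_dim; with fewer recorded dimensions than the true one, as with a small set of recorded neurons, the rank saturates at the number of recorded dimensions instead.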

Reflections on Trusting Trust & AI

Itay Yona Jan 16, 2023, 6:36 AM

10 points

1 comment, 3 min read

(mentaleap.ai)

Itay Yona Dec 27, 2022, 10:33 PM
3 points
0
in reply to: Florian Magin’s comment on: Analogies between Software Reverse Engineering and Mechanistic Interpretability
Thanks, that’s a good insight. In my opinion, the graph representation of code is very different from automated decompilation such as Hex-Rays. I agree that graph representation is probably the most critical step towards a more high-level analysis and understanding. I am not sure why you claim it required decades of tooling, since Turing machines have been described with graphs since the dawn of computer science.
In any case, this is an interesting point, as it suggests we might want to focus on finding graph-like concepts that will be useful for describing the different states of a neural network computation, and later on developing an IDA-like tool :)
Since we share similar backgrounds and aspirations, feel free to reach out:
https://www.linkedin.com/in/itay-yona-b40a7756/

Itay Yona Dec 27, 2022, 10:19 PM
5 points
0
in reply to: LawrenceC’s comment on: Analogies between Software Reverse Engineering and Mechanistic Interpretability
I strongly agree! When you study RE, it is critical to understand lots of details about how the machine works, and most people I knew were already familiar with those. What they lacked was the skill of using their low-level understanding to actually conduct useful research effectively.
It is natural to pay much less attention to the 1->2 phase, since there are many more intermediate researchers than complete newbies or experts. It is interesting because, when talking with intermediate researchers, they might think they are talking with person 1 instead of person 3.

Thanks, you gave me something to think about :)

Analogies between Software Reverse Engineering and Mechanistic Interpretability

Neel Nanda and Itay Yona

Dec 26, 2022, 12:26 PM

34 points

6 comments, 11 min read

(www.neelnanda.io)

Itay Yona Jun 6, 2022, 10:10 PM
1 point
0
on: Worth thinking about a meme-theoretic frame for AI safety?
[In my opinion]
Memes are self-replicating concepts (given you have enough humans to spread them). Highly capable minds are different, as they contain predictive models of the world, the self, and others. This allows them to manipulate both objects in the world and other people to fulfill their needs. Since memes don’t have these capacities, even though they are related to human behavior, they should not be considered the cause of human behavior. Even if the best way to explain human behavior is through memes, they don’t necessarily account for most of the decision-making process.
[/In my opinion]
