rajathsalegame

Karma: 0

Exploring phase space

rajathsalegame Sep 25, 2024, 9:12 PM
1 point
0
in reply to: tailcalled’s comment on: Why I’m bearish on mechanistic interpretability: the shards are not in the network
I agree that it is dubious at the moment. I just think it’s too early to tell and the field itself will undoubtedly grow in complexity over the coming years.

Your point about the spontaneity of cells forming stands, although I wasn’t phrasing the analogy at the level of thermodynamics / physics.

rajathsalegame Sep 24, 2024, 5:01 PM
1 point
0
in reply to: tailcalled’s comment on: Why I’m bearish on mechanistic interpretability: the shards are not in the network
In the same way that cells were understood to be indivisible, atomic units of biology hundreds of years ago—before the discovery of sub-cellular structures like organelles, proteins, and DNA—we currently understand features to be fundamental units of neural network representations that we are examining with tools like mechanistic interpretability.

This is not to say that the definition of what constitutes a “feature” is clear at all. In fact, the lack of consensus on it reflects the extremely immature (but exciting!) state of interpretability research today. I am not claiming that the analogy is a pure bijection; one of the pivotal ways in which mechanistic interpretability and biology diverge is that defining and understanding feature emergence will most definitely come from outside simple model decomposition into weight + activation spaces (for example, understanding dataset-dependent computation flow, as you mentioned above). In contrast, most of biology’s advancement has come from decomposing cellular complexity into smaller and smaller pieces.
I suspect this will not be the final story for interpretability, but mechanistic interpretability is an interesting first chapter.

rajathsalegame Sep 24, 2024, 4:47 AM
1 point
0
on: Model evals for dangerous capabilities
Out of curiosity, do you have any thoughts on the importance / feasibility of formal verification / mathematically “provable” safety based approaches in these evals you mention?

rajathsalegame Sep 24, 2024, 4:30 AM
1 point
0
in reply to: tailcalled’s comment on: Why I’m bearish on mechanistic interpretability: the shards are not in the network
I would argue that the AI equivalent of these tiny organisms is “features,” which are just beginning to be defined in a structured, mathematical way.

rajathsalegame Sep 20, 2024, 3:39 AM
1 point
−2
on: Laziness death spirals
This was an interesting read and points to a simple truth that I think is often forgotten: Newton’s first law applies to basically everything in life, not just physical systems. The “resets” you describe are definitely valid, but they are by no means a comprehensive list of “opposing” forces that can help drive you in the other direction and reverse your momentum (in a positive way). The two other main ones that I believe are missing, yet fundamental, are:

- Diet: the food we eat affects our mental/emotional tendencies to procrastinate vs get things done through pretty intricate biological + neuroscientific mechanisms
- Exercise: similar to diet, but perhaps harder to get started
All these can be thought of as different “forces” that can influence our momentum in one way or the other. Side note: it would be interesting to develop some sort of grounded psychological theory on how different external stimuli affect our mindspace. Some of it is covered in the Vedas (https://en.wikipedia.org/wiki/Gu%E1%B9%87a).

rajathsalegame Sep 20, 2024, 3:19 AM
1 point
0
in reply to: tailcalled’s comment on: Why I’m bearish on mechanistic interpretability: the shards are not in the network
On second thought, I agree that gazing at the cosmos is not a fair comparison: rather, I would compare mechanistic interpretability to the early experiments of the Dutch microbiologist van Leeuwenhoek as he first looked at protozoa and bacteria under a microscope. They weren’t the most accurate or informative experiments in the grand scheme of things, but they were necessary for others to develop a more sophisticated understanding of biology.

It’s very likely that the field of mechanistic interpretability will grow beyond simply examining weights in a model, to higher-order understandings of the computational flow within a model (gradient descent and the data itself were mentioned in this thread). I agree that simply examining weights/activations is not a sufficient paradigm for understanding neural computation, but it is a start.

rajathsalegame Sep 18, 2024, 2:00 PM
1 point
0
on: Why I’m bearish on mechanistic interpretability: the shards are not in the network
It would be perverse to try to understand a king in terms of his molecular configuration, rather than in the contact between the farmer and the bandit. The molecules of the king are highly diminished phenomena, and if they have information about his place in the ecology, that information is widely spread out across all the molecules and easily lost just by missing a small fraction of them.
Agreed, but in the same way that empirical observations and low-tech experiments gazing at the cosmos laid the foundation upon which we were able to build grander and more complex theories of the universe, it would be premature to claim that this line of inquiry will not give us future mechanistic theories that are profound in nature. I am in agreement that these tools, at least at the moment, are largely frivolous and feature-specific without capturing more abstract notions of reality.

That being said, in terms of timescales, we are in a pre-Newtonian era, where we lack even basic, albeit fundamental, laws for understanding how these models work.
