MadHatter comments on Hard-Coding Neural Computation

MadHatter Dec 14, 2021, 1:00 AM
1 point
0
Thanks for your comments/questions, they’re very insightful.
In general, there are as many encoding spaces in a Transformer as there are computational nodes, and a traditional Transformer will have little incentive to use the same semantics for any two of the spaces. (There’s a little bit of an incentive because of the residual connections, which will (I think?) kind of tie the semantics of the various hidden-size-sized embeddings spaces.)
In particular, the middle layer of the dense-relu-dense feedforward layer is usually chosen to be significantly larger (4x) than the hidden size, and so it’s not even theoretically possible to represent it using the same basis. I’ve found that it sometimes makes sense to use anonymous seme names like x1 x2 x3 etc in the feed-forward layer for this reason. In my experience so far I’ve found the feed-forward layers to be most useful for conjunctions and disjunctions—and there are a quadratic number of possible conjunctions and disjunctions of even two neurons, let alone 3 or 4. So it seems to me that this might give a tiny hint as to why people have found that the intermediate embedding space of the feed-forward layer needs to be so large.
Of course, there is a potentially huge gap between what I am clever enough to think of as a use for them and what good old gradient descent is clever enough to think of. We can only easily lower-bound the potential uses of them; upper-bounding the capabilities of a component will prove much more challenging.

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer