EllenaR

Karma: 122

EllenaR Oct 1, 2023, 5:39 PM
1 point
0
in reply to: Yang Yang’s comment on: Interpreting OpenAI’s Whisper
Working on that one—the code is not in a shareable state yet but I will link a notebook here once it is!

EllenaR Oct 1, 2023, 5:38 PM
1 point
0
in reply to: WCargo’s comment on: Interpreting OpenAI’s Whisper
I wouldn’t expect an LLM to do this. An LLM is trained to predict the most likely next word, so it assigns high probabilities to semantically similar words (which is why they are clustered in embedding space). Whisper is doing speech-to-text, so as well as the semantic similarity of words it also needs to know which words sound alike. E.g. if it thinks it heard ‘rug’, it is quite plausible that the speaker actually said ‘mug’, hence these words are clustered. Does that make sense?

EllenaR Oct 1, 2023, 5:34 PM
7 points
0
in reply to: Algon’s comment on: Interpreting OpenAI’s Whisper
1. Not found any yet, but that certainly doesn’t mean there aren’t any!
2. As per my reply to Neel’s comment below, yes: most heads (~4/6 per layer) are highly localized, and you can mask their attention window with no degradation in performance. A few heads per layer are responsible for all the information mixing between sequence positions. Re source vs destination: as in language model interp, the destination is the ‘current’ sequence position and the sources are the positions it is attending to.
3. a) Didn’t look into this. I think Whisper does speaker diarization, but quite badly, so I would imagine so. b) Either it hallucinates or it just transcribes one speaker.
4. Either no transcript or hallucinations (e.g. it makes up totally unrelated text).
5. What would be the purpose of this? If you mean stitching together the Whisper encoder with LLaMA as the decoder and then fine-tuning the decoder for a specific task, this would be very easy (assuming you have enough compute and data).
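
To make the windowed masking in point 2 concrete, here is a minimal sketch in plain Python. It is not the code used for the post: the function names are made up for illustration, and it assumes a window of k means each destination position may only attend to source positions at most k steps away on either side. Rows are indexed by destination and columns by source, matching the convention described above.

```python
import math


def local_attention_mask(seq_len, k):
    """mask[dest][src] is True when src lies within k positions of dest."""
    return [[abs(dest - src) <= k for src in range(seq_len)]
            for dest in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over source positions, with masked entries set to 0."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        kept = [s for s, keep in zip(score_row, mask_row) if keep]
        row_max = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) if keep else 0.0
                for s, keep in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy usage: 5 positions, uniform raw scores, window of 1 on either side.
scores = [[0.0] * 5 for _ in range(5)]
mask = local_attention_mask(seq_len=5, k=1)
weights = masked_softmax(scores, mask)
```

In a real model the same mask would be applied to one head's pre-softmax attention scores, and the claim would be checked by comparing transcription quality with and without it.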

EllenaR Oct 1, 2023, 5:18 PM
2 points
0
in reply to: Neel Nanda’s comment on: Interpreting OpenAI’s Whisper
Re other layers in the encoder: there are only 4 layers in Whisper tiny. I couldn’t find any ‘listenable’ features in the earlier layers (0 and 1), so I’m guessing they activate more on frequency patterns than on human-recognisable sounds. Simple linear probes trained on layers 2 and 3 suggest they learn language features (e.g. is_french) and is_speech. Haven’t looked into it any more than that though.
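
For readers unfamiliar with linear probes, here is a minimal sketch of the idea: a logistic-regression classifier trained on activation vectors to predict a binary label such as is_french. This is not the probe code from the post. The activations below are synthetic stand-ins (random vectors with the label planted along one dimension), and all names and hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(activations, labels, lr=0.1, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    dim, n = len(activations[0]), len(activations)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, activations, labels):
    """Fraction of examples where the probe's sign matches the label."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == bool(y)
               for x, y in zip(activations, labels))
    return hits / len(labels)


# Synthetic stand-in for layer activations: 8-dim noise, with the
# (hypothetical) is_french label planted along dimension 0.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
activations = []
for y in labels:
    x = [rng.gauss(0.0, 0.5) for _ in range(8)]
    x[0] += 1.0 if y else -1.0
    activations.append(x)

weights, bias = train_linear_probe(activations, labels)
accuracy = probe_accuracy(weights, bias, activations, labels)
```

With real data the activations would be encoder outputs at a given layer, and accuracy would be measured on a held-out split rather than on the training set as done here.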

Re localisation of attention, ‘I’d predict that most but not all encoder heads are highly localised’: this looks true when you look at the attention patterns per head. As you said, most heads (4/6) in each layer are highly localised, and you can mask them up to k=10. But there are 1 or 2 heads in each layer that are not so localised, and these are responsible for the degradation seen when you mask them.
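
One way to put a number on "highly localised" from an attention pattern is sketched below. This is an illustrative assumption about how one might measure it, not the analysis from the post: both metrics and their names are hypothetical, and the two patterns are toy examples rather than real Whisper heads. Each pattern is indexed as pattern[dest][src], with rows summing to 1.

```python
def mean_attention_distance(pattern):
    """Average |dest - src|, weighted by attention, over all destinations."""
    total = 0.0
    for dest, row in enumerate(pattern):
        total += sum(w * abs(dest - src) for src, w in enumerate(row))
    return total / len(pattern)


def fraction_within_window(pattern, k):
    """Average share of attention mass falling within k positions of dest."""
    total = 0.0
    for dest, row in enumerate(pattern):
        total += sum(w for src, w in enumerate(row) if abs(dest - src) <= k)
    return total / len(pattern)


# Toy patterns over 4 positions: one head attending only to itself,
# and one spreading attention uniformly over every position.
local_head = [[1.0 if src == dest else 0.0 for src in range(4)]
              for dest in range(4)]
uniform_head = [[0.25] * 4 for _ in range(4)]

local_distance = mean_attention_distance(local_head)
uniform_distance = mean_attention_distance(uniform_head)
uniform_in_window = fraction_within_window(uniform_head, k=1)
```

A head whose attention mass sits almost entirely inside a small window should be unaffected by masking outside it, while a head with mass spread across the sequence is the kind that would degrade under masking.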

Interpreting OpenAI’s Whisper

EllenaR Sep 24, 2023, 5:53 PM

116 points

13 comments, 7 min read
