Reading through Backdoors as an analogy for deceptive alignment prompted me to think about a LW feature I might be interested in. I don’t have much math background, and have always found it very effortful to parse math-heavy posts. I expect there are other people in a similar boat.
In modern programming IDEs it’s common to have hoverovers for functions and variables, and I think it’s sort of crazy that we don’t have that for math. So, I’m considering a LessWrong feature that:
takes in a post (e.g. when you save a draft or go to publish it)
identifies the LaTeX terms in the post
creates a glossary for what each term means. (This should probably require confirmation by the author)
makes a hoverover for each term, so that mousing over it reminds you what the term means. (A rough sketch of how this could fit together follows below.)
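Here’s a minimal sketch of how that pipeline could fit together, in TypeScript. Everything in it (the GlossaryEntry shape, extractLatexTerms, askLLM) is a hypothetical placeholder, not an existing LessWrong API:

```typescript
// Hypothetical sketch of the glossary pipeline -- none of these names are real LessWrong APIs.

interface GlossaryEntry {
  term: string;             // the LaTeX source, e.g. "\mathcal{F}" or "f^*"
  explanation: string;      // what it means in this particular post
  authorConfirmed: boolean; // author signs off before the hoverover goes live
}

// 1. Find the LaTeX terms in a post body (very rough: grabs $...$ spans and dedupes them).
function extractLatexTerms(postBody: string): string[] {
  const matches = postBody.match(/\$[^$]+\$/g) ?? [];
  return [...new Set(matches.map((m) => m.slice(1, -1).trim()))];
}

// Stand-in for whatever LLM integration the site actually uses.
async function askLLM(prompt: string): Promise<string> {
  return `(explanation for: ${prompt.slice(0, 40)}...)`;
}

// 2. Draft a glossary entry for each term, to be reviewed by the author.
async function draftGlossary(postBody: string): Promise<GlossaryEntry[]> {
  const terms = extractLatexTerms(postBody);
  return Promise.all(
    terms.map(async (term) => ({
      term,
      explanation: await askLLM(
        `In the post below, what does the LaTeX term "${term}" mean?\n\n${postBody}`
      ),
      authorConfirmed: false,
    }))
  );
}

// 3. On render, each occurrence of a confirmed term gets wrapped in an element
//    whose tooltip shows the explanation on mouseover.
```

The author-confirmation step is just a boolean flag here; in practice it’d presumably be an editing UI over the drafted entries.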
On “Backdoors”, I asked the LessWrong-integrated LLM: “What do the LaTeX terms here mean?”
It replied:
The LaTeX symbols in this passage represent mathematical notations. Let me explain each of them:
$\mathcal{F}$: This represents a class of functions. The curly $\mathcal{F}$ denotes that it’s a set or collection of functions.
$f \in \mathcal{F}$: This means that $f$ is a function that belongs to (is an element of) the class $\mathcal{F}$.
$f^*$: The asterisk superscript typically denotes a modified or special version of something. Here, it represents the backdoored function.
$\varepsilon$: This Greek letter (epsilon) is often used to denote a small quantity or tolerance.
$x^*$: Similar to $f^*$, this represents a special input, in this case, the backdoor trigger.
These symbols are used to formally describe the game between the attacker and defender:
$\mathcal{F}$ is the set of all possible functions the attacker can choose from.
$f$ is the original function chosen by the attacker.
$f^*$ is the backdoored function created by the attacker.
$\varepsilon$ is the tolerance for how similar $f$ and $f^*$ must be on most inputs.
$x^*$ is the backdoor trigger input, on which $f$ and $f^*$ behave differently.
This looks at least pretty close to the intended meaning, but I’m curious whether it feels wrong or is missing nuance in any places @Jacob_Hilton @Mark Xu. Also curious whether this would be helpful to people.
I didn’t look at this case in particular. In my experience, LLMs are quite good at regurgitating definitions from long math texts—which otherwise can take some effort to find by hand.
This does seem pretty cool. I also think it wouldn’t be too hard to personalize the LLM to whatever level of math background you do have. So the things that you wrote make sense with a basic undergraduate background, but people with more formal math experience might want something that is more tailored to their higher-level understanding.
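A quick sketch of what that could look like (hypothetical names, just illustrating keying the prompt off a self-reported background level):

```typescript
// Hypothetical: key the glossary prompt off the reader's self-reported math background.
type MathBackground = "minimal" | "undergraduate" | "research";

function glossaryPrompt(term: string, postBody: string, background: MathBackground): string {
  const audience: Record<MathBackground, string> = {
    minimal: "a reader with little formal math background; spell out all notation and avoid jargon",
    undergraduate: "a reader with a basic undergraduate math background",
    research: "a reader comfortable with graduate-level math; be terse and skip standard conventions",
  };
  return `Explain what the LaTeX term "${term}" means in the post below, for ${audience[background]}.\n\n${postBody}`;
}
```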
This does make confirmation by the author harder, which I do think seems kind of important.
In the examples above, there are two pieces: “here’s what this terminology typically means” (which seems less useful if you’re already quite familiar), and “here’s what it represents in this context”. Would this be as simple as letting mathematicians toggle the first part off?
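For concreteness, a minimal sketch of that toggle (the field names are made up):

```typescript
// Illustrative two-part glossary entry: generic meaning vs. meaning in this post.
interface TwoPartEntry {
  term: string;
  typicalMeaning: string;    // e.g. "epsilon usually denotes a small quantity or tolerance"
  meaningInThisPost: string; // e.g. "how similar f and f* must be on most inputs"
}

// Readers who already know the conventions could toggle the first piece off.
function hoverText(entry: TwoPartEntry, hideTypicalMeaning: boolean): string {
  return hideTypicalMeaning
    ? entry.meaningInThisPost
    : `${entry.typicalMeaning} In this post: ${entry.meaningInThisPost}`;
}
```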
I’ve fantasized about a good version of this feature for math textbooks since college—would be excited to beta test or provide feedback about any such things that get explored! (I have a couple math-heavy posts I’d be down to try annotating in this way.)
Other related features that’d be easy to incorporate into this are John’s ideas from here:
Imagine a tool in which I write out mathematical equations on the left side, and an AI produces prototypical examples, visuals, or stories on the right, similar to what a human mathematician might do if we were to ask what the mathematician were picturing when looking at the math. (Presumably the interface would need a few iterations to figure out a good way to adjust the AI’s visualization to better match the user’s.)
Imagine a similar tool in which I write on the left, and on the right an AI produces pictures of “what it’s imagining when reading the text”. Or predicted emotional reactions to the text, or engagement level, or objections, etc.
Debugger functionality in some IDEs shows variable-values next to the variables in the code. Imagine that, except with more intelligent efforts to provide useful summary information about the variable-values. E.g. instead of showing all the values in a big tensor, it might show the dimensions. Or it might show Fermi estimates of runtime of different chunks of the code.
Similarly, in an environment for writing mathematics, we could imagine automated annotation with asymptotic behavior, units, or example values. Or a sidebar with an auto-generated stack trace showing how the current piece connects to everything else I’m working on.
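To make the debugger-style idea a bit more concrete, here’s a toy sketch of the kind of summarizer John is describing (purely illustrative, not tied to any particular IDE):

```typescript
// Illustrative value summarizer: show shape/size instead of dumping large values.
function summarizeValue(value: unknown): string {
  if (Array.isArray(value)) {
    // For a nested array ("tensor"), report its dimensions rather than its contents.
    const dims: number[] = [];
    let v: unknown = value;
    while (Array.isArray(v)) {
      dims.push(v.length);
      v = v[0];
    }
    return `array with dimensions ${dims.join(" x ")}`;
  }
  if (typeof value === "string" && value.length > 40) {
    return `string of length ${value.length}`;
  }
  return String(value);
}

// e.g. summarizeValue([[1, 2, 3], [4, 5, 6]]) -> "array with dimensions 2 x 3"
```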
I think those would also be pretty useful, including for people writing the math-heavy posts.
Wow! This would be awesome!
The LLM output looks correct to me.
Along the same lines, I found this analogy by concrete example exceptionally elucidative.
That seems like it’d be very helpful, yes!