Some realizations about memory and learning I’ve been thinking about recently.
EDIT: here are some great posts on memory which are a deconfused version of this shortform (and written by EY’s wife!)
Anki (and SRS in general) is a tool for efficiently writing directed graph edges to the brain. Thinking about encoding knowledge as a directed graph can help with making good Anki cards. (A toy code sketch of this framing is below, after these notes.)
Memory techniques are somewhat analogous to data structures as well, e.g. the link method corresponds to a doubly linked list.
“Memory techniques” should be called “Memory principles” (or even laws).
The “Code is Data” concept makes me realize memorization is more widely applicable; you could e.g. memorize the algorithm for integration in calculus. Many “creative” processes like integration can be reduced to an algorithm.
Truly part of you is not orthogonal to memory principles: it uses the fact that a densely connected graph is less likely to become disconnected when edges are randomly deleted, similar to the link and story methods. Just because you aren’t making silly images doesn’t mean you aren’t using the principles.
(Untested idea for math): Journal about your thought processes after solving each problem, then generalize to form a problem-solving algorithm / checklist and memorize the algorithm.
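Here’s a toy sketch of the “cards as graph edges” framing above (the concepts and card contents are made up for illustration):

```python
from collections import defaultdict

# Toy model: each Anki card writes one directed edge (cue -> response) to the brain.
cards = [
    ("derivative", "limit of a difference quotient"),
    ("limit of a difference quotient", "epsilon-delta definition"),
    ("derivative", "slope of the tangent line"),
    ("slope of the tangent line", "linear approximation"),
]

# The resulting knowledge graph: concept -> concepts reachable in one recall step.
graph = defaultdict(set)
for cue, response in cards:
    graph[cue].add(response)

# "Densely connected" just means many redundant edges/paths, so forgetting
# (deleting) any one edge is less likely to disconnect a concept entirely.
print(dict(graph))
```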
[edited]
Good comments, thanks for sharing both.
I’d love to hear more about practical insights on how to get better at recalling + problem-solving.
I’ll write some posts when I get stuff working, I feel a Sense That More Is Possible in this area, but I don’t want to write stuff till I can at least say it works well for me.
= finding shortest paths on a weighted directed graph, where the shortest path cost must be below some threshold :)
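Here’s a rough sketch of what I take the comment above to mean: recall as a bounded-cost shortest-path query on a weighted directed graph (the graph and threshold are made-up toy values):

```python
import heapq

def cheapest_path_cost(graph, start, goal):
    """Dijkstra on a dict-of-dicts weighted digraph; returns the cost, or None if unreachable."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if node in done:
            continue
        done.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None

# Toy "memory" graph; edge weights are recall effort.
memory = {
    "question": {"related fact": 1.0, "vague association": 3.0},
    "related fact": {"answer": 1.0},
    "vague association": {"answer": 0.5},
}

threshold = 2.5  # recall "succeeds" only if some path is cheap enough
cost = cheapest_path_cost(memory, "question", "answer")
print(cost, cost is not None and cost <= threshold)  # 2.0 True
```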
In a recent post, John mentioned how Corrigibility being a subset of Human Values means we should consider using Corrigibility as an alignment target. This is a useful perspective, but I want to register that X⊆Y doesn’t always imply that doing X is “easier” than Y; this is similar to the problems with problem factorization for alignment, but even stronger! Even if we only want to solve X and not Y, X can still be harder!
For a few examples of this:
Acquiring half a strawberry by itself is harder than acquiring a full strawberry (you have to get a full strawberry, then cut it in half). (This holds for X = MacBook, Person, Penny too.)
Let L be a lemma used in a proof of T (meaning L⊆T in some sense). It may be that T can be immediately proved via a known more general theorem T′. In this case L is harder to directly prove than T.
When writing an essay, writing section 3 alone can be harder than writing the whole essay, because it interacts with the other parts, you learn from writing the previous parts, etc. (Sidenote: There’s a trivial sense[1] in which writing section 3 can be no harder than writing the whole essay, but in practice we don’t care, as the whole point of considering a decomposition is to do better.)
In general, depending on how “natural” the subproblem in the factorization is, subproblems can be harder than solving the original problem. I believe this may (30%) be the case with corrigibility; mainly because (1) Corrigibility is anti-natural in some ways, and (2) Humans are pretty good at human values while being not-that-corrigible.
Just write the whole thing then throw everything else away!
“I believe this may (30%) be the case with corrigibility”
Surprising agreement with my credence! First skim I thought “Uli isn’t thinking correctly about how humans may have an explicit value for corrigible things, so if humans have 10B values, and we have an adequate value theory, solving corrigibility only requires searching for 1 value in the brain, while solving value-alignment requires searching for 10B values”, and I decided I thought this class of arguments brings something roughly corresponding to ‘corrigibility is easier’ to 70%. But then I looked at your credence, and it turned out we agreed.
Mmm, I think it matters a lot which of the 10B[1] values is harder to instill, I think most of the difficulty is in corrigibility. Strong corrigibility seems like it basically solves alignment. If this is the case then corrigibility is a great thing to aim for, since it’s the real “hard part” as opposed to random human values. I’m ranting now though… :L
I think it’s way less than 10B, probably <1000 though I haven’t thought about this much and don’t know what you’re counting as one “value” (If you mean value shard maybe closer to 10B, if you mean human interpretable value I think <1000)
There are a number of math books that give a wide overview of a lot of math. In the spirit of comprehensive information gathering, I’m going to try to spend my “fun math time” reading these.
I theorize this is a good way to build mathematical maturity, at least the “parse advanced math” part. I remember when I became mathematically mature enough to read Math Wikipedia; I want to go further in this direction till I can read math-y papers like Wikipedia.
This seems like an interesting idea. I have this vague sense that if I want to go into alignment I should know a lot of maths, but when I ask myself why, the only answers I can come up with are:
Because people I respect (Eliezer, Nate, John) seem to think so (BAD REASON)
Because I might run into a problem and need more maths to solve it (Not great reason since I could learn the maths I need then)
Because I might run into a problem and not have the mathematical concepts needed to even recognise it as solvable or to reduce it to a Reason 2 level problem (Good reason)
I wonder if reading a book or two like that would provide a good amount of benefit towards Reason 3 without requiring years of study.
#3 is good. Another good reason is so you have enough mathematical maturity to understand fancy theoretical results.
I’m probably overestimating the importance of #4, really I just like having the ability to pick up a random undergrad/early-grad math book and understand what’s going on, and I’d like to extend that further up the tree :)
3 is my main reason for wanting to learn more pure math, but I use 1 and 2 to help motivate me
which of these books are you most excited about and why? I also want to do more fun math reading
(Note: I haven’t finished any of them)
Quantum Computing Since Democritus is great, I understand Gödel’s results now! And a bunch of complexity stuff I’m still wrapping my head around.
The Road to Reality is great, I can pretend to know complex analysis after reading chapters 5, 7, and 8, and most people can’t tell the difference! Here’s a solution to a problem in chapter 7 I wrote up.
I’ve only skimmed parts of the Princeton guides, and different articles are written by different authors—but Tao’s explanation of compactness (also in the book) is fantastic; I don’t remember specific other things I read.
Started reading “All the Math You Missed” but stopped before I got to the new parts; did review linear algebra usefully though. Will definitely read more at some point.
I read some of The Napkin’s guide to Group Theory, but not much else. Got a great joke from it:
Here’s a collection of links to recent “mind-reading” related results using deep learning. Comment with ones I missed!
Decoding images in the brain via a diffusion model
Reconstructing internal language while watching videos using a language model
Brain embeddings with shared geometry to artificial contextual embeddings, as a code for representing language in the human brain
PS: It seems like a good idea for alignment people (like Steve) to have the capacity to run novel human-brain experiments like these. If we don’t currently have this capacity, well… Free dignity points to be gained :)
Online internal speech decoding from single neurons in a human participant
Semantic reconstruction of continuous language from non-invasive brain recordings
You’re probably in LessWrong docs mode; either switch to markdown or press Ctrl+K to insert a link around the selected text.
Do you know how to get out of docs mode? (nvm—got it—thanks!)
I’m going to argue meditation/introspection skill is a key part of an alignment researcher’s repertoire. I’ll start with a somewhat fake taxonomy of approaches to understanding intelligence/agency/value formation:
Study artificial intelligences
From outside (run experiments, predict results, theorize)
From inside (interpretability)
Study biological intelligences
From outside (psychology experiments, theorizing about human value & intelligence formation, study hypnosis[1])
From inside (neuroscience, meditation, introspection)
I believe introspection/meditation is a neglected way to study intelligence among alignment researchers:
You can run experiments on your own mind at any time. Lots of experimental bits free for the taking.
I expect interviewing high-level meditators to miss most of the valuable illegible intuitions (both from lack of direct experience, and from lacking the technical knowledge to integrate that experience with).
It has known positive side effects like improved focus, reduced stress, etc. (Yes, it is possible to wirehead; I believe it to be worth the risk if you’re careful though.)
What are you waiting for, get started![2]
I threw this out offhand, unsure if it’s a good idea, but maybe figuring out hypnosis will teach us something about the mind? (Also hypnosis could be in the “outside” or “inside” category.)
Or read The Mind Illuminated, Mastering the Core Teachings of the Buddha, or Joy on Demand. Better yet, find a teacher.
I disagree with your stated reasons for meditating (I have meditated a bunch and believe it to be valuable, but not worth the time if valence is not one of your top priorities).
Feedback is less clear (not so many experimental bits); experiments are repeated very often (and often I, at least, don’t have the focus to think about better methods); I expect positive side effects to be very small due to the Algernon argument.
I think I would attribute having become better at the 12th virtue of rationality (the void) due to meditation practice, but this is quite hard to tell. Maybe also better at not fooling myself in social situations, slightly less consequentialist/analytical in macro-thinking (but no noticeable decrease in micro-thinking such as programming), and more adept at understanding æsthetics.
Thanks for the comment. I agree valence should be the top priority & that cognitive gains are unlikely.
The main thing I was pointing at is “surely there’s useful bits about intelligence to be gathered from inside said intelligence, and people don’t seem to be putting much effort here”, but on reflection the last part seems wrong. Some neuroscientists are enlightened and haven’t figured everything out yet.
Your experience is interesting, I also want to get better at the void :)
It could still be true that on the margin experiential information about intelligence is useful.
I think (60%) I buy this.
Quick thoughts on creating an anti-human chess engine.
Use Maia to get a probability distribution over opponent moves based on their Elo. For extra credit, fine-tune on that specific player’s past games.
Compute an expectiminimax search over the Maia predictions. Bottom out with a Stockfish evaluation when going deeper becomes impractical. (For an MVP, bottom out with Stockfish after a couple of ply; no need to be fancy.) Also note: we want to maximize P(win), not centipawn advantage. (A rough code sketch is below, after this list.)
For extra credit, tune hyperparameters via self-play against Maia (simulated human). Use Lichess players as a validation set.
???
Profit.
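Here’s a rough sketch of step 2 (the `maia_move_distribution` and `stockfish_win_prob` functions are hypothetical stubs I made up; you’d back them with the real Maia weights and a UCI Stockfish, which aren’t wired up here):

```python
import chess  # python-chess

def maia_move_distribution(board: chess.Board, elo: int) -> dict:
    """Hypothetical stub: {move: probability} from a Maia net trained for this Elo band."""
    moves = list(board.legal_moves)
    return {m: 1.0 / len(moves) for m in moves}  # placeholder: uniform over legal moves

def stockfish_win_prob(board: chess.Board, us: chess.Color) -> float:
    """Hypothetical stub: P(win) for `us`, e.g. from Stockfish's WDL output (not centipawns)."""
    return 0.5  # placeholder

def expectiminimax(board: chess.Board, us: chess.Color, elo: int, depth: int) -> float:
    """Max over our moves, expectation over Maia-predicted opponent moves, P(win) at the leaves."""
    if depth == 0 or board.is_game_over():
        return stockfish_win_prob(board, us)
    if board.turn == us:
        best = 0.0
        for move in board.legal_moves:
            board.push(move)
            best = max(best, expectiminimax(board, us, elo, depth - 1))
            board.pop()
        return best
    expected = 0.0
    for move, prob in maia_move_distribution(board, elo).items():
        board.push(move)
        expected += prob * expectiminimax(board, us, elo, depth - 1)
        board.pop()
    return expected

def pick_move(board: chess.Board, elo: int, depth: int = 3) -> chess.Move:
    us, best_move, best_value = board.turn, None, -1.0
    for move in board.legal_moves:
        board.push(move)
        value = expectiminimax(board, us, elo, depth - 1)
        board.pop()
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```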
People Power
To get a sense that more is possible, consider:
The AI box experiment, and its replication
Mentalists like Derren Brown (which is related to 1)
How the FBI gets hostages back with zero leverage (they aren’t allowed to pay ransoms)
(This is an excerpt from a post I’m writing which I may or may not publish. The link aggregation here might be useful in and of itself.)
Petition to rename “noticing confusion” to “acting on confusion” or “acting to resolve confusion”. I find myself quite good at the former but bad at the latter—and I expect other rationalists are the same.
For example: I remember having the insight leading to lsusr’s post on how self-reference breaks the orthogonality thesis, but never pursued the line of questioning, since it would require sitting down and questioning my beliefs with paper for a few minutes, which is inconvenient and would interrupt my coding.
Exercise: What mistake is the following sentiment making?
“If there’s only a one in a million chance someone can save the world, then there’d better be well more than a million people trying.”
Answer:
The whole challenge of “having a one in a million chance of saving the world” is the wrong framing; the challenge is having a positive impact in the first place (for example: by not destroying the world or making things worse, e.g. from s-risks). You could think of this as a “setting the zero point” thing going on, though I like to think of it in terms of Bayes and Pascal’s wagers:
In terms of Bayes: You’re fixating on the expected value contributed from 10⁻⁶·(BIG) and ignoring the rest of the 1 − 10⁻⁶ hypothesis space. In most cases, there are corresponding low-probability events which “cancel out” the EV contributed from 10⁻⁶·(BIG)’s direct reasoning.
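As a toy illustration with completely made-up numbers: suppose trying adds a 10⁻⁶ chance of saving the world (value +V), but also a 2·10⁻⁶ chance of making things worse by a comparable amount (value −V). Then

$$\mathbb{E}[\Delta] \approx 10^{-6}\,V - 2\cdot 10^{-6}\,V = -10^{-6}\,V < 0,$$

so the “one in a million of BIG” term on its own tells you almost nothing about the sign of the overall EV.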
(I will also note that, empirically, it could be argued Eliezer was massively net-negative from a capabilities-advancement perspective, having causal links to the founding of DeepMind & OpenAI. I bring this up to point out how nontrivial having a positive impact at all is, in a domain like ours.)
Are most uncertainties we care about logical rather than informational? All empirical ML experiments are pure computations a Bayesian superintelligence could do in its head. How much of our uncertainty comes from computational limits in practice, versus actual information bottlenecks?
KL-divergence and the map-territory distinction
Crosspost from my blog
The cross-entropy is defined as the expected surprise when drawing from p(x), which we’re modeling as q(x). Our map is q(x) while p(x) is the territory.
$$H(p,q) = \sum_x p(x)\,\log\frac{1}{q(x)}$$
Now it should be intuitively clear that H(p,q)≥H(p,p) because an imperfect model q(x) will (on average) surprise us more than the perfect model p(x).
To measure unnecessary surprise from approximating p(x) by q(x) we define
$$D_{\mathrm{KL}}(p \parallel q) = H(p,q) - H(p,p)$$
This is KL-divergence! The average additional surprise from our map approximating the territory.
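A quick numerical check of these definitions (my own toy distributions, plain NumPy):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # territory
q = np.array([0.4, 0.4, 0.2])   # map

def cross_entropy(p, q):
    # H(p, q) = sum_x p(x) log(1 / q(x)), in nats
    return float(np.sum(p * np.log(1.0 / q)))

H_pq = cross_entropy(p, q)   # expected surprise when using the map q
H_pp = cross_entropy(p, p)   # entropy of the territory itself
kl = H_pq - H_pp             # D_KL(p || q): the extra surprise from using q

print(H_pq, H_pp, kl)        # kl >= 0, with equality only when q == p
```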
Now it’s time for an exercise: in the following figure, q∗(x) is the Gaussian that minimizes either DKL(p∥q) or DKL(q∥p). Can you tell which is which?
Left is minimizing DKL(p∥q) while the right is minimizing DKL(q∥p).
Reason as follows:
If p is the territory then the left q∗ is a better map (of p) than the right q∗.
If p is the map, then the territory q∗ on the right leads to us being less surprised than the territory on the left, because on the left p will be very surprised at data in the middle, despite it being likely according to the territory q∗.
On the left we fit the map to the territory, on the right we fit the territory to the map.
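And a sketch reproducing the exercise numerically (my own setup, not the figure’s actual distributions: a bimodal p on a grid, fitting the Gaussian q by minimizing each direction of the KL with scipy):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Bimodal "territory" p(x), discretized on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
p = 0.5 * norm.pdf(x, -3, 1) + 0.5 * norm.pdf(x, 3, 1)
p /= p.sum() * dx

def gaussian_q(params):
    mu, log_sigma = params
    return np.maximum(norm.pdf(x, mu, np.exp(log_sigma)), 1e-300)

def forward_kl(params):   # D_KL(p || q): q pays dearly wherever p has mass it misses
    q = gaussian_q(params)
    return float(np.sum(p * np.log(p / q)) * dx)

def reverse_kl(params):   # D_KL(q || p): q pays dearly for putting mass where p has little
    q = gaussian_q(params)
    q = q / (q.sum() * dx)
    return float(np.sum(q * np.log(q / np.maximum(p, 1e-300))) * dx)

fwd = minimize(forward_kl, x0=[0.0, 0.0])
rev = minimize(reverse_kl, x0=[2.0, 0.0])

print("argmin D_KL(p||q):", fwd.x[0], np.exp(fwd.x[1]))  # wide Gaussian covering both modes
print("argmin D_KL(q||p):", rev.x[0], np.exp(rev.x[1]))  # narrow Gaussian hugging one mode
```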