Oh nice, another post I don’t need to write anymore :-D
Some disjointed thoughts I had on this:
Feedback loops can be characterized along at least three axes:
Speed: How quickly you get feedback from actions you take. Archery has a very fast feedback loop: You shoot an arrow and one or two seconds later you see what the outcome is.
Noise: How noisy the feedback is. High-frequency trading has fast feedback loops, but they have a lot of noise, and finding the signal is the difficult part.
Richness: How much information you’re getting. Dating is one example: Online dating has extremely poor feedback loops: only a couple of bits per interaction (did the other person respond, and what did they say), while talking & flirting with people in person has extremely rich feedback (the entire visual+acoustic+sensory field (plus perhaps smell? Don’t know much about human pheromones))—probably kilobytes per minimal motor-action, and megabytes per second.
Fast & low-noise & rich feedback loops are the best, and improving the feedback loop in any of those dimensions is super valuable.
As an example, forecasting has meh feedback loops: they can be very slow (days at least, but more likely months or years (!)), and the feedback is kind of poor (only a few bits per forecast), but at least there’s not that much noise (resolution follows what the question says; maybe this is why forecasters really don’t like questions resolving on technicalities—that’s the closest thing to noise).
But one can improve the richness of the forecasting feedback loop by writing out one’s reasoning, so one can update on the entire chain of thought once the resolution comes. Similarly, programming has much better feedback loops than mathematics, which is why I’d recommend that someone learn programming before math (in general, learn things with fast & rich feedback loops earlier and slow & poor ones later).
Also, feedback loops feel to me like they’re in the neighbourhood of both flow & addiction? Maybe flow is a feedback loop with a constant or increasing gradient, while addiction is a feedback loop with a decreasing gradient (leading into a local & shallow minimum).
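To make that picture concrete, here is a toy sketch (entirely my own construction, illustrating the metaphor rather than any real model of flow or addiction):

```python
# Toy model of the flow-vs-addiction metaphor (my own construction,
# purely illustrative): "flow" keeps a roughly constant reward
# gradient, while "addiction" has a gradient that decays toward a
# shallow plateau.

def flow_reward(t: float) -> float:
    # Reward keeps growing linearly: the gradient stays at 1.
    return t

def addiction_reward(t: float) -> float:
    # Reward saturates: the gradient 1/(1+t)^2 decays toward zero,
    # leaving you stuck near a shallow local optimum.
    return t / (1 + t)

def gradient(f, t: float, eps: float = 1e-6) -> float:
    # Crude central-difference numerical derivative.
    return (f(t + eps) - f(t - eps)) / (2 * eps)

for t in [0.0, 1.0, 5.0, 20.0]:
    print(f"t={t:5.1f}  flow grad={gradient(flow_reward, t):.3f}  "
          f"addiction grad={gradient(addiction_reward, t):.4f}")
```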
When I started reading the Sequences, I started doing forecasting on Metaculus within 3 months (while still reading them). I think being grounded at that time in actually having to reason with probabilities & receiving feedback within weeks made the experience of reading the Sequences much more lasting for me. I also think that the lack of focus on any kind of rationality verification made it significantly harder to develop an art of rationality. If you have a metric, you have something to grind on, even if you abandon it later.
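For instance (a minimal sketch with made-up numbers; Metaculus tracks something similar): the Brier score turns each resolved forecast into graded feedback.

```python
# Minimal sketch of a forecasting feedback metric: the Brier score,
# the mean squared error between forecast probabilities and outcomes.
# The forecasts and outcomes below are made-up illustrative data.

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Lower is better; always guessing 50% scores 0.25."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up resolved questions: forecast probability, actual outcome (1 = yes).
forecasts = [0.9, 0.3, 0.7, 0.2]
outcomes = [1, 0, 0, 0]

print(f"Brier score: {brier_score(forecasts, outcomes):.4f}")
# (0.01 + 0.09 + 0.49 + 0.04) / 4 = 0.1575
```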
Corollary: It’d probably be much better if people used proof assistants like Lean or Coq to teach mathematics. There is some overhead, sure, but they have much better feedback loops than normal mathematics.
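To make that feedback loop concrete, here is a minimal Lean 4 sketch (the theorem and `Nat.add_comm` are standard library facts):

```lean
-- Minimal sketch of the proof-assistant feedback loop in Lean 4:
-- every step is checked the moment you write it.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A wrong attempt is rejected immediately. For example,
--   theorem add_comm_bad (a b : Nat) : a + b = b + a := by rfl
-- fails, because `a + b = b + a` does not hold by definition alone.
```

Compare that to ordinary mathematics, where the closest thing to a checker is a referee months later.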
As someone who is currently getting a PhD in mathematics, I wish I could use Lean. The main problem for me is that the area I work in hasn’t been formalized in Lean yet. I tried for like a week, but didn’t get very far… I only managed to implement the definition of a Poisson point process (kinda). I concluded that it wasn’t worth spending my time creating this feedback loop, and I’d rather work based on vibes.
I am jealous of the next generation of mathematicians, who will be forced to write everything down using formal verification. They will be better than the current generation.
In a slightly different direction than proof assistants, I’m reminded of Terry Tao’s recent experience trying out GPT-4 to play the role of collaborator:
As I noted at this MathOverflow answer (with a concurrence by Bill Thurston), one of the most intellectually satisfying experiences as a research mathematician is interacting at the blackboard with one or more human co-authors who are exactly on the same wavelength as oneself while working collaboratively on the same problem. I do look forward to the day that I can have a similar conversation with an AI attuned to my way of thinking, or (in the more distant future) talking to an attuned AI version of a human colleague when that human colleague is not available for whatever reason. (Though in the latter case there are some non-trivial issues regarding security, privacy, intellectual property, liability, etc. that would likely need to be resolved first before such public AI avatars could be safely deployed.)
I have experimented with prompting GPT-4 to play the role of precisely such a collaborator on a test problem, with the AI instructed to suggest techniques and directions rather than to directly attempt to solve the problem (which the current state-of-the-art LLMs are still quite terrible at). Thus far, the results have been only mildly promising; the AI collaborator certainly serves as an enthusiastic sounding board, and can sometimes suggest relevant references or potential things to try, though in most cases these are references and ideas that I was already aware of and could already evaluate, and were also mixed in with some less relevant citations and strategies. But I could see this style of prompting being useful for a more junior researcher, or someone such as myself exploring an area further from my own area of expertise. And there have been a few times now where this tool has suggested to me a concept that was relevant to the problem in a non-obvious fashion, even if it was not able to coherently state why it was in fact relevant. So while it certainly isn’t at the level of a genuinely competent collaborator yet, it does have potential to evolve into one as the technology improves (and is integrated with further tools, as I describe in my article).
Terry sounded more enthusiastic here:
I could feed GPT-4 the first few PDF pages of a recent math preprint and get it to generate a half-dozen intelligent questions that an expert attending a talk on the preprint could ask. I plan to use variants of such prompts to prepare my future presentations or to begin reading a technically complex paper. Initially, I labored to make the prompts as precise as possible, based on experience with programming or scripting languages. Eventually the best results came when I unlearned that caution and simply threw lots of raw text at the AI. …
I now routinely use GPT-4 to answer casual and vaguely phrased questions that I would previously have attempted with a carefully prepared search-engine query. I have asked it to suggest first drafts of complex documents I had to write.
Which isn’t to say that his experience has been all positive; the usual hallucination issues still crop up:
Current large language models (LLM) can often persuasively mimic correct expert response in a given knowledge domain (such as my own, research mathematics). But as is infamously known, the response often consists of nonsense when inspected closely. Both humans and AI need to develop skills to analyze this new type of text. The stylistic signals that I traditionally rely on to “smell out” a hopelessly incorrect math argument are of little use with LLM-generated mathematics. Only line-by-line reading can discern if there is any substance. Strangely, even nonsensical LLM-generated math often references relevant concepts. With effort, human experts can modify ideas that do not work as presented into a correct and original argument.
And going back to proof assistants:
One related direction where some progress is likely to be made in the near future is in using LLMs to semi-automate some aspects of formalizing a mathematical proof in a formal language such as Lean; see this recent talk by Jason Rute for a survey of the current state of the art. There are already some isolated examples in which a research paper is submitted in conjunction with a formally verified version of the proofs, and these new tools may make this practice more common. One could imagine journals offering an expedited refereeing process for such certified submissions in the near future, as the referee is freed to focus on other aspects of the paper such as exposition and impact.
Consider sharing more details at the Lean Zulip chat—they enjoy hearing about unformalized areas.
I think it would be interesting to discover/teach general heuristics for improving along each axis.
A side note here: I have often wondered why people (myself included) prefer doing certain things (such as playing video games, watching movies, etc.) to doing other things (such as reading research papers, solving novel problems in science, etc.). When I play a strategy game I am solving problems, and when I am doing AI research I am still solving problems. I love doing both, but there is a part of me (the devil side?) that prefers playing strategy games to doing AI research for my master’s, even though both require intense focus and a lot of system 2 thinking, meaning they need a lot of willpower. I have been thinking about this a lot. I think this is mainly because the feedback loop for playing a strategy game is:
Faster, because each match takes about 1–2 hours, whereas in research it takes at least a day to see the results of a single experiment (see the quick arithmetic after this list).
Less noisy, because you directly experience the results of your actions. This is even more true in FPS games, and more true in 1v1 games than in team-based ones. It is in contrast to academic research, where your results depend on open-source software, coordination with your professor, academic politics, etc.
Richer, because there are tons of graphics involved: you can see your score going up or down directly and don’t need to track it yourself, in contrast to research, where you have to track everything yourself and the graphs and diagrams are pretty dull.
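To put rough numbers on the speed axis (a back-of-envelope sketch using the durations assumed above):

```python
# Back-of-envelope comparison of feedback cycles per week, using the
# durations assumed above: 1.5 h per match (midpoint of the 1-2 h
# estimate) and at most one experiment result per day. Both numbers
# are rough assumptions, not measurements.

hours_played_per_week = 40      # hypothetical full-time week of play
match_hours = 1.5
matches_per_week = hours_played_per_week / match_hours

experiments_per_week = 7        # at most one result per calendar day

print(f"matches per week:     {matches_per_week:.0f}")   # ~27
print(f"experiments per week: {experiments_per_week}")
print(f"speed ratio:          {matches_per_week / experiments_per_week:.1f}x")
```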
I think people working in video game design, where the goal is to make games more rewarding and addictive, can provide some insights into heuristics for improving each of the axes.