Any guesses at the difficulty? My first impression was that this is not going to be solved any time soon. I just don’t think current techniques are good enough to write flawless lean code given a difficult objective.
I think grand challenges in AI are are sometimes useful, but when they are at this level I am a bit pessimistic. I don’t think this is necessarily AI-complete, but it’s perhaps close.
Can you quantify soon :) ? For example, I’d be willing to bet at 1⁄3 odds that this will be solved in the next 10 years conditional on a certain amount of effort being put in and more like 1⁄1 odds for the next 20 years. It’s hard to quantify the conditional piece but I’d cash it out as something like if researchers put in the same amount of effort into this that they put into NLP/image recognition benchmarks. I don’t think that’ll happen, so this is purely a counterfactual claim, but maybe it will help ground any subsequent discussion with some sort of concrete claim?
By soon I mean 5 years. Interestingly, I have a slightly higher probability that it will be solved within 20 years, which highlights the difficulty of saying ambiguous things like “soon.”
That is interesting! I should be clear that my odds ratios are pretty tentative given the uncertainty around the challenge. For example, I literally woke up this morning and thought that my 1⁄3 odds might be too conservative given recent progress on 8th grade science tests and theorem proving.
I created three PredictionBook predictions to track this if anyone’s interested (5 years, 10 years, 20 years).
In my MSc courses the lecturer gives proofs of important theorems, while unimportant problems are given as homework. This is bad for me, because it makes me focus on actually figuring out not too important stuff. I think it works like this because the course instructors want to • make the student do at least something and • check whether the student has learned the course material.
Ideally I would like to study using interactive textbooks where everything is a problem to solve on my own. Such a textbook wouldn’t show an important theorem’s proof right away. Instead it would show me the theorem’s statement and ask me to prove it. There should be hints available and, obviously, I should be able to see the author’s proof when I want to see it.
Also, for textbooks about Turing machines, recursive functions, and stuff like that: having an interpreter of Turing machines would be very nice. (googling Turing machine interpreter and using whatever you find is a bad idea, because they all have different flavors)
I found this to vary by field. When I studied topology and combinatorics we proved all the big important things as homework. When I studied automata theory and measure theory, we did what your teacher is doing.
Often in psychology articles I see phrases like “X is associated with Y”. These articles’ sections often read like the author thinks that X causes Y. But if they had evidence that X causes Y, surely they would’ve written exactly that. And in such cases I feel that I want to punish them, so in my mind I instead read it as “Y causes X”, just for contrarianism’s sake. Or, sometimes, I imagine what variable Z can exist which causes both X and Y. I think the latter is a useful exercise.
Examples:
It appears that some types of humor are more effective than others in reducing stress. Chen and Martin (2007) found that humor that is affiliative (used to engage or amuse others) or self-enhancing (maintaining a humorous perspective in the face of adversity) is related to better mental health. In contrast, coping through humor that is self-defeating (used at one’s own expense) or aggressive (criticizing or ridiculing others) is related to poorer mental health.
The author says that non-self-defeating non-agressive humor helps reduce stress. But notice the words “related”. For the first “related”, it seems plausible that not having a good mental health causes you to lose humor. For the second “related”, I think it’s very probable that poor mental health, such as depression and low self esteem, causes self-defeating humor.
How does humor help reduce the effects of stress and promote wellness? Several explanations have been proposed (see Figure 4.7). One possibility is that humor affects appraisals of stressful events. Jokes can help people put a less threatening spin on their trials and tribulations. Kuiper, Martin, and Olinger (1993) demonstrated that students who used coping humor were able to appraise a stressful exam as a positive challenge, which in turn lowered their perceived stress levels.
Or it could be that students, who are well prepared for the exams or simply tend to not be afraid of them, will obviously have lower perceived stress levels, and maybe will be able to think about the exams as a positive challenge, hence they″ able to joke about them in this way.
It’s possible in this example, that the original paper Kuiper, Martin, and Olinger (1993) actually did an intervention making students use humor, in which case the causality must go from humor to stress reduction. But I don’t want to look at every source, so screw you author of Psychology Applied to Modern Life (both quotes are from it) for not making it clear whether that study found causation or only correlation.
A few examples would help—the academic papers I see often call out this problem, and suggest possible Zs themselves. Generally, X and Y are more easily or precisely measured than the likely Zs, so make for better publications.
I definitely see the problem in popular articles and policy justification.
In my understanding, here are the main features of deep convolutional neural networks (DCNN) that make them work really well. (Disclaimer: I am not a specialist in CNNs, I have done one masters level deep learning course, and I have worked on accelerating DCNNs for 3 months.) For each feature, I give my probability, that having this feature is an important component of DCNN success, compared to having this feature to the extent that an average non-DCNN machine learning model has it (e.g. DCNN has weight sharing, an average model doesn’t have weight sharing).
DCNNs heavily use transformations, which are the same for each window of the input − 95%
For any set of pixels of the input, large distances between pixels in the set make the DCNN model interactions between these pixels less accurately − 90% (perhaps usage of dilution in some DCNNs is a counterargument to this)
Large depth (together with the use of activation functions) lets us model complicated features, interactions, logic − 82%
Having a lot of parameters lets us model complicated features, interactions, logic − 60%
Given 3 and 4, SGD-like optimization works unexpectedly fast for some reason − 40%
Given 3 and 4, SGD-like optimization with early stopping doesn’t overfit too much for some reason − 87% (I am not sure if S in SGD is important, and how important is early stopping)
Given 3 and 4, ReLu-like activation function works really well (compared to, for example, sigmoid).
Modern deep neural network libraries are easy to use compared to the baseline of not having specific well-developed libraries − 60%
Deep neural networks work really fast, when using modern deep neural network libraries and modern hardware − 33%
DCNNs find such features in photos, which are invisible to the human eye and to most ML algorithms − 20%
Dropout helps reducing overfitting a lot − 25%
Batch normalization improve quality of the model a lot for some reason − 15%
Batch normalization makes the optimization much faster − 32%
Skip connections (or residual connections, I am not sure if there’s a difference) help a lot − 20%
Let me make it more clear how I was assigning the probabilities and why I created this list. I am trying to come up with a tensor network based machine learning model, which would have the main advantages of DCNNs, but which would not, itself, be a deep relu neural network. So I decided to make this list to see which important components my model has.
How to download the documentation of a programming library for offline use.
On the documentation website, look for “downloads” section. Preferrably choose HTML format, because then it will be nicely searchable—I can even create a krunner web shortcut for searching it. Example: Numpy—find “HTML+zip”.
If you need the documentation hosted on https://readthedocs.io: in the bottom left press “Read the docs” a download type from “Downloads”. Search field won’t work in the HTML version, so feel free to download whatever format you like. Example: Elpy. Warning: for some libraries (e.g. more-itertools) the downloaded version is basically broken, so you should check if what you’ve downloaded is complete.
It turns out, Pytorch’s pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file do_different_gpus_randn_the_same.py:
Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to GPU for reproducibility’s sake like foo = torch.randn(500, 500, device="cpu").to("cuda").
You’re going to need to do more than that if you want full reproducibility, because GPUs aren’t even deterministic in the first place, and that is big enough to affect DRL/DL results.
Tbh what I want right now is a very weak form of reproducibility. I want the experiments I am doing nowadays to work the same way on my own computer every time. That works for me so far.
Please recommend me cheap (in brain-time and text length) ways to represent different levels of confidence in English texts and speech. More concretely, I want English words and ways to use them in sentences without modifying the sentence’s structure. Examples:
This is a reason to think that SGD-like algorithms can’t learn it. → This is a reason to think that maybe SGD-like algorithms can’t learn it. // sounds kinda bad, I wouldn’t write it like that for people who are picky about style; also, what other words work here instead of maybe?
because the inputs and the parameters are noisy → because likely the inputs and the parameters are noisy // seems unclear
Non-examples:
SGD-like algorithms can’t learn this circuit because the inputs and the parameters are noisy during training. → It’s plausible that SGD-like algorithms can’t learn this circuit, because it’s plausible that the inputs and the parameters are noisy during training. // the structure of the rest of the sentence had to be modified, also I had to spend brain-time to think about how to formulate this. also it’s very long
I am looking for a gears-level introductory course (or a textbook, or anything) in cooking. I want to cook tasty healthy food in an efficient way. I am already often able to cook tasty food, but other times I fail, and often I don’t understand what went wrong and how cooking even works.
Tim Ferriss book The 4-hour Chef does a good job at explaining the how and why and is efficiency oriented.
Recently, I however moved to the “throw things in the instant pot and let it do the cooking method of solving cooking and find it to be often more efficient then the normal way of cooking.
A new piece of math notation I’ve invented which I plan to use whenever I am writing proofs for myself (rather than for other people).
Sometimes when writing a proof, for some long property P(x) I want to write:
It follows from foo, that there exists x such that P(x). Let x be such that P(x). Then …
I don’t like that I need to write P(x) twice here. And the whole construction is too long for my liking, especially when the reason foo why such x exists is obvious. And if I omit the first sentence It follows from foo, that there exists x such that P(x). and just write
Let x be such that P(x). Then …
then it’s not clear what I mean. It could be that I want to show that such x exists and that from its existence some statement of interest follows. Or it could be that I want to prove some statement of form
For each x such that P(x), it holds that Q(x).
Or it could even be that I want to show that something follows from existence of such x, but I am not asserting that such x exists.
The new notation I came up with is to write L∃t in cases when I want to assert that such x exists and to bound the variable x in the same place. An example (an excerpt, not a complete proof):
Suppose D is a countably infinite set, suppose S is a set of subsets of D, suppose X∈S.
L∃t b be a bijection from N onto D.
L∃t step:S→S such that ∀Y∈Sstep(Y)=Y, if Y is ⊆-maximal in S, otherwise step(Y)=Y∪{b(min({n∈N∣b(n)∉Y∧Y∪{b(n)}∈S}))}.
By recursion theorem, L∃t ⟨Xn∣n∈N⟩ be such that X0=X, ∀n≥1Xn=step(Xn−1).
A competition on solving math problems via AI is coming. https://imo-grand-challenge.github.io/
The problems are from the International math olympiad (IMO)
They want to formalize all the problems in Lean (theorem prover) language. They haven’t figured out how to do that, e.g. how to formalize problems of form “determine the set of objects satisfying the given property”, as can be seen in https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean
A contestant must submit a computer program that will take a problem’s description as input and output a solution and its proof in Lean.
I would guess that this is partly a way to promote Lean. I think it would be interesting to pose questions about this on metaculus.
Any guesses at the difficulty? My first impression was that this is not going to be solved any time soon. I just don’t think current techniques are good enough to write flawless lean code given a difficult objective.
I think grand challenges in AI are are sometimes useful, but when they are at this level I am a bit pessimistic. I don’t think this is necessarily AI-complete, but it’s perhaps close.
Can you quantify soon :) ? For example, I’d be willing to bet at 1⁄3 odds that this will be solved in the next 10 years conditional on a certain amount of effort being put in and more like 1⁄1 odds for the next 20 years. It’s hard to quantify the conditional piece but I’d cash it out as something like if researchers put in the same amount of effort into this that they put into NLP/image recognition benchmarks. I don’t think that’ll happen, so this is purely a counterfactual claim, but maybe it will help ground any subsequent discussion with some sort of concrete claim?
By soon I mean 5 years. Interestingly, I have a slightly higher probability that it will be solved within 20 years, which highlights the difficulty of saying ambiguous things like “soon.”
That is interesting! I should be clear that my odds ratios are pretty tentative given the uncertainty around the challenge. For example, I literally woke up this morning and thought that my 1⁄3 odds might be too conservative given recent progress on 8th grade science tests and theorem proving.
I created three PredictionBook predictions to track this if anyone’s interested (5 years, 10 years, 20 years).
In my MSc courses the lecturer gives proofs of important theorems, while unimportant problems are given as homework. This is bad for me, because it makes me focus on actually figuring out not too important stuff. I think it works like this because the course instructors want to • make the student do at least something and • check whether the student has learned the course material.
Ideally I would like to study using interactive textbooks where everything is a problem to solve on my own. Such a textbook wouldn’t show an important theorem’s proof right away. Instead it would show me the theorem’s statement and ask me to prove it. There should be hints available and, obviously, I should be able to see the author’s proof when I want to see it.
Also, for textbooks about Turing machines, recursive functions, and stuff like that: having an interpreter of Turing machines would be very nice. (googling Turing machine interpreter and using whatever you find is a bad idea, because they all have different flavors)
I found this to vary by field. When I studied topology and combinatorics we proved all the big important things as homework. When I studied automata theory and measure theory, we did what your teacher is doing.
Often in psychology articles I see phrases like “X is associated with Y”. These articles’ sections often read like the author thinks that X causes Y. But if they had evidence that X causes Y, surely they would’ve written exactly that. And in such cases I feel that I want to punish them, so in my mind I instead read it as “Y causes X”, just for contrarianism’s sake. Or, sometimes, I imagine what variable Z can exist which causes both X and Y. I think the latter is a useful exercise.
Examples:
The author says that non-self-defeating non-agressive humor helps reduce stress. But notice the words “related”. For the first “related”, it seems plausible that not having a good mental health causes you to lose humor. For the second “related”, I think it’s very probable that poor mental health, such as depression and low self esteem, causes self-defeating humor.
Or it could be that students, who are well prepared for the exams or simply tend to not be afraid of them, will obviously have lower perceived stress levels, and maybe will be able to think about the exams as a positive challenge, hence they″ able to joke about them in this way.
It’s possible in this example, that the original paper Kuiper, Martin, and Olinger (1993) actually did an intervention making students use humor, in which case the causality must go from humor to stress reduction. But I don’t want to look at every source, so screw you author of Psychology Applied to Modern Life (both quotes are from it) for not making it clear whether that study found causation or only correlation.
A few examples would help—the academic papers I see often call out this problem, and suggest possible Zs themselves. Generally, X and Y are more easily or precisely measured than the likely Zs, so make for better publications.
I definitely see the problem in popular articles and policy justification.
I’ve added 2 examples.
In my understanding, here are the main features of deep convolutional neural networks (DCNN) that make them work really well. (Disclaimer: I am not a specialist in CNNs, I have done one masters level deep learning course, and I have worked on accelerating DCNNs for 3 months.) For each feature, I give my probability, that having this feature is an important component of DCNN success, compared to having this feature to the extent that an average non-DCNN machine learning model has it (e.g. DCNN has weight sharing, an average model doesn’t have weight sharing).
DCNNs heavily use transformations, which are the same for each window of the input − 95%
For any set of pixels of the input, large distances between pixels in the set make the DCNN model interactions between these pixels less accurately − 90% (perhaps usage of dilution in some DCNNs is a counterargument to this)
Large depth (together with the use of activation functions) lets us model complicated features, interactions, logic − 82%
Having a lot of parameters lets us model complicated features, interactions, logic − 60%
Given 3 and 4, SGD-like optimization works unexpectedly fast for some reason − 40%
Given 3 and 4, SGD-like optimization with early stopping doesn’t overfit too much for some reason − 87% (I am not sure if S in SGD is important, and how important is early stopping)
Given 3 and 4, ReLu-like activation function works really well (compared to, for example, sigmoid).
Modern deep neural network libraries are easy to use compared to the baseline of not having specific well-developed libraries − 60%
Deep neural networks work really fast, when using modern deep neural network libraries and modern hardware − 33%
DCNNs find such features in photos, which are invisible to the human eye and to most ML algorithms − 20%
Dropout helps reducing overfitting a lot − 25%
Batch normalization improve quality of the model a lot for some reason − 15%
Batch normalization makes the optimization much faster − 32%
Skip connections (or residual connections, I am not sure if there’s a difference) help a lot − 20%
Let me make it more clear how I was assigning the probabilities and why I created this list. I am trying to come up with a tensor network based machine learning model, which would have the main advantages of DCNNs, but which would not, itself, be a deep relu neural network. So I decided to make this list to see which important components my model has.
How to download the documentation of a programming library for offline use.
On the documentation website, look for “downloads” section. Preferrably choose HTML format, because then it will be nicely searchable—I can even create a krunner web shortcut for searching it. Example: Numpy—find “HTML+zip”.
If you need pytorch, torchvision, or sklearn—simply download https://github.com/unknownue/PyTorch.docs.
If you need the documentation hosted on https://readthedocs.io: in the bottom left press “Read the docs” a download type from “Downloads”. Search field won’t work in the HTML version, so feel free to download whatever format you like. Example: Elpy. Warning: for some libraries (e.g. more-itertools) the downloaded version is basically broken, so you should check if what you’ve downloaded is complete.
In some weird cases ReadTheDocs documentation for the latest version might of a library might be unlisted in the downloads secion of ReadTheDocs. For example, if you click the readthedocs icon in the bottom right of https://click.palletsprojects.com/en/master/, you won’t find a download link for version 8.0. In this case copy the hyperlink https://media.readthedocs.org/pdf/pallets-click/latest/pallets-click.pdf or https://media.readthedocs.org/pdf/pallets-click/stable/pallets-click.pdf and replace pallets-click with the name of the project you want. It doesn’t work for all projects, but it works for some.
Use httrack to mirror the documentation website. In my experience it doesn’t take long. Do it like
$ httrack https://click.palletsprojects.com/en/7.x/
. This will download everything hosted in https://click.palletsprojects.com/en/7.x/ and will not go outside of this server directory. In this case the search field won’t work.It turns out, Pytorch’s pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file do_different_gpus_randn_the_same.py:
On my system, I get the following for two runs on two different GPUs:
Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to GPU for reproducibility’s sake like
foo = torch.randn(500, 500, device="cpu").to("cuda")
.You’re going to need to do more than that if you want full reproducibility, because GPUs aren’t even deterministic in the first place, and that is big enough to affect DRL/DL results.
Tbh what I want right now is a very weak form of reproducibility. I want the experiments I am doing nowadays to work the same way on my own computer every time. That works for me so far.
Please recommend me cheap (in brain-time and text length) ways to represent different levels of confidence in English texts and speech. More concretely, I want English words and ways to use them in sentences without modifying the sentence’s structure. Examples:
This is a reason to think that SGD-like algorithms can’t learn it. → This is a reason to think that maybe SGD-like algorithms can’t learn it. // sounds kinda bad, I wouldn’t write it like that for people who are picky about style; also, what other words work here instead of maybe?
because the inputs and the parameters are noisy → because likely the inputs and the parameters are noisy // seems unclear
Non-examples:
SGD-like algorithms can’t learn this circuit because the inputs and the parameters are noisy during training. → It’s plausible that SGD-like algorithms can’t learn this circuit, because it’s plausible that the inputs and the parameters are noisy during training. // the structure of the rest of the sentence had to be modified, also I had to spend brain-time to think about how to formulate this. also it’s very long
Not sure this is what you ask for but it seems relevant: What probability do people attach to likelihood adjectives?: https://hbr.org/2018/07/if-you-say-something-is-likely-how-likely-do-people-think-it-is
Being explicit about levels of confidence inherently needs brain-time because you have to think about what your level of confidence is.
I am looking for a gears-level introductory course (or a textbook, or anything) in cooking. I want to cook tasty healthy food in an efficient way. I am already often able to cook tasty food, but other times I fail, and often I don’t understand what went wrong and how cooking even works.
I’ve heard good reviews of “Salt, Fat, Acid, Heat” as the definitive “gears level cooking” book, but have never read it myself.
Tim Ferriss book The 4-hour Chef does a good job at explaining the how and why and is efficiency oriented.
Recently, I however moved to the “throw things in the instant pot and let it do the cooking method of solving cooking and find it to be often more efficient then the normal way of cooking.
A new piece of math notation I’ve invented which I plan to use whenever I am writing proofs for myself (rather than for other people).
Sometimes when writing a proof, for some long property P(x) I want to write:
It follows from foo, that there exists x such that P(x). Let x be such that P(x). Then …
I don’t like that I need to write P(x) twice here. And the whole construction is too long for my liking, especially when the reason foo why such x exists is obvious. And if I omit the first sentence It follows from foo, that there exists x such that P(x). and just write
Let x be such that P(x). Then …
then it’s not clear what I mean. It could be that I want to show that such x exists and that from its existence some statement of interest follows. Or it could be that I want to prove some statement of form
For each x such that P(x), it holds that Q(x).
Or it could even be that I want to show that something follows from existence of such x, but I am not asserting that such x exists.
The new notation I came up with is to write L∃t in cases when I want to assert that such x exists and to bound the variable x in the same place. An example (an excerpt, not a complete proof):
Suppose D is a countably infinite set, suppose S is a set of subsets of D, suppose X∈S.
L∃t b be a bijection from N onto D.
L∃t step:S→S such that ∀Y∈S step(Y)=Y, if Y is ⊆-maximal in S, otherwise step(Y)=Y∪{b(min({n∈N∣b(n)∉Y∧Y∪{b(n)}∈S}))}.
By recursion theorem, L∃t ⟨Xn∣n∈N⟩ be such that X0=X, ∀n≥1 Xn=step(Xn−1).
L∃t M=⋃n∈NXn.
...
Many biohacking guides suggest using melatonin. Does liquid melatonin spoil under high temperature if put in tea (95 degree Celcius)?
More general question: how do I even find answers to questions like this one?
When I had a quick go-ogle search I started with:
“melatonin stability temperature”
then
“N-Acetyl-5-methoxytryptamine”
A quick flick through a few abstracts I can’t see anything involving temperatures higher than 37 C i.e. body temperature.
Melatonin is a protein, many proteins denature at temperatures above 41 C.
My (jumped to) conclusion:
No specific data found.
Melatonin may not be stable at high temperatures, so avoid putting it in hot tea.