Human beings cannot do most math without pencil, paper, and a lot of pondering. Meanwhile there are a number of papers showing specialized transformers can do math and code at a more sophisticated level than I would have expected before seeing the results.
The Pile includes 7GB of math problems generated by DeepMind, basically as you describe. I don’t believe the models trained on it can do any of them, though my testing wasn’t rigorous.
They fit a simplistic model where the two variables are independent and the contribution of each decays as a power law. This leads to the shocking conclusion that the two inputs are independent and decay as power laws...
I mean the model is probably fine for its intended purpose: finding the rough optimal ratio of parameters and data for a given budget. It might mean that current models have suboptimal compute budgets. But it doesn’t imply anything beyond that, like some hard limit to scaling given our data supply.
If the big tech companies really want to train a giant model, but run out of data (unlikely)… well it may not be compute optimal, but there is nothing stopping them from doing multiple passes over the same data. If they even get to the point that it starts to overfit (unlikely), there’s a plethora of regularization methods to try.
The temporal difference learning algorithm is an efficient way to do reinforcement learning, and probably something like it happens in the human brain. If you are playing a game like chess, it may take a long time to get enough examples of wins and losses to train an algorithm to predict good moves. Say you play 128 games: that’s at most 128 bits of information, which is nothing. You have no way of knowing which moves in a game were good and which were bad. You have to assume all moves made during a losing game were bad, which throws out a lot of information.
Temporal difference learning can learn “capturing pieces is good” and start optimizing for that instead. This implies that “inner alignment failure” is a constant fact of life. There are probably players that get quite far in chess doing nothing more than optimizing for piece capture.
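A minimal sketch of the mechanism (a toy tabular TD(0) update, not any particular chess engine): each visited state’s value estimate is nudged toward the estimate of the state that followed it, so credit flows backwards through the game instead of waiting for one win/loss bit at the very end.

```python
def td0_update(values, states, final_reward, alpha=0.1, gamma=1.0):
    """Toy tabular TD(0) for an episodic game with reward only at the end.

    values: dict mapping state -> estimated value.
    states: sequence of states visited in one game.
    final_reward: +1 for a win, -1 for a loss (only known at the end).
    """
    for i, s in enumerate(states):
        if i + 1 < len(states):
            target = gamma * values.get(states[i + 1], 0.0)  # bootstrap from the next state
        else:
            target = final_reward  # terminal update uses the real outcome
        values[s] = values.get(s, 0.0) + alpha * (target - values.get(s, 0.0))
    return values
```

Because the update bootstraps from whatever the value function already believes, a heuristic like “positions with more material are better” can propagate and start being optimized for directly, which is the inner-alignment point above.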
I used to have anxiety about the many worlds hypothesis. It just seems kind of terrifying, constantly splitting into hell-worlds, and then there are the implications of quantum immortality. But it didn’t take long for it to stop bothering me, and even for me to start suppressing thoughts about it. After all, such thoughts don’t lead to any reward and only cause problems, so an RL brain should punish them.
But that’s kind of terrifying itself, isn’t it? I underwent a drastic change to my utility function, and even developed anti-rational heuristics for suppressing thoughts, which a rational Bayesian should never do (at least not for these reasons).
Anyway gwern has a whole essay on multi-level optimization algorithms like this, that I haven’t seen linked yet: https://www.gwern.net/Backstop
It’s back, btw. If it ever goes down again you can probably get it on the Wayback Machine. And yes, the /r/bad* subreddits are full of terrible academic snobbery. Badmathematics is the best of the bunch because mathematics is at least somewhat objective, so they mostly talk about philosophy of mathematics.
The problem is that formal models of probability theory have trouble with logical uncertainty: you can’t assign a nonzero probability to a false logical statement. All the standard machinery of probability theory is about modelling uncertainty in the unknown external world. This is an early attempt to think about logical uncertainty, which MIRI has since published papers on and tried to formalize.
Just calling them “log odds” is fine and they are widely used in real work.
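For reference, a minimal sketch of the conversion (natural log here; base 10 or base 2 are just different unit conventions):

```python
import math

def log_odds(p):
    """Probability -> log odds. Independent pieces of evidence add in this representation."""
    return math.log(p / (1 - p))

def probability(lo):
    """Log odds -> probability (the logistic function)."""
    return 1 / (1 + math.exp(-lo))

print(log_odds(0.9))      # ~2.20
print(probability(2.20))  # ~0.90
```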
Btw what does “Response to previous version” mean? Was this article significantly edited? It doesn’t seem so confrontational reading it now.
That’s unlikely. By the late 19th century there was no stopping the industrial revolution. Without coal maybe it would have slowed down a bit. But science was advancing at a rapid pace, and various other technologies from telephones to electricity were well on their way. It’s hard for us to imagine a world without coal, since we took that path. But I don’t see why it couldn’t be done. There would probably be a lot more investment in hydro and wind power (both of which were a thing before the industrial revolution.) And eventually solar. Cars would be hard, but electric trains aren’t inconceivable.
We have nuclear weapons that are likely visible if fired en masse.
Would we be able to detect nuclear weapons detonated light years away? We have trouble detecting detonations on our own planet! And even if we did observe them, how would we recognize it as an alien invasion vs local conflict, or god knows what else.
The time slice between us being able to observe the stars and reaching a singularity is incredibly tiny. It’s very unlikely two different worlds will overlap so that one is able to see the other destroyed and rush a singularity. I’m not even sure we would rush a singularity if we observed aliens, or whether it would make any difference.
First of all, the Earth has been around for a very, very long time. Even slowly expanding aliens should have hit us by now. The galaxy isn’t that big relative to the vast amounts of time they have probably been around. I don’t feel like this explains the Fermi paradox.
If aliens wanted to prevent us from fleeing, this is a terribly convoluted way of doing it. Just shoot a self replicating nanobot at us near the speed of light, and we would be dealt with. We would never see it coming. They could have done this thousands of years ago, if not millions. And it would be vastly more effective at snuffing out competition than this weird strategy. No need to even figure out which planets might evolve intelligent life. Just shoot all of them, it’s cheap.
You could time them so they all hit their targets at the same time and give no warning. Or have them just do the minimal amount of destruction necessary so they aren’t visible from space.
Well, we do have a plausible reason to believe in aliens: the Copernican principle, the idea that the Earth isn’t particularly special, plus the fact that the universe is enormous. There’s literally no reason to believe angels and demons are plausible.
And god do I hate skeptics and how they pattern match everything “weird” to religion. Yes aliens are weird. That doesn’t mean they have literally the same probability of existing as demons.
I think a concrete example is good for explaining this concept. Imagine you flip a coin and then put your hand over it before looking. The state of the coin is already fixed at one value. There is no probability or randomness involved in the real world now. The uncertainty about its value is entirely in your head.
From Surely You’re Joking, Mr. Feynman:
Topology was not at all obvious to the mathematicians. There were all kinds of weird possibilities that were “counterintuitive.” Then I got an idea. I challenged them: “I bet there isn’t a single theorem that you can tell me—what the assumptions are and what the theorem is in terms I can understand—where I can’t tell you right away whether it’s true or false.”
It often went like this: They would explain to me, “You’ve got an orange, OK? Now you cut the orange into a finite number of pieces, put it back together, and it’s as big as the sun. True or false?”
“No holes.”
“Impossible!”
“Ha! Everybody gather around! It’s So-and-so’s theorem of immeasurable measure!”
Just when they think they’ve got me, I remind them, “But you said an orange! You can’t cut the orange peel any thinner than the atoms.”
“But we have the condition of continuity: We can keep on cutting!”
“No, you said an orange, so I assumed that you meant a real orange.”
So I always won. If I guessed it right, great. If I guessed it wrong, there was always something I could find in their simplification that they left out.
Actually, there was a certain amount of genuine quality to my guesses. I had a scheme, which I still use today when somebody is explaining something that I’m trying to understand: I keep making up examples. For instance, the mathematicians would come in with a terrific theorem, and they’re all excited. As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball)—disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on. Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say, “False!”
If it’s true, they get all excited, and I let them go on for a while. Then I point out my counterexample.
“Oh. We forgot to tell you that it’s Class 2 Hausdorff homomorphic.”
“Well, then,” I say, “It’s trivial! It’s trivial!” By that time I know which way it goes, even though I don’t know what Hausdorff homomorphic means.
I guessed right most of the time because although the mathematicians thought their topology theorems were counterintuitive, they weren’t really as difficult as they looked. You can get used to the funny properties of this ultra-fine cutting business and do a pretty good job of guessing how it will come out.
Yudkowsky has changed his views a lot over the last 18 years though. A lot of his earlier writing is extremely optimistic about AI and its timeline.
This is by far my favorite form of government. It’s a great response whenever the discussion of “democracy is the best form of government we have” comes up. Some random notes in no particular order:
Sadly getting support for this in the current day is unlikely because of the huge negative associations with IQ tests. Even literacy tests for voters are illegal because of a terrible history of fake tests being used by poll workers to exclude minorities. (Yes the tests were fake like this one, where all the answers are ambiguous and can be judged as correct or incorrect depending on how the test grader feels about you.)
This doesn’t actually require the IQ testing portion though. I believe the greatest problem with democracy is that voters are mostly uninformed, and they have no incentive to get informed. A congress randomly sampled from the population, though, would be able to hear issues and debates in detail. Even if they are of average IQ, I think it would be much better than the current system. And you could use this congress of “average” representatives to vote for other leaders like judges and presidents, who would be more selected for intelligence.
In fact you could just use this system to randomly select voters from the population. Get them together so they can discuss and debate in detail, and know their votes really matter. And then have them vote on the actual leaders and representatives like a normal election. I believe something like this is mentioned at the end of the article.
Of course I still like and approve of the IQ filtering idea. But I think these two ideas are independent, and the IQ portion is always going to be the most controversial.
I think the sortition should be entirely opt-in, just like normal voting is. This selects for people who actually care about politics and want to be representatives, which might select for IQ a bit on its own. And it prevents you from getting uninterested people who are bored out of their minds by politics.
One could argue such a system would be unrepresentative of minority groups. If they have lower IQs or are less likely to opt in. However the current system isn’t representative at all. Look at the makeup of congress now. Different demographics are more or less likely to vote in elections as it is. And things like gerrymandering and just regular geographic-based voting distort representation a lot. And yet somehow it still mostly works, and I don’t think this system could be any worse in that dimension.
But if it is a concern, you could just resample groups to match the general population. So if women are half as likely to opt in, the women who do opt in should be made twice as likely to be selected. I’m not sure if this is a good or desirable thing to do, just that it would quell these objections.
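A minimal sketch of that reweighting, with made-up group names and opt-in rates just to show the arithmetic (it samples with replacement, which is fine for illustration):

```python
import random

# Hypothetical numbers: the population is 50/50, but women opt in at half the
# rate of men, so they make up only a third of the opt-in pool.
population_share = {"women": 0.5, "men": 0.5}
optin_share = {"women": 1 / 3, "men": 2 / 3}

# Weight each person by population share / opt-in share, so the expected makeup
# of the selected body matches the general population (women get twice the weight).
weight = {g: population_share[g] / optin_share[g] for g in population_share}

pool = [("women", i) for i in range(100)] + [("men", i) for i in range(200)]
selected = random.choices(pool, weights=[weight[g] for g, _ in pool], k=30)
```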
Selecting for the top 1% of IQ is too much filtering. You really don’t want to create an incentive to game IQ tests, at least not too much. And remember IQ tests are not perfect; they can be practiced to improve your score. You also don’t want a bunch of representatives who are freaks of nature, with brains really good at Raven’s Matrices and nothing else. There are multiple dimensions to intelligence, and while they correlate, the correlation isn’t 100%. I’d arbitrarily go with the top 5% - the best scorer out of 20. Even that seems high.
All the discussion about how the system could be corrupted is ridiculous. People had the same objections to regular democracy. How do we trust that the poll workers and vote counters are reliable? What’s to stop a vast conspiracy of voting fraud?
Somehow we’ve mostly solved these problems and votes are trusted. When issues arise, we have a court system that seems to be relatively fair about resolving them. And it’s still not perfect. We have stuff like gerrymandering that wouldn’t be an issue with sortition based systems.
I hope the mods don’t remove this for violating the politics rule. While it is technically about political systems, it’s only in a meta sense. Talking about the political system itself, not specific policies or ideologies. There is nothing particularly left or right wing about these ideas. I don’t think anyone is likely to be mindkilled by it.
In the first draft of The Lord of the Rings, the Balrog ate the hobbits and destroyed Middle-earth. Tolkien considered this ending unsatisfactory, if realistic, and wisely decided to revise it.
It’s really going to depend on your interests. I guess I’ll just dump my favorite channels here.
I enjoy some math channels like Numberphile, Computerphile, standupmaths, 3Blue1Brown, Vi Hart, Mathologer, singingbanana, and some of Vsauce.
For “general interesting random facts” there’s Tom Scott, Wendover Productions, CGP Grey, Lindybeige, Shadiversity, and Today I Found Out.
Science/Tech/etc: engineerguy, Kurzgesagt, and C0nc0rdance.
Miscellaneous: kaptainkristian, CaptainDisillusion, and the more recent videos of suckerpinch.
Politics: I unsubscribed from most political content a long time ago. But Last Week Tonight and Vox are pretty good.
Humor: That’s pretty subjective, but I think everyone should know about The Onion. Also Fitzthislewitz.
Well there is a lot of research into treatments for dementia, like the neurogenesis drug I mentioned above. I think it’s quite plausible they will stumble upon general cognitive enhancers that improve healthy people.
Just because it’s genetic doesn’t mean it’s incurable. Some genetic diseases have been cured. I’ve read of drugs that increase neurogenesis, which could plausibly increase IQ. Scientists have increased the intelligence of mice by replacing their glial cells with better human ones.
I wasn’t aware that method had a name, but I’ve seen the idea suggested before when this topic comes up. For neural networks in particular, you can just look at the gradient of the output with respect to the inputs to see how the output changes as you change each input.
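A quick sketch of what that looks like with an autodiff framework (PyTorch here; the tiny model and input shapes are placeholders):

```python
import torch

# Placeholder network standing in for whatever model is being inspected.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))

x = torch.randn(1, 4, requires_grad=True)  # one example input
output = model(x).sum()                    # reduce to a scalar so backward() is defined
output.backward()                          # backpropagate all the way to the input

print(x.grad)  # d(output)/d(input_i): how sensitive the output is to each input feature
```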
I think the problem people have is that this just tells you what the machine is doing, not why. Machine learning can never really offer understanding.
For example, there was a program created specifically for the purpose of training human-understandable models. It worked by fitting the simplest possible mathematical expression to the data, the hope being that simple mathematical expressions would be easy for humans to interpret.
One biologist found an expression that perfectly fit his data. It was simple, and he was really excited by it. But he couldn’t understand what it meant at all. And he couldn’t publish it, because how can you publish an equation without any explanation?
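A toy sketch of the approach described above (real symbolic-regression systems search over expression trees; here just a few hand-picked candidate forms are scored by fit plus a complexity penalty, on made-up data):

```python
import numpy as np

x = np.linspace(1, 10, 50)
y = 3 * x + 2 + np.random.normal(0, 0.1, size=x.shape)  # made-up "measurements"

# Candidate forms y = a * basis(x) + b, each with a crude complexity score.
candidates = {
    "a*x + b":      (lambda t: t,         2),
    "a*x**2 + b":   (lambda t: t**2,      3),
    "a*log(x) + b": (lambda t: np.log(t), 3),
}

best = None
for name, (basis_fn, complexity) in candidates.items():
    # Fit a and b by least squares for this candidate's basis function.
    A = np.vstack([basis_fn(x), np.ones_like(x)]).T
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = np.mean((a * basis_fn(x) + b - y) ** 2)
    score = mse + 0.01 * complexity  # prefer simpler expressions, all else equal
    if best is None or score < best[0]:
        best = (score, name, a, b)

print(best)  # should pick the linear form with a ~3 and b ~2
```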
Same as with the GAN thing. You condition it on producing a correct answer (or whatever the goal is). So if you are building a question-answering AI, you have it model a probability distribution something like P(human types this character | human correctly answers the question). This could be done simply by only feeding it examples of correctly answered questions as its training set. Or you could have it predict what a human might respond if they had n days to think about it.
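In the simplest version that conditioning is just filtering the training set (a sketch with hypothetical record fields):

```python
# Hypothetical records of humans answering questions. Keeping only the correct
# ones means the model is trained on P(answer text | the answer was correct).
records = [
    {"question": "What is 2 + 2?", "answer": "4", "correct": True},
    {"question": "What is 2 + 2?", "answer": "5", "correct": False},
    {"question": "Capital of France?", "answer": "Paris", "correct": True},
]

training_set = [(r["question"], r["answer"]) for r in records if r["correct"]]
```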
Though even that may not be necessary. What I had in mind was just having the AI read MIRI papers and produce new ones just like them. Like a superintelligent version of what people do today with Markov chains or RNNs to produce writing in the style of an author.
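The Markov chain version of that is only a few lines (a toy word-level sketch):

```python
import random
from collections import defaultdict

def train_markov(text):
    """Build a word-level transition table: word -> list of words seen after it."""
    words = text.split()
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, start, length=20):
    """Random-walk the table to produce text vaguely in the style of the corpus."""
    out = [start]
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

table = train_markov("the cat sat on the mat and the dog sat on the rug")
print(generate(table, "the"))
```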
Yes these methods do limit the AI’s ability a lot. It can’t do anything a human couldn’t do, in principle. But it can automate the work of humans and potentially do our job much faster. And if human ability isn’t enough to build an FAI, well you could always set it to do intelligence augmentation research instead.
I’m not sure what my exact thoughts were back then. I was/am at least skeptical of the specific formula used, as it seems arbitrary. It is designed intentionally to have certain properties, like power-law diminishing returns. So it’s not exactly a “wild implication” that it has these properties.
I recently fit the Chinchilla formula to the data from the first LLaMA paper: https://i.imgur.com/u1Tm5EU.png
This was over an unrelated disagreement elsewhere about whether Chinchilla’s predictions still held or made sense. As well as the plausibility of training tiny models to far greater performance.
First, the new parameters are wildly different from the old ones. Take that for what you will, but they are hardly set in stone. Second, even with the best fit, the formula still doesn’t really match the shape of the observed curves. I think it’s just not the right curve.
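For reference, the parametric form the Chinchilla paper fits is L(N, D) = E + A/N^α + B/D^β. Here is a sketch of fitting it with scipy; the (parameters, tokens, loss) points below are synthetic placeholders generated from made-up constants, not the actual LLaMA numbers, and a reasonable initial guess matters for convergence:

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, alpha, B, beta):
    """L(N, D) = E + A / N**alpha + B / D**beta."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Synthetic placeholder points (model parameters, training tokens).
N = np.array([7e9, 7e9, 13e9, 13e9, 33e9, 33e9, 65e9, 65e9])
D = np.array([0.5e12, 1.0e12, 0.5e12, 1.0e12, 1.0e12, 1.4e12, 1.0e12, 1.4e12])
L = chinchilla_loss((N, D), 1.69, 406.4, 0.34, 410.7, 0.28)  # made-up "observed" losses

fit, _ = curve_fit(chinchilla_loss, (N, D), L,
                   p0=[1.8, 400.0, 0.33, 400.0, 0.29], maxfev=50000)
print(fit)  # should roughly recover the constants used to generate L above
```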
As for reusing data, I’ve seen sources claim that reusing data in language models up to four times had no negative effect, and that up to around 40 times was possible before it really stopped helping. I think LLMs currently don’t use much regularization or the other tricks that were used in other fields when data was limited. Those might push it further.
If data became truly scarce, there may be other tricks to extend the data we have further. You also have all of the data from the people that talk to these things all day and upvote and downvote their responses. (I don’t think anyone has even tried making an AI that intentionally asks users questions about things it wants to learn more about, like a human would do.)