Dan Elton blog: https://moreisdifferent.substack.com/ website: http://www.moreisdifferent.com twitter: https://twitter.com/moreisdifferent
delton137
Boston ACX Spring Schelling Point Meetup
Board games @ Aeronaut Brewing
That’s really cool, thanks for sharing!
Since nobody else posted these:
Bay Area is Sat Dec 17th (Eventbrite) (Facebook)
South Florida (about an hour north of Miami) is Sat Dec 17th (Eventbrite) (Facebook)
On current hardware, sure.
It does look like scaling will hit a wall soon if hardware doesn’t improve, see this paper: https://arxiv.org/abs/2007.05558
But Gwern has responded to this paper pointing out several flaws… (having trouble finding his response right now..ugh)
However, we have lots of reasons to think Moore’s law will continue … in particular future AI will be on custom ASICs / TPUs / neuromorphic chips, which is a very different story. I wrote about this long ago, in 2015. Such chips, especially asynchronous and analog ones, can be vastly more energy efficient.
I disagree, in fact I actually think you can argue this development points the opposite direction, when you look at what they had to do to achieve it and the architecture they use.
I suggest you read Ernest Davis’ overview of Cicero. Cicero is a special-purpose system that took enormous work to produce—a team of multiple people labored on it for three years. They had to assemble a massive dataset from 125,300 online human games. They also had to get expert annotations on thousands of preliminary outputs. Even that was not enough.. they had to generate synthetic datasets as well to fix issues with the system! Even then, the dialogue module required a specialized filter to remove nonsense. This is a break from the scaling idea that says to solve new problems you just need to scale existing architectures to more parameters (and train on a large enough dataset).
Additionally, they argue that this system appears very unlikely to generalize to other problems, or even to slight modifications of the game of Diplomacy. It’s not even clear how well it would generalize to non-blitz games. If the rules were modified slightly, the entire system would likely have to be retrained.
I also want to point out that scientific research is not as easy as you make it sound. Professors spend the bulk of their time writing proposals, so perhaps AI could help there by summarizing existing literature. Note, though, that a typical paper, even a low-value one, generally takes a graduate student with specialized training about a year to complete, assuming the experimental apparatus and other necessary infrastructure are all in place. Not all science is data-driven either; science can also be observation-driven or theory-driven.
I’ve looked into these methods a lot, in 2020 (I’m not so much up to date on the latest literature). I wrote a review in my 2020 paper, “Self-explaining AI as an alternative to interpretable AI”.
There are a lot of issues with saliency mapping techniques, as you are aware (I saw you link to the “sanity checks” paper below). Funnily enough, the super simple technique of occlusion mapping does seem to work very well! It’s kinda hilarious actually that there are so many complicated mathematical techniques for saliency mapping, but I have seen no good arguments as to why they are better than just occlusion mapping. I think this is a symptom of people optimizing for paper publishing and trying to impress reviewers with novelty and math rather than actually building stuff that is useful.
You may find this interesting: “Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization”. What they show is that a very simple model-agnostic technique (finding the image that maximizes an output) allows people to make better predictions about how a CNN will behave than Olah’s activation maximization method, which produces images that can be hard to understand. This is exactly the sort of empirical testing I suggested in my Less Wrong post from Nov last year.
The comparison isn’t super fair because Olah’s techniques were designed for detailed mechanistic understanding, not allowing users to quickly be able to predict CNN behaviour. But it does show that simple techniques can have utility for helping users understand at a high level how an AI works.
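To make the occlusion mapping idea mentioned above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the method from any particular paper: the image is a 2D list of floats, `score_fn` is a hypothetical stand-in for a model's output for one class, and `patch` and `fill` are illustrative parameters I made up for this example.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Heat map of how much the score drops when each patch is masked out."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and overwrite one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            # A large drop means the model relied on this patch.
            drop = baseline - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(image):
    # Hypothetical "model": it only looks at the top-left 2x2 corner.
    return sum(image[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score, patch=2)
```

With the toy scorer, only the top-left patch gets a nonzero value in `heat`, which is the sanity check you would want from any saliency method: the map should light up where the model actually looks.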
Rationality reading group—Free books!
Board Games at Aeronaut Brewing
Board gaming @ Aeronaut Brewing
Jason Crawford on the Progress Studies movement
There’s no doubt a world simulator of some sort is probably going to be an important component in any AGI, at the very least for planning; Yann LeCun has talked about this a lot. There’s also this work where they show a VAE type thing can be configured to run internal simulations of the environment it was trained on.
In brief, a few issues I see here:
You haven’t actually provided any evidence that GPT does simulation other than “Just saying “this AI is a simulator” naturalizes many of the counterintuitive properties of GPT which don’t usually become apparent to people until they’ve had a lot of hands-on experience with generating text.” What counterintuitive properties, exactly? Examples I’ve seen show GPT-3 is not simulating the environment being described in the text. I’ve seen a lot of impressive examples too, but I find it hard to draw conclusions on how the model works by just reading lots and lots of outputs… I wonder what experiments could be done to test your idea that it’s running a simulation.
Even for processes that are very simple to simulate, such as addition or symbol substitution, GPT has, in my view, trouble learning them, even though it does grok those things eventually. For things like multiplication, its accuracy depends on how often the numbers appear in the training data (https://arxiv.org/abs/2202.07206), which is a bit telling, I think.
Simulating the laws of physics is really hard.. trust me on this (I did a Ph.D. in molecular dynamics simulation). If it’s doing any simulation at all, it’s got to be some high level heuristic type stuff. If it’s really good, it might be capable of simulating basic geometric constraints (although IIRC GPT is not great at spatial reasoning). Even humans are really bad at simulating physics accurately (researchers found that most people do really poorly on tests of basic physics-based reasoning, such as basic kinematics: will this ball curve left, curve right, or go straight?). I imagine gradient descent is much more likely to settle on shortcut rules and heuristics than to implement a complex simulation.
Piperine (black pepper extract) can help make quercetin more bioavailable. They are co-administered in many studies on the neuroprotective effects of quercetin: https://scholar.google.com/scholar?hl=en&as_sdt=0,22&q=piperine+quercetin
I find slower take-off scenarios more plausible. I like the general thrust of Christiano’s “What failure looks like”. I wonder if anyone has written up a more narrative / concrete account of that sort of scenario.
The thing you are trying to study (“returns on cognitive reinvestment”) is probably one of the hardest things in the world to understand scientifically. It requires understanding both the capabilities of specific self-modifying agents and the complexity of the world. It depends what problem you are focusing on too—the shape of the curve may be very different for chess vs something like curing disease. Why? Because chess I can simulate on a computer, so throwing more compute at it leads to some returns. I can’t simulate human biology in a computer—we have to actually have people in labs doing complicated experiments just to understand one tiny bit of human biology.. so having more compute / cognitive power in any given agent isn’t necessarily going to speed things along.. you also need a way of manipulating things in labs (either humans or robots doing lots of experiments). Maybe in the future an AI could read massive numbers of scientific papers and synthesize them into new insights, but precisely what sort of “cognitive engine” is required to do that is also very controversial (could GPT-N do it?).
Are you familiar with the debate about Bloom et al and whether ideas are getting harder to find? (https://guzey.com/economics/bloom/ , https://www.cold-takes.com/why-it-matters-if-ideas-get-harder-to-find/). That’s relevant to predicting take-off.
The other post I always point people to is this one by Chollet.
I don’t necessarily agree with it but I found it stimulating and helpful for understanding some of the complexities here.
So basically, this is a really complex thing.. throwing some definitions and math at it isn’t going to be very useful, I’m sorry to say. Throwing math and definitions at stuff is easy. Modeling data by fitting functions is easy. Neither is very useful in terms of actually being able to predict in novel situations (ie extrapolation / generalization), which is what we need to predict AI take-off dynamics. Actually understanding things mechanistically and coming up with explanatory theories that can withstand criticism and repeated experimental tests is very hard. That’s why typically people break hard questions/problems down into easier sub-questions/problems.
How familiar are you with Chollet’s paper “On the Measure of Intelligence”? He disagrees a bit with the idea of “AGI” but if you operationalize it as “skill acquisition efficiency at the level of a human” then he has a test called ARC which purports to measure when AI has achieved human-like generality.
This seems to be a good direction, in my opinion. There is an ARC challenge on Kaggle and so far AI is far below the human level. On the other hand, “being good at a lot of different things”, ie task performance across one or many tasks, is obviously very important to understand and Chollet’s definition is independent from that.
Boston ACX board games @ Aeronaut brewing
Thanks, it’s been fixed!!
I would modify the theory slightly by noting that the brain may become hypersensitive to sensations arising from the area that was originally damaged, even after it has healed. Sensations that are otherwise normal can then trigger pain. I went to the website about pain reprocessing therapy and stumbled upon an interview with Alan Gordon where he talked about this. I suspect that high level beliefs about tissue damage etc play a role here also in causing the brain to become hyper focused on sensations coming from a particular region and to interpret them as painful.
Something else that comes to mind here is the rubber hand illusion. Watch this video—and look at the flinches! Interesting, eh?
edit: (ok, the rubber hand illusion isn’t clearly related, but it’s interesting!)