Yuxi Liu is a PhD student in Computer Science at the Berkeley Artificial Intelligence Research Lab, researching the scaling laws of large neural networks.
Personal website: https://yuxi-liu-wired.github.io/
Plenty of linguists and connectionists thought it was possible, if only to show those damned Chomskyans that they were wrong!
To be specific, some of the radical linguists believed in pure distributional semantics, or that there is no semantics beyond syntax. I can't name anyone in particular, but considering how often Chomsky, Pinker, etc. were fighting against the "blank slate" theory, they definitely existed.
The following people likely believed that it is possible to learn a language purely from reading using a general learning architecture like neural networks (blank-slate):
James L. McClelland and David Rumelhart.
They were the main proponents of neural networks in the “past tense debate”. Generally, anyone on the side of neural networks in the past tense debate probably believed this.
B. F. Skinner.
Radical syntacticians? Linguists have failed to settle the question of "Just what is semantics? How is it different from syntax?", and some linguists have taken the radical position "There is no semantics. Everything is syntax." Once that is done, there simply is no difficulty: just learn all the syntax, and there is nothing left to learn.
Possibly some of the participants in the “linguistics wars” believed in it. Specifically, some believed in “generative semantics”, whereby semantics is simply yet more generative grammar, and thus not any different from syntax (also generative grammar). Chomsky, as you might imagine, hated that, and successfully beat it down.
Maybe some people in distributional semantics? Perhaps Leonard Bloomfield? I don't know enough about the history of linguistics to tell what Bloomfield or the "Bloomfieldians" believed in exactly. However, considering that Chomsky was strongly anti-Bloomfield, it is a fair bet that some Bloomfieldians (or self-styled "neo-Bloomfieldians") would support blank-slate learning of language, if only to show Chomskyans that they're wrong.
https://www.gov.cn/zhengce/202407/content_6963770.htm
中共中央关于进一步全面深化改革 推进中国式现代化的决定 (2024年7月18日中国共产党第二十届中央委员会第三次全体会议通过)
[Decision of the Central Committee of the Communist Party of China on Further Comprehensively Deepening Reform and Advancing Chinese Modernization (adopted at the Third Plenary Session of the 20th CPC Central Committee on July 18, 2024)]
(51)完善公共安全治理机制。健全重大突发公共事件处置保障体系,完善大安全大应急框架下应急指挥机制,强化基层应急基础和力量,提高防灾减灾救灾能力。完善安全生产风险排查整治和责任倒查机制。完善食品药品安全责任体系。健全生物安全监管预警防控体系。加强网络安全体制建设,建立人工智能安全监管制度。
I checked the translation:
(51) Improve the public security governance mechanism. Improve the system for handling major public emergencies, improve the emergency command mechanism under the framework of comprehensive safety and emergency response, strengthen grassroots emergency foundations and forces, and improve disaster prevention, mitigation, and relief capabilities. Improve the mechanism for investigating and rectifying production safety risks and tracing responsibility. Improve the food and drug safety responsibility system. Improve the biosafety supervision, early warning, and prevention and control system. Strengthen the development of the cybersecurity system and establish a supervision and regulation system for artificial intelligence safety.
As usual, utterly boring.
You have inspired me to do the same with my writings. I just updated my entire website to PD, with CC0 as a fallback (releasing under Public Domain being unavailable on GitHub, and apparently impossible under some jurisdictions??)
I don’t fully understand why other than to gesture at the general hand-wringing that happens any time someone proposes doing something new in human reproduction.
I have the perfect quote for this.
A breakthrough, you say? If it's in economics, at least it can't be dangerous. Nothing like gene engineering, laser beams, sex hormones or international relations. That's where we don't want any breakthroughs.
Galbraith, J. K. (1990). A Tenured Professor. Boston: Houghton Mifflin.
Just want to plug my 2019 summary of the book that started it all.
How to take smart notes (Ahrens, 2017) — LessWrong
It’s a good book, for sure. I use Logseq, which is similar to Roam but more fitted to my habits. I never bought into the Roam hype (rarely even heard of it), but this makes me glad I never went into it.
In an intelligence community context, the American spy satellites like the KH program achieved astonishing things in photography, physics, and rocketry. Handling ultra-high-resolution photography in space (with its unique problems, like disposing of hundreds of gallons of water in space) or catching descending film capsules in mid-air were just the start. (I was skimming a book the other day which included some hilarious anecdotes, such as American spies taking tourist photos of themselves in places like Red Square just to aid the trigonometry of photo analysis.) American presidents obsessed over the daily spy satellite reports, and this helped ensure that the spy satellite footage was worth obsessing over. (Amateurs fear the CIA, but pros fear the NRO.)
What is that book with the fun anecdotes?
I use a fairly basic Quarto template for the website. The code for the entire site is on GitHub.
The source code is actually right there in the post. Click the "Code" button, then click "View Source".
https://yuxi-liu-wired.github.io/blog/posts/perceptron-controversy/
Concretely speaking, are you suggesting that a 2-layer fully connected network trained by backpropagation, with ~100 neurons in each layer (thus ~20,000 weights), would have been uneconomical even in the 1960s, even if they had backprop?
I am asking because the great successes of 1990s connectionism, including LeNet digit recognition, NETtalk, and TD-gammon, were all of that order of magnitude. They seem within reach for the 1960s.
Concretely speaking, TD-gammon cost about 2e13 FLOPs to train, and in 1970, 1 million FLOP/sec cost 1 USD, so with 10000 USD of hardware, it would take about 1 day to train.
And interesting that you mentioned magnetic cores. The MINOS II machine built in 1962 by the Stanford Research Institute group had precisely a grid of magnetic core memory. Couldn't they have scaled it up and built some extra circuitry to allow backpropagation?
Corroborating the calculation, according to some 1960s literature, magnetic-core logic could go up to 10 kHz. So if we have ~1e4 weights updated 1e4 times a second, that would be 1e8 FLOP/sec right there. TD-gammon would take ~2e5 seconds, about 2 days, the same OOM as the previous calculation.
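To make the arithmetic explicit, here is a quick back-of-the-envelope sketch in Python. Every constant is one of the rough order-of-magnitude assumptions quoted above (weight count, magnetic-core update rate, TD-gammon's training cost), not a measured figure:

```python
# Back-of-the-envelope estimate; every constant is a rough OOM assumption.
weights = 1e4        # ~100 neurons per layer, 2 fully connected layers
update_rate = 1e4    # magnetic-core logic at ~10 kHz -> ~1e4 updates/sec
throughput = weights * update_rate   # ~1e8 FLOP/sec of effective compute

td_gammon_flops = 2e13               # rough total training cost of TD-gammon
seconds = td_gammon_flops / throughput
print(f"throughput: {throughput:.0e} FLOP/s")
print(f"training time: {seconds:.0e} s = {seconds / 86400:.1f} days")
# -> 2e5 s, about 2 days: the same order of magnitude as above.
```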
I was thinking of porting it full-scale here. It is in R-markdown format. But all the citations would be quite difficult to port. They look like [@something2000].
Does LessWrong allow convenient citations?
In David Roden's Posthuman Life, a book that is otherwise very abstruse and obscurely metaphysical, there is an interesting argument for making posthumans before we know what they might be (indeed, he rejected the precautionary principle on the making of posthumans):
CLAIM. We have an obligation to make posthumans, or not prevent their appearance.
PROOF.
1. Principle of accounting: we have an obligation to understand posthumans.
2. Speculative posthumanism: there could be radical posthumans.
3. Radical posthumans are impossible to understand unless we actually meet them.
4. We can only meet radical posthumans if we make them (intentionally or accidentally).
This creates an ethical paradox, the posthuman impasse:
We are unable to evaluate any posthuman condition. Since posthumans could result from some iteration of our current technical activity, we have an interest in understanding what they might be like. If so, we have an interest in making or becoming posthumans.
To plan for the future evolution of humans, we should evaluate what posthumans are like, which kinds are good, and which kinds are bad, before we make them.
But most kinds of posthumans can only be evaluated after they appear.
And completely giving up on making posthumans would lock humanity at the current level, which means we give up on great goods for fear of great bads. This is objectionable by arguments similar to those employed by transhumanists.
The quote
All energy must ultimately be spent pointlessly and unreservedly, the only questions being where, when, and in whose name… Bataille interprets all natural and cultural development upon the earth to be side-effects of the evolution of death, because it is only in death that life becomes an echo of the sun, realizing its inevitable destiny, which is pure loss.
is from page 39 of The Thirst for Annihilation (Chapter 2, "The Curse of the Sun").
Note that the book was published in 1992, early in Nick Land's career. In this book, Nick Land mixes Bataille's theory with his own. I have just reread Chapter 2, and it is definitely more Bataille than Land.
Land has two faces. On the "cyberpunk face", he writes against top-down control. In this regard he is in sync with many of the typical anarchists, but with a strong emphasis on technology. As he put it in "Machinic Desire": "In the near future the replicants—having escaped from the off-planet exile of private madness—emerge from their camouflage to overthrow the human security system."
On the “intelligence face”, he writes for maximal intelligence, even when it leads to a singleton. A capitalist economy becoming bigger and more efficient is desirable precisely because it is the most intelligent thing in this patch of the universe. In the Pythia Unbound essay, “Pythia” seems likely to become such a singleton.
In either face, maximizing waste-heat isn’t his deal.
A small comment about Normative Realism: from my reading, Wilfrid Sellars' theory had a strong influence on Normative Realism. The idea goes like this:
Agents are players in a game of "giving and asking for reasons". To be an agent is simply to follow the rules of the game. To not play the game would be either self-inconsistent or community-inconsistent. In either case, a group of agents can only do science if they are players of the game.
With this argument, he aimed to secure the “manifest image of man” against the “scientific image of man”. Namely, free will has to be implemented or simulated by APIs of the program.
Assuming that being able to do science is a necessary condition for dominance and power (in the Darwinian game of survival), we either meet agents, or beings who are so weak that we do not need to worry (shades of social Darwinism).
Brief note: the "analysis by synthesis" idea is called "vision as inverse graphics" in computer vision research.
For reservoir computing, there are concrete results. It is not just magic.
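As one concrete illustration (my own sketch, not from any particular paper), here is a toy echo state network in Python. The recurrent reservoir is fixed and random; only a linear readout is trained, by ridge regression, on a simple signal-prediction task. The task and all hyperparameters are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_steps, washout = 200, 2000, 100

# Fixed random reservoir, rescaled so the spectral radius is < 1
# (the usual rough condition for the echo state property).
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.normal(scale=0.5, size=n_res)

# Toy task: from input sin(t), predict the phase-shifted sin(t + 0.3).
t = np.arange(n_steps) * 0.05
u, y = np.sin(t), np.sin(t + 0.3)

# Drive the reservoir and record its states; no training happens here.
x = np.zeros(n_res)
states = np.empty((n_steps, n_res))
for k in range(n_steps):
    x = np.tanh(W @ x + W_in * u[k])
    states[k] = x

# Train ONLY the linear readout, with ridge regression
# (discarding an initial washout period).
A, b = states[washout:], y[washout:]
W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ b)

pred = A @ W_out
print("readout RMSE:", np.sqrt(np.mean((pred - b) ** 2)))
```

The design point is that the only trained component is a linear regression; the fixed reservoir just supplies a rich basis of nonlinear features with memory, which is why the approach admits concrete analysis.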
No. Any decider will be unfair in some way, whether or not it knows anything about history. The decider could be a coin flipper and it would still be biased. One can say that the unfairness is baked into the reality of the base-rate difference.
The only way to fix this is not to fix the decider, but to somehow make the base-rate difference disappear, or to compromise on the definition of fairness so that it is less stringent and actually satisfiable.
And in common language and common discussion of algorithmic bias, “bias” is decidedly NOT merely a statistical definition. It always contains a moral judgment: violation of a fairness requirement. To say that a decider is biased is to say that the statistical pattern of its decision violates a fairness requirement.
The key message is that, by the common language definition, “bias” is unavoidable. No amount of trying to fix the decider will make it fair. Blinding it to the history will do nothing. The unfairness is in the base rate, and in the definition of fairness.
I’m following common speech where “biased” means “statistically immoral, because it violates some fairness requirement”.
I showed that with a base-rate difference, it is impossible to satisfy all three fairness requirements at once. The decider (machine or not) can completely ignore history. It could be a coin-flipper. As long as the decider is imperfect, it will still violate one of the fairness requirements.
And if the base rates are not due to historical circumstances, this impossibility still stands.
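To make the coin-flipper point concrete, here is a small illustration of my own (the base rates are hypothetical numbers, not data). A decider that flips a fair coin for everyone has equal false-positive and false-negative rates across groups by construction, yet by Bayes' rule its positive predictive value in each group equals that group's base rate, so predictive parity fails whenever base rates differ:

```python
# A coin-flip decider under two different (hypothetical) base rates.
# With P(decide positive) = 1/2 regardless of the person, TPR = FPR = 1/2,
# and by Bayes' rule the PPV collapses to the group's base rate.

def metrics(base_rate, tpr=0.5, fpr=0.5):
    ppv = base_rate * tpr / (base_rate * tpr + (1 - base_rate) * fpr)
    return {"PPV": round(ppv, 3), "FPR": fpr, "FNR": 1 - tpr}

for group, p in [("group A", 0.2), ("group B", 0.6)]:
    print(group, metrics(p))
# group A {'PPV': 0.2, 'FPR': 0.5, 'FNR': 0.5}
# group B {'PPV': 0.6, 'FPR': 0.5, 'FNR': 0.5}
# Equal error rates, but predictive parity fails: PPV tracks the base rate.
```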
I cannot see anything that is particularly innovative in the paper, though I’m not an expert on this.
Maybe ask people working on poker AI, like Sandholm, directly. Perhaps it is that many details of the particular program (and the paper is full of such details) must be assembled in order for it to work cheaply enough to be trained.
Yes, (Kleinberg et al., 2016)… Do not read it. Really, don't. The derivation is extremely clumsy (and my professor said so too).
The proof has been considerably simplified in subsequent works. Looking through the papers that cite it should turn up a published paper that does the simplification...
Relevant quotes:
The original text is from the "Discourse on Heaven" chapter of the Xunzi:
雩而雨,何也?曰:無佗也,猶不雩而雨也。日月食而救之,天旱而雩,卜筮然後決大事,非以為得求也,以文之也。故君子以為文,而百姓以為神。以為文則吉,以為神則凶也。
[If you perform the rain sacrifice and it rains, what of it? I say: it is nothing. It is as though you had not performed the sacrifice and it rained anyway. When the sun or the moon is eclipsed, we perform rescue rites; when Heaven sends drought, we perform the rain sacrifice; we divine by tortoise shell and milfoil before deciding great affairs. This is not because we think we will thereby obtain what we seek, but to give the occasion ceremonial form. Thus the gentleman takes these practices to be ornament, while the common people take them to be supernatural. To take them as ornament is auspicious; to take them as supernatural is inauspicious.]
The Britannica says:
Another celebrated essay is “A Discussion of Heaven,” in which he attacks superstitious and supernatural beliefs. One of the work’s main themes is that unusual natural phenomena (eclipses, etc.) are no less natural for their irregularity—hence are not evil omens—and therefore men should not be concerned at their occurrence. Xunzi’s denial of supernaturalism led him into a sophisticated interpretation of popular religious observances and superstitions. He asserted that these were merely poetic fictions, useful for the common people because they provided an orderly outlet for human emotions, but not to be taken as true by educated men. There Xunzi inaugurated a rationalistic trend in Confucianism that has been congenial to scientific thinking.
Heaven never intercedes directly in human affairs, but human affairs are certain to succeed or fail according to a timeless pattern that Heaven determined before human beings existed...
Thus rituals are not merely received practices or convenient social institutions; they are practicable forms in which the sages aimed to encapsulate the fundamental patterns of the universe. No human being, not even a sage, can know Heaven, but we can know Heaven’s Way, which is the surest path to a flourishing and blessed life. Because human beings have limited knowledge and abilities, it is difficult for us to attain this deep understanding, and therefore the sages handed down the rituals to help us follow in their footsteps.
For anyone not wanting to go in and see the Kafka, I copied some useful examples:
ANNA ROGERS: I was considering making yet another benchmark, but I stopped seeing the point of it. Let’s say GPT-3 either can or cannot continue [generating] these streams of characters. This tells me something about GPT-3, but that’s not actually even a machine learning research question. It’s product testing for free.
JULIAN MICHAEL: There was this term, "API science," that people would use to be like: "We're doing science on a product? This isn't science, it's not reproducible." And other people were like: "Look, we need to be on the frontier. This is what's there."
TAL LINZEN (associate professor of linguistics and data science, New York University; research scientist, Google): For a while people in academia weren’t really sure what to do.
R. THOMAS MCCOY: Are you pro- or anti-LLM? That was in the water very, very much at this time.
JULIE KALLINI (second-year computer science Ph.D. student, Stanford University): As a young researcher, I definitely sensed that there were sides. At the time, I was an undergraduate at Princeton University. I remember distinctly that different people I looked up to — my Princeton research adviser [Christiane Fellbaum] versus professors at other universities — were on different sides. I didn’t know what side to be on.
LIAM DUGAN: You got to see the breakdown of the whole field — the sides coalescing. The linguistic side was not very trusting of raw LLM technology. There’s a side that’s sort of in the middle. And then there’s a completely crazy side that really believed that scaling was going to get us to general intelligence. At the time, I just brushed them off. And then ChatGPT comes out.