LessWrong is uncensored in China.
I know it’s not your main point, but for the actual 4-minute-mile I’m on the side of the null hypothesis. In a steady progression, once any one arbitrary threshold is crossed (4:10 minutes, 4:00 minutes, 3:50 minutes), many others are soon to follow.
Trolling a bit, perhaps we could talk about a “4-Minute-Mile Gell-Mann Effect”. Events that to outsiders look like discontinuous revolutions look to insiders like minor ticks with surprising publicity.
A standard trick is to add noise to the signal to (stochastically) let parts get over the hump.
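A toy numpy sketch of the idea (my own illustration, not anyone's production code): a signal that peaks below a hard threshold never crosses on its own, but with a bit of added noise some samples get over the hump, and the crossing rate tracks the underlying signal.

```python
# Toy illustration of dithering / stochastic resonance: noise lets a
# sub-threshold signal register on a hard threshold detector.
import numpy as np

rng = np.random.default_rng(0)
threshold = 1.0
signal = 0.8 * np.sin(np.linspace(0, 4 * np.pi, 1000))   # peaks at 0.8, never reaches 1.0

print((signal > threshold).mean())                        # 0.0: nothing gets over the hump
noisy = signal + rng.normal(scale=0.3, size=signal.shape)
print((noisy > threshold).mean())                         # > 0: some samples now cross

# Averaging crossings over many noisy copies recovers the shape of the signal.
crossing_rate = np.mean(
    [(signal + rng.normal(scale=0.3, size=signal.shape)) > threshold for _ in range(500)],
    axis=0,
)
print(np.corrcoef(signal, crossing_rate)[0, 1])           # positive: crossings carry signal information
```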
BTW, feed this post into ChatGPT and it will tell you the answer
Somewhat mean caveat emptor for other readers: I just spent an hour trying to understand this post, and wish that I hadn’t. It’s still possible I’m missing the thing, but inside view is I’ve found the thing and the thing just isn’t that interesting.[1]
- ^
Feeding a program its Gödel numbering isn’t relevant (and doesn’t work?!), and the puzzle is over perhaps missing out on an unfathomably small amount of money[2].
- ^
By “unfathomably small” I mean ≈ dollars. And, sure, there could be a deep puzzle there, but I feel that when a puzzle has its accidental complexity removed you usually can produce a more compelling use case.
- ^
Could you spell this out? I don’t see how AI has much to do with trade. Is the idea that AI development is bottlenecked on the cost of GPUs, and this will raise the cost of outside-China GPUs compared to inside-China GPUs? Or is it that there will be less VC money, e.g. because interest rates go up to combat inflation?
Yes, it means figure out how the notation works.
That’s a good Coasian point. Talking out of my butt, but I think the airlines don’t carry the risk. The sales channels (airlines, Expedia, etc.) take commissions for distributing an insurance product designed by another company (Travel Insured International, Seven Corners), which handles product design and compliance, with the actual claims handled by another company and the insurance capital provided by yet another (AIG, Berkshire Hathaway).
LLMs tell me the distributors get 30–50% commission, which tells you that it’s not a very good product for consumers.
“But fear of death does seem like a kind of value systematization”
I don’t think it’s system 1 doing the systematization. Evolution beat fear of death into us in lots of independent forms (fear of heights, snakes, thirst, suffocation, etc.), but for the same underlying reason. Fear of death is not just an abstraction humans invented or acquired in childhood; it is a “natural idea” pointed at by our brain’s innate circuitry from many directions. Utilitarianism doesn’t come with that scaffolding. We don’t learn to systematize Euclidean and Minkowskian spaces the same way either.
Quick takes are presented inline, posts are not. Perhaps posts could be presented as title + <80 (140?) character summary.
You may live in a place where arguments about the color of the sky are really arguments about tax policy. I don’t think I live there? I’m reading your article saying “If Blue-Sky-ism is to stand a chance against the gravitational pull of Green-Sky-ism, it must offer more than talk of a redistributionist tax system” and thinking “...what on earth...?”. This might be a perceptive cultural insight about somewhere, but I do not understand the context. [This is my guess as to why you are being voted down.]
You might be[1] overestimating the popularity of “they are playing god” in the same way you might overestimate the popularity of woke messaging. Loud moralizers aren’t normal people either. Messages that appeal to them won’t have the support you’d expect given their volume.
Compare, “It’s going to take your job, personally”. Could happen, maybe soon, for technophile programmers! Don’t count them out yet.
- ^
Not rhetorical—I really don’t know
- ^
Eliezer Yudkowsky wrote a story, Kindness to Kin, about aliens who love(?) their family members in proportion to Hamilton’s “I’d lay down my life for two brothers or eight cousins” rule. It gives an idea of how alien that is.
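For reference, the arithmetic behind that quip (standard kin-selection numbers, not anything from the story): Hamilton’s rule says a sacrifice is favored when relatedness times benefit exceeds cost, and with relatedness 1/2 for a full sibling and 1/8 for a first cousin, two brothers or eight cousins exactly break even against your one life.

```latex
% Hamilton's rule: a sacrifice is favored when  r b > c.
% c = 1 (your own life), r = 1/2 for a full sibling, r = 1/8 for a first cousin:
r b > c, \qquad 2 \cdot \tfrac{1}{2} = 1 = c, \qquad 8 \cdot \tfrac{1}{8} = 1 = c
```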
Then again, Proto-Indo-European had detailed family words that correspond rather well to confidence of genetic kinship, so maybe it’s a cultural thing.
Sure, I think that’s a fair objection! Maybe for a business it’s worth paying the marginal security costs of giving 20 new people admin accounts, but for the federal government that security cost is too high. Is that what people are objecting to? I’m reading comments like this:
“Yeah, that’s beyond unusual. It’s not even slightly normal. And it is in fact very coup-like behavior if you look at coups in other countries.”
And, I just don’t think that’s the case. I think this is pretty-darn-usual and very normal in the management consulting / private equity world.
I don’t think foreign coups are a very good model for this? Coups don’t tend to start by bringing in data scientists.
What I’m finding weird is...this was the action people thought worrying enough to make it to the LessWrong discussion. Cutting red tape to unblock data scientists in cost-cutting shakeups—that sometimes works well! Assembling lists of all CIA officers and sending them emails, or trying to own the Gaza strip, or <take your pick>. I’m far mode on these, have less direct experience, but they seem much more worrying. Why did this make the threshold?
Huh, I came at this with the background of doing data analysis in large organizations and had a very different take.
You’re a data scientist. You want to analyze what this huge organization (US government) is spending its money on in concrete terms. That information is spread across 400 mutually incompatible ancient payment systems. I’m not sure if you’ve viscerally felt the frustration of being blocked, spending all your time trying to get permission to read from 5 incompatible systems, let alone 400. But it would take months or years.
Fortunately, your boss is exceptionally good at Getting Things Done. You tell him that there’s one system (BFS) that has all the data you need in one place. But BFS is protected by an army of bureaucrats, most of whom are named Florence, who are Very Particular, are Very Good at their job, Will Not let this system go down, Will Not let you potentially expose personally identifiable information by violating Section 3 subparagraph 2 of code 5, Will Not let you sweet talk her into bypassing the safety systems she has spent the past 30 years setting up to protect oh-just-$6.13 trillion from fraud, embezzlement, and abuse, and if you somehow manage to get around these barriers she will Stop You.
Your boss Gets Things Done and threatens Florence’s boss Mervin that if he does not give you absolutely all the permissions you ask for, Mervin will become the particular object of attention of two people named Elon Musk and Donald Trump.
You get absolutely all the permissions you want and go on with your day.
Ah, to have a boss like that!
EDIT TL/DR: I think this looks weirder in Far mode? Near mode (near to data science, not near government), giving outside consultant data scientists admin permissions for important databases does not seem weird or nefarious. It’s the sort of thing that happens when the data scientist’s boss is intimidatingly high in an organization, like the President/CEO hiring a management consultant.
Checking my understanding: for the case of training a neural network, would S be the parameters of the model (along with perhaps buffers/state like moment estimates in Adam)? And would the evolution of the state space be local in S space? In other words, for neural network training, would S be a good choice for H?
In a recurrent neural networks doing in-context learning, would S be something like the residual stream at a particular token?
I’ll conjecture the following VERY SPECULATIVE, inflammatory, riff-on-vibes statements:
Gradient descent solves problems in the complexity class P[1]. It is P-complete.
Learning theory (and complexity theory) have for decades been pushing two analogous bad narratives about the weakness of gradient descent (and P).
These narratives dominate because it is easy to prove impossibility results like “Problem X can’t be solved by gradient descent” (or “Problem Y is NP-Hard”). It’s academically fecund—it’s a subject aspiring academics can write a lot of papers about. Results about what gradient descent (and polynomial time) can’t do make up a fair portion of the academic canon.
In practice, these impossibility results are corner cases that don’t actually come up. The “vibes” of these impossibility results run counter to the “vibes” of reality.
For example, gradient descent solves most problems, even though in theory it gets trapped in local minima. (SAT is in practice fast to solve, even though in theory it’s theoretical computer science’s canonical Hard-Problem-You-Say-Is-Impossible-To-Solve-Quickly.)
The vibe of reality is “local (greedy) algorithms usually work”
- ^
Stoner-vibes-based reason: I’m guessing you can reduce a problem like Horn Satisfiability[2] to gradient descent. Horn Satisfiability is a P-complete problem—you can transform any polynomial-time decision problem into a Horn Satisfiability problem using a log-space transformation. Therefore, gradient descent is “at least as big as P” (P-hard). And I’m guessing your formalization of gradient descent is in P as well (hence “P-complete”). That would mean gradient descent would not be able to solve harder problems in e.g. NP unless P=NP.
- ^
Horn Satisfiability is about finding true/false values that satisfy a bunch of logic clauses of the form $x_1 \wedge x_2 \rightarrow x_3$ or $\neg x_1 \vee \neg x_2$ (that second clause means “don’t set both $x_1$ and $x_2$ to true—at least one of them has to be false”). In the algorithm for solving it, you figure out a variable that must be set to true or false, then propagate that information forward to other clauses (sketch below). I bet you can do this with a loss function, turning it into a greedy search on a hypercube.
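A minimal sketch of that propagation (my own toy code, using the usual (body, head) encoding of Horn clauses; not a reduction to gradient descent, just the forward-chaining algorithm the footnote describes):

```python
# Each Horn clause is written as (body, head):
#   body = set of variables appearing negated, head = the single positive variable or None.
#   (x1 AND x2) -> x3   is   ({"x1", "x2"}, "x3")
#   NOT x1 OR NOT x2    is   ({"x1", "x2"}, None)
def horn_sat(clauses):
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if body <= true_vars:          # every body variable already forced true
                if head is None:
                    return None            # purely negative clause violated: UNSAT
                if head not in true_vars:
                    true_vars.add(head)    # propagate: head is forced true
                    changed = True
    return true_vars                       # minimal satisfying assignment (all other variables false)

# (set(), "x1") is a unit clause forcing x1; the last clause forbids x1 and x3 together.
print(horn_sat([(set(), "x1"), ({"x1"}, "x2"), ({"x2"}, "x3"), ({"x1", "x3"}, None)]))  # None (UNSAT)
print(horn_sat([(set(), "x1"), ({"x1"}, "x2"), ({"x2", "x4"}, "x3")]))                  # {'x1', 'x2'}
```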
Thanks! I’m not a GPU expert either. The reason I want to spread the toll units inside the GPU itself isn’t to turn the GPU off—it’s to stop replay attacks. If the toll thing is in a separate chip, then the toll unit must have some way to tell the GPU “GPU, you are cleared to run”. To hack the GPU, you just copy that “cleared to run” signal and send it to the GPU. The same “cleared to run” signal must always make the GPU work, unless there is something inside the GPU to make sure it won’t accept the same “cleared to run” signal twice. That’s the point of the mechanism I outline—a way to make it so the same “cleared to run” signal for the GPU won’t work twice.
Bonus: Instead of writing the entire logic (challenge response and so on) in advance, I think it would be better to run actual code, but only if it’s signed (for example, by Nvidia), in which case they can send software updates with new creative limitations, and we don’t need to consider all our ideas (limit bandwidth? limit gps location?) in advance.
Hmm okay, but why do I let Nvidia send me new restrictive software updates? Why don’t I run my GPUs in an underground bunker, using the old most broken firmware?
I used to assume disabling a GPU in my physical possession would be impossible, but now I’m not so sure. There might be ways to make bypassing GPU lockouts comparable in difficulty to manufacturing the GPU (requiring nanoscale silicon surgery). Here’s an example scheme:
Nvidia changes their business model from selling GPUs to renting them. The GPU is free, but to use your GPU you must buy Nvidia Dollars from Nvidia. Your GPU will periodically call Nvidia headquarters and get an authorization code to do 10^15 more floating point operations. This rental model is actually kinda nice for the AI companies, who are much more capital constrained than Nvidia. (Lots of industries have made this move from buying to renting, e.g. airplane engines.)
Question: “But I’m an engineer. How (the hell) could Nvidia keep me from hacking a GPU in my physical possession to bypass that Nvidia dollar rental bullshit?”
Answer: through public key cryptography and the fact that semiconductor parts are very small and modifying them is hard.
In dozens to hundreds or thousands of places on the GPU, Nvidia places toll units that block signal lines (like ones that pipe floating point numbers around) unless the toll units believe they have been paid with enough Nvidia dollars. The toll units have within them a random number generator, a public key ROM unique to that toll unit, a 128-bit register for a secret challenge word, elliptic curve cryptography circuitry, and a $$$ counter which decrements every time the clock or signal line changes.
If the $$$ counter is positive, the toll unit is happy and will let signals through unabated. But if the $$$ counter reaches zero,[1] the toll unit is unhappy and will block those signals.
To add to the $$$ counter, the toll unit (1) generates a random secret <challenge word>, (2) encrypts the secret using that toll unit’s public key, (3) sends <encrypted secret challenge word> to a non-secure part of the GPU,[2] which (4) through driver software and the internet, phones Nvidia saying “toll unit <id> challenges you with <encrypted secret challenge word>”, (5) Nvidia looks up the private key for toll unit <id> and replies to the GPU “toll unit <id>, as proof that I Nvidia know your private key, I decrypted your challenge word: <challenge word>”, and (6) after getting this challenge word back, the toll unit adds 10^15 or whatever to the $$$ counter.
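A toy software sketch of that flow (my own illustration, not Nvidia’s actual hardware: it uses RSA-OAEP from Python’s `cryptography` package in place of the elliptic-curve circuitry, and names like `TollUnit` and `nvidia_respond` are made up):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()), algorithm=hashes.SHA256(), label=None)
CREDIT_PER_RESPONSE = 10**15  # operations added per successful challenge

# Nvidia keeps the private key; the matching public key is burned into the toll unit's ROM.
nvidia_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

class TollUnit:
    def __init__(self, public_key):
        self.public_key = public_key
        self.counter = 0            # (decrementing per clock/signal change is omitted here)
        self.pending_challenge = None

    def issue_challenge(self):
        # (1) random 128-bit secret, (2) encrypted under the unit's public key
        self.pending_challenge = os.urandom(16)
        return self.public_key.encrypt(self.pending_challenge, OAEP)

    def redeem(self, decrypted_challenge):
        # (6) only the private-key holder could have recovered the secret
        if self.pending_challenge is not None and decrypted_challenge == self.pending_challenge:
            self.counter += CREDIT_PER_RESPONSE
            self.pending_challenge = None   # the same response can't be replayed
            return True
        return False

    def allows_signals(self):
        return self.counter > 0

def nvidia_respond(encrypted_challenge):
    # (5) Nvidia proves possession of the private key by decrypting the challenge
    return nvidia_private_key.decrypt(encrypted_challenge, OAEP)

toll = TollUnit(nvidia_private_key.public_key())
blob = toll.issue_challenge()       # (3)/(4) travels through the untrusted GPU, driver, and internet
toll.redeem(nvidia_respond(blob))   # counter goes up; replaying the same response later does nothing
print(toll.allows_signals())        # True
```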
There are a lot of ways to bypass this kind of toll unit (fix the random number generator or $$$ counter to a constant, just connect wires to route around it). But the point is to make it so you can’t break a toll unit without doing surgery to delicate silicon parts which are distributed in dozens to hundreds of places around the GPU chip.
- ^
Implementation note: it’s best if disabling the toll unit takes nanoscale precision, rather than micrometer scale precision. The way I’ve written things here, you might be able to smudge a bit of solder over the whole $$$ counter and permanently tie the whole thing to high voltage, so the counter never goes down. I think you can get around these issues (make it so any “blob” of high or low voltage spanning multiple parts of the toll circuit will block the GPU) but it takes care.
- ^
This can be done slowly, serially with a single line
- ^
Could you wrap this in quote marks or put a footnote or somehow to indicate this is riffing on a meme and not a real anecdote from someone in the industry? I read a similar comment on LessWrong a few months ago and it was only luck that kept me from repeating it as truth to people on the fence about whether to take AI risks seriously.