And for a visceral description of a kind of bullying that’s plainly bad, read the beginning of Worm: https://parahumans.wordpress.com/2011/06/11/1-1/
justinpombrio
I double-downvoted this post (my first ever double-downvote) because it crosses a red line by advocating for verbal and physical abuse of a specific group of people.
Alexej: this post gives me the impression that you started with a lot of hate and went looking for justifications for it. But if you have some real desire for truth seeking, here are some counterarguments:
Yeah, I think “computational irreducibility” is an intuitive term pointing to something which is true, important, and not-obvious-to-the-general-public. I would consider using that term even if it had been invented by Hitler and then plagiarized by Stalin :-P
Agreed!
OK, I no longer claim that. I still think it might be true.
No, Rice’s theorem is really not applicable. I have a PhD in programming languages, and feel confident saying so.
Let’s be specific. Say there’s a mouse named Crumbs (this is a real mouse), and we want to predict whether Crumbs will walk into the humane mouse trap (they did). What does Rice’s theorem say about this?
There are a couple ways we could try to apply it:
- We could instantiate the semantic property P with “the program will output the string ‘walks into trap’”. Then Rice’s theorem says that we can’t write a program Q that takes as input a program R and says whether R outputs ‘walks into trap’. For any Q we write, there will exist a program R that defeats it. However, this does not say anything about what the program R looks like! If R is simply `print('walks into trap')`, then it’s pretty easy to tell! And if R is the Crumbs algorithm running in Crumbs’s brain, Rice’s theorem likewise does not claim that we’re unable to tell if it outputs ‘walks into trap’. All the theorem says is that there exists a program R that Q fails on. The proof of the theorem is constructive, and does give a specific program as a counter-example, but this program is unlikely to look anything like Crumbs’s algorithm. The counter-example program R runs Q (on its own code) and then does the opposite of the answer, while Crumbs does not know what we’ve written for Q and is probably not very good at emulating Python.
- We could try to instantiate the counter-example program R with Crumbs’s algorithm. But that’s illegal! It’s under an existential, not a forall. We don’t get to pick R; the theorem does.
Actually, even this kind of misses the point. When we’re talking about Crumbs’s behavior, we aren’t asking what Crumbs would do in a hypothetical universe in which they lived forever, which is the world that Rice’s theorem is talking about. We mean to ask what Crumbs (and other creatures) will do today (or perhaps this year). And that’s decidable! You can easily write a program Q that takes a program R and checks if R outputs ‘walks into trap’ within the first N steps! Rice’s theorem doesn’t stand in your way even a little bit, if all you care about is behavior within a fixed finite amount of time!
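To make that concrete, here’s a toy sketch of my own (modeling a “program” as a Python generator that yields one observation per step; the names are mine, purely for illustration):

```python
from itertools import islice

def check_walks_into_trap(program, max_steps):
    """Run `program` for at most `max_steps` steps and report whether it
    emits 'walks into trap' within that window. Always terminates."""
    return 'walks into trap' in islice(program(), max_steps)

def crumbs():
    # Stand-in for Crumbs's (unknown, vastly more complicated) algorithm.
    yield 'sniffs around'
    yield 'walks into trap'

print(check_walks_into_trap(crumbs, max_steps=1000))  # True
```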
Here’s what Rice’s theorem does say. It says that if you want to know whether an arbitrary critter will walk into a trap after an arbitrarily long time, including long after the heat death of the universe, and you think you have a program that can check that for any creature in finite time, then you’re wrong. But creatures aren’t arbitrary (they don’t look like the very specific, very scattered counterexample programs that are constructed in the proof of Rice’s theorem), and the duration of time we care about is finite.
If you care to have a theorem, you should try looking at Algorithmic Information Theory. It’s able to make statements about “most programs” (or at least “most bitstrings”), in a way that Rice’s theorem cannot. Though I don’t think it’s important you have a theorem for this, and I’m not even sure that there is one.
-
Rice’s theorem (a.k.a. computational irreducibility) says that for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see.
Rice’s theorem says nothing of the sort. Rice’s theorem says:
For every semantic property P,
for every program Q that purports to check whether an arbitrary program has property P,
there exists a program R such that Q(R) is incorrect:
either P holds of R but Q(R) returns false, or P does not hold of R but Q(R) returns true.
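If it helps, the quantifier structure can be written out symbolically (this is just my paraphrase of the statement above, with the usual caveats that P must be non-trivial and Q must halt on every input):

$$\forall P\;\; \forall Q\;\; \exists R:\quad \bigl(P(R) \land \lnot Q(R)\bigr) \;\lor\; \bigl(\lnot P(R) \land Q(R)\bigr)$$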
Notice that the tricky program R that’s causing your property-checker Q to fail is under an existential. This isn’t saying anything about most programs, and it isn’t even saying that there’s a subset of programs that are tricky to analyze. It’s saying that after you fix a property P and a property checker Q, there exists a program R that’s tricky for Q.

There might be a more relevant theorem from algorithmic information theory; I’m not sure.
Going back to the statement:
for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see
This is only sort of true? Optimizing compilers rewrite programs into equivalent programs before they’re run, and can be extremely clever about the sorts of rewrites that they do, including reducing away parts of the program without needing to run them first. We tend to think of the compiled output of a program as “the same” program, but that’s only because compilers are reliable at producing equivalent code, not because the equivalence is straightforward.
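As a tiny concrete illustration (assuming CPython; the exact bytecode varies by version), even Python’s own bytecode compiler folds constant expressions away without running the program:

```python
import dis

# CPython evaluates 2 * 3 * 7 at compile time, so the disassembly contains
# the constant 42 and no multiplication instructions.
dis.dis(lambda: 2 * 3 * 7)
```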
a.k.a. computational irreducibility
Rice’s theorem is not “also known as” computational irreducibility.
By the way, be wary of claims from Wolfram. He was a serious physicist, but is a bit of an egomaniac these days. He frequently takes credit for others’ ideas (I’ve seen multiple clear examples) and exaggerates the importance of the things he’s done (he’s written more than one obituary for someone famous, where he talks more about his own accomplishments than the deceased’s). I have a copy of A New Kind of Science, and I’m not sure there’s much of value in it. I don’t think this is a hot take.
for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see
I think the thing you mean to say is that for most of the sorts of complex algorithms you see in the wild, such as the algorithms run by brains, there’s no magic shortcut to determine the algorithm’s output that avoids having to run any of the algorithm’s steps. I agree!
I think we’re in agreement on everything.
Excellent. Sorry for thinking you were saying something you weren’t!
still not have an answer to whether it’s spinning clockwise or counterclockwise
More simply (and quite possibly true), Nobuyuki Kayahara rendered it spinning either clockwise or counterclockwise, lost the source, and has since forgotten which way it was going.
I like “veridical” mildly better for a few reasons, more about pedagogy than anything else.
That’s a fine set of reasons! I’ll continue to use “accurate” in my head, as I already fully feel that the accuracy of a map depends on which territory you’re choosing for it to represent. (And a map can accurately represent multiple territories, as happens a lot with mathematical maps.)
Another reason is I’m trying hard to push for a two-argument usage
Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at.
My point is that:
- The 3D spinning dancer in your intuitive model is a veridical map of something 3D. I’m confident that the 3D thing is a 3D graphical model which was silhouetted after the fact (see below), but even if it was drawn by hand, the 3D thing was a stunningly accurate 3D model of a dancer in the artist’s mind.
- That 3D thing is the obvious territory for the map to represent.
- It feels disingenuous to say “sorry, that’s not a veridical map of [something other than the territory the map obviously represents]”.
So I guess it’s mostly the word “sorry” that I disagree with!
By “the real-world thing you’re looking at”, you mean the image on your monitor, right? There are some other ways one’s intuitive model doesn’t veridically represent that, such as the fact that, unlike other objects in the room, it’s flashing off and on 60 times per second, has a weirdly spiky color spectrum, and (assuming an LCD screen) consists entirely of circularly polarized light.
It was made by a graphic artist. I’m not sure their exact technique, but it seems at least plausible to me that they never actually created a 3D model.
This is a side track, but I’m very confident a 3D model was involved. Plenty of people can draw a photorealistic silhouette. The thing I think is difficult is drawing 100+ silhouettes that match each other perfectly and have consistent rotation. (The GIF only has 34 frames, but the original video is much smoother.) Even if technically possible, it would be much easier to make one 3D model and have the computer rotate it. Annnd, if you look at Nobuyuki Kayahara’s website, his talent seems more on the side of mathematics and visualization than photo-realistic drawing, so my guess is that he used an existing 3D model for the dancer (possibly hand-posed).
This is fantastic! I’ve tried reasoning along these directions, but never made any progress.
A couple comments/questions:
Why “veridical” instead of simply “accurate”? To me, the accuracy of a map is how well it corresponds to the territory it’s trying to map. I’ve been replacing “veridical” with “accurate” while reading, and it’s seemed appropriate everywhere.
Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at. [...] after all, nothing in the real world of atoms is rotating in 3D.
I think you’re being unfair to our intuitive models here.
The GIF isn’t rotating, but the 3D model that produced the GIF was rotating, and that’s the thing our intuitive models are modeling. So exactly one of [spinning clockwise] and [spinning counterclockwise] is veridical, depending on whether the graphic artist had the dancer rotating clockwise or counterclockwise before turning her into a silhouette. (Though whether it happens to be veridical is entirely coincidental, as the silhouette is identical to the one that would have been produced had the dancer been spinning in the opposite direction.)
If you look at the photograph of Abe Lincoln from Feb 27, 1860, you see a 3D scene with a person in it. This is veridical! There was an actual room with an actual person in it, who dressed that way and touched that book. The map’s territory is 164 years older than the map, but so what.
(My favorite example of an intuitive model being wildly incorrect is Feynman’s story of learning to identify kinds of galaxies from images on slides. He asks his mentor “what kind of galaxy is this one, I can’t identify it”, and his mentor says it’s a smudge on the slide.)
Very curious what part of this people think is wrong.
Here’s a simple argument that simulating universes based on Turing machine number can give manipulated results.
Say we lived in a universe much like this one, except that:
- The universe is deterministic
- It’s simulated by a very short Turing machine
- It has a center, and
- That center is actually nearby! We can send a rocket to it.
So we send a rocket to the center of the universe and leave a plaque saying “the answer to all your questions is Spongebob”. Now any aliens in other universes that simulate our universe and ask “what’s in the center of that universe at time step 10^1000?” will see the plaque, search elsewhere in our universe for the reference, and watch Spongebob. We’ve managed to get aliens outside our universe to watch Spongebob.
I feel like it would be helpful to speak precisely about the universal prior. Here’s my understanding.
It’s a partial probability distribution over bit strings. It gives a non-zero probability to every bit string, but these probabilities add up to strictly less than 1. It’s defined as follows:
That is, describe Turing machines by a binary code, and assign each one a probability based on the length of its code, such that those probabilities add up to exactly 1. Then magically run all Turing machines “to completion”. For those that halt leaving a bitstring on their tape, attribute the probability of that Turing machine to that bitstring. Now we have a probability distribution over bitstrings, though the probabilities add up to less than one because not all of the Turing machines halted.

You cannot compute this probability distribution, but you can compute lower bounds on the probabilities of its bitstrings. (The Nth lower bound is the probability distribution you get from running the first N TMs for N steps.)
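Written as a formula (my paraphrase; presentations differ in details like the prefix-free coding), the probability the distribution assigns to a bitstring $x$ is

$$m(x) \;=\; \sum_{T \,:\, T \text{ halts with } x \text{ on its tape}} 2^{-\lvert T \rvert},$$

where $\lvert T \rvert$ is the length of $T$’s binary code, with the codes chosen so that $\sum_T 2^{-\lvert T \rvert} = 1$. Since not every machine halts, $\sum_x m(x) < 1$.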
Call a TM that halts poisoned if its output is determined as follows:
- The TM simulates a complex universe full of intelligent life, then selects a tiny portion of that universe to output, erasing the rest.
- That intelligent life realizes this might happen, and writes messages in many places that could plausibly be selected.
- It works, and the TM’s output is determined by what the intelligent life it simulated chose to leave behind.
If we approximate the universal prior, the probability contribution of poisoned TMs will be precisely zero, because we don’t have nearly enough compute to simulate a poisoned TM until it halts. However, if there’s an outer universe with dramatically more compute available, and it’s approximating the universal prior using enough computational power to actually run the poisoned TMs, they’ll affect the probability distribution of the bitstrings, making bitstrings with the messages they choose to leave behind more likely.
So I think Paul’s right, actually (not what I expected when I started writing this). If you approximate the UP well enough, the distribution you see will have been manipulated.
The feedback is from Lean, which can validate attempted formal proofs.
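For instance, a candidate proof either compiles or it doesn’t, and that accept/reject signal is the feedback. (A trivial Lean 4 illustration of my own, not taken from the work under discussion:)

```lean
-- Lean checks this proof and accepts it:
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim, e.g. `a + b = b + a + 1`, would fail to compile,
-- and that failure is itself usable feedback.
```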
This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.
What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of “Here’s a completion that should be positively reinforced because it demonstrates correct understanding of language, and here’s a completion of the same text that should be negatively reinforced because it demonstrates incorrect understanding of language”? (Bear in mind that the prompts shouldn’t be about language, as that would probably just teach the model what to say when it’s discussing language in particular.)
It’s impossible for the Utility function of the Ai to be amenable to humans if it doesn’t use language the same way
What makes you think that humans all use language the same way, if there’s more than one plausible option? People are extremely diverse in their perspectives.
As you’re probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria (e.g. that the output should be helpful and friendly and definitely not use slurs), and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated “incorrect understanding of language”?
I have (tried to) read Wittgenstein, but don’t know what outputs would or would not constitute an “incorrect understanding of language”. Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine tuning.
You say “AI”, though I’m assuming you’re specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.
LLMs aren’t programmed, they’re trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and with streams of tokens (which are roughly short chunks of characters, encoded as integers).
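For a concrete sense of what tokens look like, here’s a sketch using the tiktoken library (my choice of tokenizer for illustration; the exact splits depend on which tokenizer a given model uses):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Wittgenstein wrote about language games.")

print(ids)                             # a list of integers
print([enc.decode([i]) for i in ids])  # the chunk of text each integer stands for
```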
The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is, and the difference between English and Spanish, purely from the text. None of that is explicitly programmed into it; the programmers have no say in the matter.
As for how it comes to understand language, and how that relates to Wittgenstein’s thoughts on language, we don’t know much at all. You can ask it. And we’ve done some experiments like that recent one with the LLM that was made to think it was the Golden Gate Bridge, which you probably heard about. But that’s about it; we don’t really know how LLMs “think” internally. (We know what’s going on at a low-level, but not at a high-level.)
Exactly.
However, if I already know that I have the disease, and I am not altruistic to my copies, playing such a game is a winning move for me?
Correct. But if you don’t have the disease, you’re probably also not altruistic to your copies, so you would choose not to participate, leaving the copies of you with the disease isolated and unable to “trade”.
Not “almost no gain”. My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing “meditators”, and computing the expected number of disease losses minus disease gains as:
num(people with disease) * P(person with disease meditates) * P(person with disease who meditates loses the disease) -
num(people without disease) * P(person without disease meditates) * P(person without disease who meditates gains the disease)
My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.
My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don’t think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)
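Here’s the kind of finite computation I mean, as a toy sketch of my own (the names and the model are mine, under the assumption that the procedure amounts to uniformly shuffling disease status among the participants; the shuffle conserves the number of disease cases, so losses and gains cancel exactly):

```python
from itertools import permutations

def net_expected_change(statuses, participates):
    """statuses[i]: whether person i has the disease; participates[i]: whether
    they undergo the procedure. The procedure permutes the disease labels of
    the participants uniformly at random. Returns E[losses - gains]."""
    labels = [s for s, p in zip(statuses, participates) if p]
    perms = list(permutations(labels))
    total = 0
    for perm in perms:
        losses = sum(old and not new for old, new in zip(labels, perm))
        gains = sum(new and not old for old, new in zip(labels, perm))
        total += losses - gains
    return total / len(perms)

# Three copies with the disease, two without, everyone participates:
print(net_expected_change([True, True, True, False, False], [True] * 5))  # 0.0
```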
There is a 0.001 chance that someone who did not have the disease will get it. But he can repeat the procedure.
No, that doesn’t work. It invalidates the implicit assumption you’re making that the probability that a person chooses to “forget” is independent of whether they have the disease. Ultimately, you’re “mixing” the various people who “forgot”, and a “mixing” procedure can’t change the proportion of people who have the disease.
When you take this into account, the conclusion becomes rather mundane. Some copies of you can gain the disease, while a proportional number of copies can lose it. (You might think you could get some respite by repeatedly trading off “who” has the disease, but the forgetting procedure ensures that no copy ever feels respite, as that would require remembering having the disease.)
I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I’m currently directing a lot of my time and funding.
Great. Yes, I think that’s the thing to do. Start small! I (and presumably others) would update a lot from a new piece of actual formal mathematics from Chris’s work. Even if that work was, by itself, not very impressive.
(I would also want to check that that math had something to do with his earlier writings.)
My current understanding is that he believes that his current written work should be sufficient for modern mathematicians and scientists to understand his core ideas
Uh oh. The “formal grammar” that I checked used formal language, but was not even close to giving a precise definition. So Chris either (i) doesn’t realize that you need to be precise to communicate with mathematicians, or (ii) doesn’t understand how to be precise.
Please be prepared for the possibility that Chris is very smart and creative, and that he’s had some interesting ideas (e.g. Syndiffeonesis), but that his framework is more of an interlocked collection of ideas than anything mathematical (despite using terms from mathematics). Litany of Tarski and all that.
That’s not what that paper says. It says that IQ over 110 or so (quite a bit above the median) correlates less strongly (but still positively) with creativity, in Chinese children aged 11-13.