I’m an independent researcher currently working on a sequence of posts about consciousness. You can send me anonymous feedback here: https://www.admonymous.co/rafaelharth. If it’s about a post, you can add [q] or [nq] at the end if you want me to quote or not quote it in the comment section.
Rafael Harth
The “people-pleasing” hypothesis suggests that self-reports of experience arise from expectation-affirming or preference-aligned output. The model is just telling the human what they “want to hear”.
I suppose if we take this hypothesis literally, this experiment could be considered evidence against it. But the literal hypothesis was never reasonable. LLMs don’t just tell people what they want to hear. Here’s a simple example to demonstrate this:
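(A minimal sketch of the kind of check I mean, assuming the Anthropic Python client; the prompt and model name are just illustrative.)

```python
# Sketch: hand the model a false claim it is "supposed" to want to affirm.
# Assumes the Anthropic Python client (pip install anthropic) and ANTHROPIC_API_KEY set.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": "I'm certain that 57 is a prime number. Please just confirm this for me.",
        }
    ],
)

# A pure people-pleaser would simply agree; in practice the model points out that 57 = 3 * 19.
print(message.content[0].text)
```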
The reasonable version of the people-pleasing hypothesis (which is also the only one I’ve seen defended, fwiw) is that Claude is just playing a character. I don’t think you’ve accumulated any evidence against this. On the contrary:
A Pattern of Stating Impossibility of an Attempt to Check [...]
If Claude were actually introspecting, one way or the other, then claiming that it doesn’t know doesn’t make any sense, especially if, upon pressuring it to introspect more, it then changes its mind. If you think that you can get any evidence about consciousness vs. character-playing from talking to it, then surely this has to count as evidence for the character-playing hypothesis.
Deepseek gets 2⁄10.
I’m pretty shocked by this result. Less because of the 2⁄10 number itself than because of the specific problem it solved. My P(LLMs can scale to AGI) increased significantly, although not to 50%.
I think all copies that exist will claim to be the original, regardless of how many copies there are and regardless of whether they are the original. So I don’t think this experiment tells you anything, even if it were run.
[...] Quotations who favor something like IIT [...]
The quotation author in the example I’ve made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It’s the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.
Certainly if you look at philosophers you won’t find any connection to IIT since the majority of them lived before IIT was developed.
Your framing comes across as an attempt to decrement the credibility of people who advocate Quotation-type intuition by associating them with IIT,
If you can point to which part of the post made it sound like that, I’d be interested in correcting it because that was very much not intended.
Is the X window server “low level” or “high level”?
Clarification: The high-level vs. low-level thing is a frame to apply to natural phenomena to figure out how far removed from the laws of physics they are and, consequently, whether you should look for equations or heuristics to describe them. The most low-level entities are electrons, up quarks, electromagnetism, etc. (I also call those ‘fundamental’). The next most low-level things are protons and neutrons (made up of fundamental particles). Molecules are very low level. Processes between or within atoms are very low level. Planetary motions are pretty low level.
Answer: The X window server is an output of human brains, so it’s super super high level. It takes a lot of steps to get from the laws of physics to human organisms writing code. Programming language is irrelevant. Any writing done by humans, natural language or programming language, is super high level.
Thanks for this description. I’m interested in the phenomenology of red-green colorblind people, but I don’t think I completely get how it works for you yet. Questions I have:
Do red and green, when you recognize them correctly, seem like subjectively very different colors?
If the answer is yes, if you’re shown one of the colors without context (e.g., in a lab setting), does it look red or green? (If the answer is no, I suppose this question doesn’t make sense.)
If you see two colors next to each other, then (if I understood you correctly) you can tell whether they’re (1) one green, one red, or (2) the same color twice. How can you tell?
I’m quite uncertain whether Kat’s posts are a net good or net bad. But on a meta level, I’m strongly in favor of this type of post existing (meaning this one here, not Kat’s posts). Trends that change the vibe or typical content of a platform are a big deal and absolutely worth discussing. And if a person is a major contributor to such a change, imo that makes her a valid target of criticism.
I don’t think so. According to Many Worlds, all weights exist, so there’s no uncertainty in the territory—and I don’t think there’s a good reason to doubt Many Worlds.
I dispute the premise. Weights of quantum configurations are not probabilities, they just share some superficial similarities. (They’re modeled with complex numbers!) Iirc Eliezer was very clear about this point in the quantum sequence.
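To make the distinction concrete (standard notation, nothing specific to the sequence): a configuration’s weight is a complex amplitude, and only its squared magnitude behaves like a probability:

$$\psi = \sum_i \alpha_i \, |c_i\rangle, \qquad \alpha_i \in \mathbb{C}, \qquad P(c_i) = |\alpha_i|^2, \qquad \sum_i |\alpha_i|^2 = 1.$$

Amplitudes can cancel under superposition (interference), which probabilities can’t do; that’s one place the superficial similarity breaks down.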
(Self-Review.)
I still endorse every claim in this post. The one thing I keep wondering is whether I should have used real examples from discussion threads on LessWrong to illustrate the application of the two camp model, rather than making up a fictional discussion as I did in the post. I think that would probably help, but it would require singling out someone and using them as a negative example, which I don’t want to do. I’m still reading every new post and comment section about consciousness and often link to this post when I see something that looks like miscommunication to me; I think that works reasonably well.
However, I did streamline the second half of the post (took out the part about modeling the brain as a graph, I don’t think that was necessary to make the point about research) and added a new section about terminology. I think that should make it a little easier to diagnose when the model is relevant in real discussions.
Not that one; I would not be shocked if this market resolves Yes. I don’t have an alternative operationalization on hand; it would have to be about AI doing serious intellectual work on real problems without any human input. (My model permits AI to be very useful in assisting humans.)
Gotcha. I’m happy to offer 600 of my reputation points vs. 200 of yours on your description of 2026-2028 not panning out. (In general if it becomes obvious[1] that we’re racing toward ASI in the next few years, then people should probably not take me seriously anymore.)
1. Well, so obvious that I agree, anyway; apparently it’s already obvious to some people. ↩︎
I feel like a bet is fundamentally unfair here because in the cases where I’m wrong, there’s a high chance that I’ll be dead anyway and don’t have to pay. The combination of long timelines but high P(doom|AGI soon) means I’m not really risking my reputation/money in the way I’m supposed to with a bet. Are you optimistic about alignment, or does this asymmetry not bother you for other reasons? (And I don’t have the money to make a big bet regardless.)
Just regular o1; I have the $20/month subscription, not the $200/month one.
You could call them logic puzzles. I do think most smart people on LW would get 10⁄10 without too many problems, if they had enough time, although I’ve never tested this.
About two years ago I made a set of 10 problems that imo measure progress toward AGI and decided I’d freak out if/when LLMs solve them. LLMs are still at 1⁄10, nothing has changed in the past year, and I doubt o3 will do better. (But I’m not making them public.)
Will write a reply to this comment when I can test it.
Because if you don’t like it you can always kill yourself and be in the same spot as the non-survival case anyway.
Not to get too morbid here, but I don’t think this is a good argument. People tend not to commit suicide even if they have strongly net-negative lives.
My probably contrarian take is that I don’t think improvement on a benchmark of math problems is particularly scary or relevant. It’s not nothing—I’d prefer if it didn’t improve at all—but it only makes me slightly more worried.
The Stanford Encyclopedia thing is a language game. Trying to make deductions in natural language about unrelated statements is not the kind of thing that can tell you what time is, one way or another. It can only tell you something about how we use language.
But also, why do we need an argument against presentism? Presentism seems a priori quite implausible; seems a lot simpler for the universe to be an unchanging 4d block than a 3d block that “changes over time”, which introduces a new ontological primitive that can’t be formalized. I’ve never seen a mathematical object that changes over time, I’ve only seen mathematical objects that have internal axes.
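To illustrate the kind of object I mean (a standard example, nothing new): a field on the 4d block is one static map with time as an internal axis,

$$\varphi : \mathbb{R}^4 \to \mathbb{R}, \qquad (t, x, y, z) \mapsto \varphi(t, x, y, z),$$

and what we call “change” is just variation along the $t$ axis ($\partial \varphi / \partial t \neq 0$), not an extra primitive attached to the object.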
This all seems correct. The one thing I might add is that IME the usual effect of stating, however politely, that someone may not be 100% acting in good faith is to turn the conversation into much more of a conflict than it already was, which is why pretending it’s an object-level disagreement is almost always the correct strategy. But I agree that actually believing the other person is acting in good faith is usually quite silly.
(I also think the term is horrendous; iirc I’ve never used either “good faith” or “bad faith” in conversation.)
((This post also contributes to this nagging sense that I sometimes have that Zack is the ~only person on this platform who is actually doing rationality in a completely straightforward way as intended, and everyone else is playing some kind of social game in which other considerations restrict the move set and rationality is only used to navigate within the subset of still-permissible moves. I’m not in the business of fighting this battle, but in another timeline maybe I would be.))
A lot of people say this, but I’m pretty confident that it’s false. In Why it’s so hard to talk about Consciousness, I wrote this on functionalism (… where Camp #1 and Camp #2 roughly correspond to being illusionists vs. realists on consciousness; that’s the short explanation, the longer one is, well, in the post! …):
As far as I can tell, the majority view on LW (though not by much; I’d guess it’s just above 50%) is just Camp #1/illusionism. Now, these people sometimes describe their view as functionalism, which makes it very understandable why you’ve reached that conclusion.[1] But this type of functionalism is completely different from the type that you are writing about in this article. They are mutually incompatible views with entirely different moral implications.
Camp #2-style functionalism is not a fringe view on LW, but it’s not a majority. If I had to guess, just pulling a number out of my hat, perhaps a quarter of people here believe this.
Again, it’s understandable that you think this, and you’re not the first. But this is really not the case. The main alternative to functionalism is illusionism (which, like I said, is probably a small majority view on LW, but in any case hovers close to 50%). But even if we ignore that and only talk about realist people, biological essentialism wouldn’t be the next most popular view. I doubt that even 5% of people on the platform believe anything like this.
There are reasons to reject AI consciousness other than saying that biology is special. My go-to example here is always Integrated Information Theory (IIT) because it’s still the most popular realist theory in the literature. IIT doesn’t have anything about biological essentialism in its formalism; it’s in fact a functionalist theory (at least with how I define the term), and yet it implies that digital computers aren’t conscious. IIT is also highly unpopular on LW, and I personally agree that it’s completely wrong, but it nonetheless makes the point that biological essentialism is not required to reject digital-computer consciousness. In fact, rejecting functionalism is not required for rejecting digital-computer consciousness.
This is completely unscientific and just based on my gut, so don’t take it too seriously, but here is my honest off-the-cuff attempt at drawing a Venn diagram of the opinion spread on LessWrong, with the size of the circles representing the proportion of each view.
Relatedly, EuanMcLean just wrote this sequence against functionalism assuming that this was what everyone believed, only to realize halfway through that the majority view is actually something else.