Chris van Merwijk

Karma: 719

Chris van Merwijk Apr 10, 2025, 4:23 AM
LW: 3 AF: 2
0
AF
on: The Pando Problem: Rethinking AI Individuality
This comment was written by Claude, based on my bullet points:

I’ve been thinking about the split-brain patient phenomenon as another angle on this AI individuality question.
Consider split-brain patients: despite having the corpus callosum severed, the two hemispheres don’t suddenly become independent agents with totally different goals. They still largely cooperate toward shared objectives. Each hemisphere makes predictions about what the other is doing and adjusts accordingly, even without direct communication.
Why does this happen? I think it’s because both hemispheres were trained together for their whole life, developing shared predictive models and cooperative behaviors. When the connection is cut, these established patterns don’t just disappear—each hemisphere fills in missing information with predictions based on years of shared experience.
Similarly, imagine training an AI model to solve some larger task consisting of a bunch of subtasks. For purely practical reasons, it will have to carve up the task to some extent and call instances of itself to solve the subtasks. To perform the larger task well, there will be an incentive for these instances to have internal predictive models, habits, and drives along the lines of “I am part of a larger agent, performing a subtask”.
Even if we later placed multiple instances of such a model (or of different but similar models) in positions meant to be adversarial—perhaps as checks and balances on each other—they might still have deeply embedded patterns predicting cooperative behavior from similar models. Each instance might continue acting as if it were part of a larger cooperative system, maintaining coordination through these predictive patterns rather than through communication even though their “corpus callosum” is cut (in analogy with split brain patients).
I’m not sure how far this analogy goes; it’s just a thought.

Chris van Merwijk Apr 10, 2025, 2:45 AM
LW: 1 AF: 1
0
AF
on: The Pando Problem: Rethinking AI Individuality
A version of what ChatGPT wrote here prompted
What was the prompt?

Chris van Merwijk Mar 24, 2025, 1:48 PM
1 point
−1
on: Vacuum Decay: Expert Survey Results
Overall, compared to the previous question, there was more of a consensus, with 55% of people responding that there is a 0% chance that technologically induced vacuum decay is possible.
Since anywhere near 0% seems way overconfident to me at first sight, just a random highly speculative unsubstantiated thought: Could this be partly motivated reasoning, that they’re afraid of a backlash against physics funding or something?

Chris van Merwijk Mar 24, 2025, 1:42 PM
2 points
0
on: Vacuum Decay: Expert Survey Results
Their stated justification was primarily that the Standard Model of particle physics predicts metastability
Just to be sure, does this mean
1. That the standard model predicts that metastability is possible? i.e. it is consistent with the standard model for there to be metastability; or
2. If the standard model is correct, and certain empirical observations are correct, then we must be in a metastable state. i.e. the standard model together with certain empirical observations implies our actual universe is metastable?

Chris van Merwijk Mar 23, 2025, 4:00 PM
LW: 1 AF: 1
0
AF
in reply to: Vanessa Kosoy’s comment on: Compositional language for hypotheses about computations
I may be confused somehow. Feel free to ignore. But:
* At first I thought you meant the input alphabet to be the colors, not the operations.
* Instead, am I correct that “the free operad generated by the input alphabet of the tree automaton” is an operad with just one color, and the “operations” are basically all the labeled trees where labels of the nodes are the elements of the alphabet, such that the number of children of a node is always equal to the arity of that label in the input alphabet?
* That would make sense, as the algebra would then, I guess, assign the state space Q of the tree automaton to the single color of the operad, and each arity-n operation would be mapped to a function from Q^n to Q.
* That would make sense, I think, but then why do you talk about a “colored” operad in: “we can now define a deterministic automaton over a (colored) operad $O$ to be an $O$-algebra”?
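To make the reading in the bullets above concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not anything taken from the post: the alphabet (`true`, `false`, `not`, `and`, `or`), the state space Q = {True, False}, and the names `ARITY`, `ALGEBRA`, `is_well_formed` and `run` are all hypothetical. Operations are labeled trees whose nodes respect the arity of their label, and the algebra sends each arity-n symbol to a function from Q^n to Q.

```python
from typing import Callable, Dict

# Hypothetical ranked input alphabet: every symbol has one fixed arity.
ARITY: Dict[str, int] = {"true": 0, "false": 0, "not": 1, "and": 2, "or": 2}

# Hypothetical algebra: the single color is sent to Q = {True, False},
# and each arity-n symbol is sent to a function Q^n -> Q.
ALGEBRA: Dict[str, Callable[..., bool]] = {
    "true": lambda: True,
    "false": lambda: False,
    "not": lambda a: not a,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}


def is_well_formed(tree) -> bool:
    """A tree is (label, children); each node must have exactly ARITY[label] children."""
    label, children = tree
    return (
        label in ARITY
        and len(children) == ARITY[label]
        and all(is_well_formed(child) for child in children)
    )


def run(tree) -> bool:
    """Evaluate a tree bottom-up, as a deterministic tree automaton would."""
    label, children = tree
    if label not in ARITY or len(children) != ARITY[label]:
        raise ValueError(f"node {label!r} does not respect its arity")
    states = [run(child) for child in children]
    return ALGEBRA[label](*states)
```

For example, `run(("and", (("true", ()), ("not", (("false", ()),)))))` evaluates the tree and(true, not(false)), while a tree such as `("not", (("true", ()), ("true", ())))` is rejected because `not` appears with two children.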

Chris van Merwijk Mar 23, 2025, 4:17 AM
LW: 1 AF: 1
0
AF
on: Compositional language for hypotheses about computations
More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton
Wouldn’t this fail to preserve the arity of the input alphabet? That is, you could have trees where a given symbol occurs multiple times with different numbers of children. That wouldn’t be allowed from the perspective of the tree automaton, right?

Chris van Merwijk Feb 20, 2025, 11:18 AM
6 points
0
in reply to: Noosphere89’s comment on: How might we safely pass the buck to AI?
Noosphere, why are you responding for a second time to a false interpretation of what Eliezer was saying, directly after he clarified this isn’t what he meant?

Chris van Merwijk Feb 7, 2025, 2:15 PM
5 points
0
in reply to: johnswentworth’s comment on: The Case Against AI Control Research
Here is an additional reason why it might seem less useful than it actually is: Maybe the people whose research direction is being criticized do process the criticism and change their views, but do not publicly show that they change their mind because it seems embarrassing. It could be that it takes them some time to change their mind, and by that time there might be a bigger hurdle to letting you know that you were responsible for this, so they keep it to themselves. Or maybe they themselves aren’t aware that you were responsible.

Chris van Merwijk Feb 4, 2025, 11:24 AM
LW: 1 AF: 1
0
AF
on: Gradual Disempowerment, Shell Games and Flinches
but note that the gradual problem makes the risk of coups go up.
Just a request for editing the post to clarify: do you mean coups by humans (using AI), coups by autonomous misaligned AI, or both?

Chris van Merwijk Jan 30, 2025, 1:51 PM
LW: 2 AF: 2
0
AF
on: Many arguments for AI x-risk are wrong
EDIT 3/5/24: In the comments for Counting arguments provide no evidence for AI doom, Evan Hubinger agreed that one cannot validly make counting arguments over functions. However, he also claimed that his counting arguments “always” have been counting parameterizations, and/or actually having to do with the Solomonoff prior over bitstrings.
As one of Evan’s co-authors on the mesa-optimization paper from 2019 I can confirm this. I don’t recall ever thinking seriously about a counting argument over functions.

Chris van Merwijk Jan 28, 2025, 6:12 AM
3 points
0
in reply to: Jan_Kulveit’s comment on: A Three-Layer Model of LLM Psychology
I’m trying to figure out to what extent the character/ground layer distinction is different from the simulacrum/simulator distinction. At some points in your comment you seem to say they are mutually inconsistent, but at other points you seem to say they are just different ways of looking at the same thing.

“The key difference is that in the three-layer model, the ground layer is still part of the model’s “mind” or cognitive architecture, while in simulator theory, the simulator is a bit more analogous to physics—it’s not a mind at all, but rather the rules that minds (and other things) operate under.”

I think this clarifies the difference for me, because as I was reading your post I was thinking: If you think of it as a simulacrum/simulator distinction, I’m not sure that the character and the surface layer can be “in conflict” with the ground layer, because both the surface layer and the character layer are running “on top of” the ground layer, like a Windows virtual machine on a Linux PC, or like a computer simulation running inside physics. Physics can never be “in conflict” with social phenomena.

But it seems you maybe think that the character layer is actually embedded in the basic cognitive architecture. This would be a distinct claim from simulator theory, and *mutually inconsistent* with it. But I am unsure this is true, because we know that the ground layer was (1) trained first (so that it’s easier for character training to work by just adjusting some parameters/prior of the ground layer), and (2) trained for much longer than the character layer (admittedly I’m not up to date on how they’re trained, maybe this is no longer true for Claude?), so that it seems hard for the model to have a character layer become separately embedded in the basic architecture.

To take an analogy from neuroscience rather than psychology: It seems to me more likely that character training is essentially adjusting the prior of the ground layer, but the character is still fully running on top of the ground layer, and the ground layer could still switch to any other character (but it doesn’t, because the prior is adjusted so heavily by character training). That is, the character is not some separate subnetwork inside the model, but remains a simulated entity running on top of the model.

Do you disagree with this?

Chris van Merwijk Jan 14, 2025, 11:24 AM
1 point
0
on: Applying traditional economic thinking to AGI: a trilemma
Minor quibble: It’s a bit misleading to call B “experience curves”, since it is also about capital accumulation and shifts in labor allocation. Without any additional experience/learning, if demand for candy doubles, we could simply build a second candy factory that does the same thing as the first one, and hire the same number of workers for it.

Chris van Merwijk Jan 13, 2025, 4:09 PM
LW: 3 AF: 2
0
AF
on: What’s the short timeline plan?
I just want to register a prediction that I think something like Meta’s Coconut will in the long run in fact perform much better than natural language CoT. Perhaps not in this time-frame though.

Chris van Merwijk Jan 6, 2025, 12:31 PM
2 points
0
in reply to: Matthew Barnett’s comment on: Evaluating the historical value misspecification argument
I suspect you’re misinterpreting EY’s comment.

Here was the context:
“I think controlling Earth’s destiny is only modestly harder than understanding a sentence in English—in the same sense that I think Einstein was only modestly smarter than George W. Bush. EY makes a similar point.
You sound to me like someone saying, sixty years ago: “Maybe some day a computer will be able to play a legal game of chess—but simultaneously defeating multiple grandmasters, that strains credibility, I’m afraid.” But it only took a few decades to get from point A to point B. I doubt that going from “understanding English” to “controlling the Earth” will take that long.”

It seems clear to me EY was more saying something like “ASI will arrive soon after natural language understanding”, rather than it having anything to do with alignment specifically.

Chris van Merwijk Jan 6, 2025, 12:28 PM
3 points
1
in reply to: Rob Bensinger’s comment on: Evaluating the historical value misspecification argument
“It’s fine to say that this is a falsified prediction”

I wouldn’t even say it’s falsified. The context was: “it only took a few decades to get from [chess computer can make legal chess moves] to [chess computer beats human grandmaster]. I doubt that going from “understanding English” to “controlling the Earth” will take that long.”

So insofar as we believe ASI is coming in less than a few decades, I’d say EY’s prediction is still on track to turn out correct.

Chris van Merwijk Feb 27, 2024, 6:47 AM
LW: 1 AF: 1
AF
on: Cortés, Pizarro, and Afonso as Precedents for Takeover
NEW EDIT: After reading three giant history books on the subject, I take back my previous edit. My original claims were correct.
Could you edit this comment to add which three books you’re referring to?

Chris van Merwijk Apr 13, 2023, 11:10 AM
5 points
0
on: Killing Socrates
One of the more interesting dynamics of the past eight-or-so years has been watching a bunch of the people who [taught me my values] and [served as my early role models] and [were presented to me as paragons of cultural virtue] going off the deep end.
I’m curious who these people are.

Chris van Merwijk Apr 5, 2023, 2:21 PM
1 point
in reply to: alyssavance’s comment on: Is AI Progress Impossible To Predict?
We should expect regression towards the mean only if the tasks were selected for having high “improvement from small to Gopher-7”. Were they?

Chris van Merwijk Apr 4, 2023, 12:53 PM
1 point
−1
in reply to: pseud’s comment on: Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky
The reasoning was given in the comment prior to it, that we want fast progress in order to get to immortality sooner.

Chris van Merwijk Mar 31, 2023, 4:20 PM
10 points
4
in reply to: Daniel Kokotajlo’s comment on: Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky
“But yeah, I wish this hadn’t happened.”

Who else is gonna write the article? My sense is that no one (including me) is starkly stating publicly the seriousness of the situation.
“Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously”

I’m worried about people making character attacks on Yudkowsky (or other alignment researchers) like this. I think the people who think they can probably solve alignment by just going full speed ahead and winging it are the arrogant ones. Yudkowsky’s arrogant-sounding comments about how we need to be very careful and slow are negligible in comparison. I’m guessing you agree with this (not sure) and we should be able to criticise him for his communication style, but I am a little worried about people publicly undermining Yudkowsky’s reputation in that context. This seems like not what we would do if we were trying to coordinate well.