The last thing may result from a hard-coded genetic heuristic learning rate. We can't update in a fully Bayesian way, and a learning rate is an approximation given computational constraints. There is an optimal learning rate, but it depends on context, such as the trust in prior information, especially the volatility of the environment. Thus it can happen that your genetic prior for your learning rate does not match the dynamics of your current environment. I guess our modern environment changes faster than the ancestral environment, and most people update too slowly on new information. Updating much faster is probably adaptive. I also have that.
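To make the learning-rate point concrete, here is a small simulation sketch (my own illustration, not part of the original comment): a simple tracker with a fixed learning rate follows a drifting quantity, and the learning rate that minimizes error depends on how volatile the environment is, so a rate tuned for a slow ancestral environment updates too slowly in a fast-changing one.

```python
import numpy as np

rng = np.random.default_rng(0)

def tracking_error(volatility, learning_rate, steps=10_000, obs_noise=1.0):
    """Mean squared error of an exponential-moving-average tracker
    following a random-walk signal with the given volatility."""
    truth, estimate, errors = 0.0, 0.0, []
    for _ in range(steps):
        truth += rng.normal(0, volatility)             # environment drifts
        obs = truth + rng.normal(0, obs_noise)         # noisy observation
        estimate += learning_rate * (obs - estimate)   # fixed-rate update
        errors.append((estimate - truth) ** 2)
    return np.mean(errors)

for vol in (0.01, 0.3):  # stable vs. volatile environment
    best = min((tracking_error(vol, lr), lr) for lr in np.linspace(0.02, 0.9, 20))
    print(f"volatility={vol}: best learning rate ~ {best[1]:.2f}")
```

In the stable setting a small learning rate wins; in the volatile setting a much larger one does, which is the mismatch the comment describes.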
Hm, yes, seems plausible. Very inconsistent though. And they should remove the second paragraph, which seems to imply that it is still possible to apply anyway.
Can somebody get me in touch with somebody from the Center for AI Safety (safe.ai)? Their page for applying for compute resources seems broken. I used their contact form to report the issue on April 7th, but received no reply.
This is how the application page has looked at least since then (linked from their Compute Cluster page):
As you can see, there is no form field to fill in, only a lone “Absenden” button, which is German for “submit” (strange, because my system and browser are set to English). If I click that button, I get this message:
Looks like this form is empty. Try filling it out before submitting.
My guess is that there is a problem with their Airtable integration.
If you wonder what I’m trying to apply for:
The project Reducing LLM deception at scale with self-other overlap fine-tuning (SOO), which I am working on at AE Studio, is in urgent need of more compute to run SOO experiments with Mistral Large 2 (or even larger models).
The aintelope project (sorry, not many updates recently) needs compute to run more evaluations of our benchmark From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks, and we wanted to apply at CAIS too (having run out of funding; more on that later).
It is a great idea to test a hypothesis experimentally. I did your experiment too, and the result is:
hours in a day: when I saw your post it was 1 AM, so the estimate would be 2 hours in a day. ❌
months in a year: I was born in June, so twelve months. ✅ Though we could also have taken the current month as the base, and then it would have been 8 months.
Earth size: I don't know my latitude, but it is probably like yours (I'm in Hamburg). ✅ But I do know that the longitude here is almost exactly 10. If I go by that, the circumference should be 20 instead of 360. ❌
human life expectancy: I’m 51. ✅
Several experiments show that I can extract useful information just by treating myself as a random sample, and thus a view that I can’t use myself as a random sample is false.
I think there are some problems here. A more accurate claim would be:
You can do experiments that extract useful information about whether you can treat yourself as a random sample (i.e., a representative or “typical” sample) by comparing the result of the experiment to the base rate.
Or at the very least, based on my experiments, for me, the claim seems to be false. I'm not representative enough. But I can't know that without comparing my results to a base rate. I can't use the observations to establish a base rate or make estimates such as expected lifetime.
From a statistical perspective, a random sample means:
Drawn randomly from the population of interest—but you are not randomly selected.
Having an equal and independent chance of being selected—but you are subject to bias.
The sample size is sufficient to capture variance—but you are n=1, thus variance is undefined.
You may not be representative in some observable or unobservable dimension that matters for your purpose. And to know whether you are representative, you have to look at other samples, and then you are back to some kind of base rate.
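As a sketch of that point (my own addition, with made-up numbers): the "double what you observe" estimator is unbiased if you really are a uniform random draw, but a biased draw, like checking a post at 1 AM rather than at a uniformly random hour, gives a systematically wrong estimate, and nothing in the single observation tells you which case you are in.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_TOTAL = 24  # e.g. hours in a day

def doubling_estimate(sample_position):
    """Estimate the total by doubling the observed position."""
    return 2 * sample_position

# Case 1: you really are a uniform random draw from the day.
uniform_draws = rng.uniform(0, TRUE_TOTAL, size=100_000)
print("uniform draws, mean estimate:", doubling_estimate(uniform_draws).mean())  # ~24

# Case 2: biased draw -- you only see posts late at night (0-3 AM).
biased_draws = rng.uniform(0, 3, size=100_000)
print("biased draws, mean estimate:", doubling_estimate(biased_draws).mean())    # ~3
```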
Outside view, context, and details. I'd ask:
How big is the fish?
How much did the fish cost?
How big is the aquarium?
What is the natural habitat of the fish and what kind of species is it?
I don’t understand how the rewarding works. Can you explain again?
How far away is the TV? Is it in the water?
How long did it take to train?
How often was it wrong on the politicians?
Have you shown anybody else?
Is this a person I know or somebody I know knows?
See also Proper posture for mental arts, which also mentions the Unbendable Arm and explains how it works biomechanically, namely via the latissimus dorsi.
We use Unsloth for fine-tuning 8B models because it offers support for single-GPU training with high throughput. For 70B models, we use the Together AI fine-tuning API. Axolotl is used for 405B models due to its support for Fully Sharded Data Parallel (FSDP) training. Together AI does not support 405B fine-tuning, and Unsloth lacks distributed training capabilities, though the Pro version is in development.
Thank you for the details on your setup. Can you give some hints about how long each training run took on these systems?
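For readers who haven't used the single-GPU workflow mentioned above, here is a minimal Unsloth-style LoRA fine-tuning sketch. This is my own illustration following Unsloth's published example pattern; the model name, dataset file, and hyperparameters are placeholders, not the setup from the parent comment, and exact argument names can differ between Unsloth/TRL versions.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Placeholder model and dataset, chosen only for illustration.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps an 8B model on a single GPU
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes each example has a "text" field
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```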
I thought it would be good to have some examples where you could have a useful type signature, and I asked ChatGPT. I think these are too wishy-washy, but together with the given explanation, they seem to make sense.
Would you say that this level of “having a type signature in mind” would count?
ChatGPT 4o suggesting examples
1. Prediction vs Explanation
Explanation might be:
Phenomenon → (Theory, Mechanism)
Prediction might be:
Features → Label
These have different type signatures. A model that predicts well might not explain. People often conflate these roles. Type signatures remind us: different input-output relationships.
Moral Judgments vs Policy Proposals
Moral judgment (deontic):
Action → Good/Bad
Policy proposal (instrumental):
(State × Action) → (New State × Externalities)
People often act as if “this action is wrong” implies “we must ban it,” but that only follows if the second signature supports the first. You can disagree about outcomes while agreeing on morals, or vice versa.
Interpersonal Feedback
Effective feedback:
(Action × Impact) → Updated Mental Model
People often act as if the type signature is just
Action → Judgment
That’s blame, not feedback. This reframing can help structure nonviolent communication.

Creativity vs Optimization
Optimization:
(Goal × Constraints) → Best Action
Creativity:
Void → (Goal × Constraints × Ideas)
The creative act generates the very goal and constraints. Treating creative design like optimization prematurely can collapse valuable search space.
7. Education
Lecture model:
Speaker → (Concepts × StudentMemory)
Constructivist model:
(Student × Task × Environment) → Insight
If the type signature of insight requires active construction, then lecture-only formats may be inadequate. Helps justify pedagogy choices.
Source: https://chatgpt.com/share/67f836e2-1280-8001-a7ad-1ef1e2a7afa7
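To make “type signature” a bit more concrete, here is a small sketch (my own addition, not from the ChatGPT transcript) of how the first example could be written as Python type aliases; all class and alias names are illustrative placeholders.

```python
from typing import Callable

# Illustrative placeholder types standing in for the informal concepts above.
class Features: ...
class Label: ...
class Phenomenon: ...
class Theory: ...
class Mechanism: ...

# Prediction: Features -> Label
Predict = Callable[[Features], Label]

# Explanation: Phenomenon -> (Theory, Mechanism)
Explain = Callable[[Phenomenon], tuple[Theory, Mechanism]]

# A good predictor is not automatically an explainer: the two signatures
# only line up if you build an explicit bridge between them.
def explain_via_prediction(predict: Predict) -> Explain:
    raise NotImplementedError("no free conversion between the two roles")
```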
Of these, I’m most worried about neuralese recurrence effectively removing direct access to the AI’s reasoning in a legible format.
I am not worried about this right now. We should always be able to translate latent space reasoning aka neuralese (see COCONUT) to a human language equivalent representation. This might be incomplete or leave out details—but that is already the case for existing models (as discussed here). The solution suggested by Villiam is to recursively expand as needed.
Another option might be to translate neuralese to equivalent program code (preferably Lean). This would be harder for most people to read but more precise and probably easier to verify.
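As a toy illustration of the kind of precision and checkability meant here (my own example, not a translation of any actual model reasoning), even a single reasoning step rendered in Lean is something a proof checker can verify mechanically:

```lean
-- Hypothetical example: one tiny reasoning step as a statement Lean can check.
theorem step_example (a b : Nat) (h : a ≤ b) : a ≤ b + 1 :=
  Nat.le_succ_of_le h
```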
Thanks for the clarifications!
The second referred to holes in the landscape mentioned in the post, not in the rules.
That seems like a great game that I will try.
I found the material and the game manual v0.1, but I do not understand:
How are corpses created? Is it when a player is killed/eaten?
How are holes created and healed? I didn't see any cards for that.
Are players eliminated when both of their agents are eaten? Do they still get a score? I guess yes.
government issue 4x4s (the true menace of the road)
Can confirm from direct observation in both Tanzania and Kenya. It's not only government-issue 4x4s, but the distinction doesn't matter since they all enjoy the same privileges.
The good thing is that at least those actions on larger platforms leave evidence that can be established in court.
Yes, I think “runes” throw many LLMs off into the wrong simulator. Humans don’t fall for this because the symbols “look” mathematical but a text-based LLM can’t “see” that. The opposite happens for computer scientists: They see “[]” and start to think in programming terms such as lambda functions...
Using a much simpler prompt, and without mentioning number theory or math, o3 easily solves it:
There’s a series of symbol sequences, composed entirely of “[”, “]”, and “-” in some combination that is equal to a number. Here are examples:
…
What is the meaning of this formal notation?
Yes. I first tried things like this, too. I also tried term-rewrite rules, and some of these were quite close. For example, AB → A*(B+1) or AB → A*(B+A) or AB → A*(B+index) led to some close misses (the question was which to expand first, i.e., which associativity; I also considered expanding the smaller one first) but failed with later expansions. It took me half an hour to figure out that the index was not additive or multiplicative but the exponent base.
When we talk about AIs scheming, alignment faking or goal preservation, we imply there is something scheming or alignment faking or wanting to preserve its goals or escape the datacentre.
See also this previous discussion about
What is the AGI system and what is the environment? Where does the AGI system draw the boundary when reasoning about itself?
Yeah, I'm not happy that the anthill is fictional. I considered putting it into a footnote, but then I would have to put all the table entries there too, and the comparison in a table would be lost; I think it helps drive the intuition that the elements of computation could be distributed.
I agree with that. In fact, it is one reason I don’t see LLMs currently as conscious. An earlier version of this post had a combined system of an LLM and a human interacting with it as another example, but I felt that was too difficult and not core to the thesis. A human, by continuously interacting, can provide the coherence-over-time. Stable awareness patterns and self-perception might still be missing or weak, though.
Yes, and I think that's the most fragile part of the analogy. There is coherence, but it's definitely not as robust as that of a nervous system. Still, we do see subsets (e.g., ministries, branches of government, political blocs) coordinating through shared norms, procedures, and mutual modelling. They're noisy, error-prone, often adversarial, but they're not completely incoherent. At times, especially under external threat or during major events, countries do behave in surprisingly unified ways. These aren't mere aggregations of individual actions; they require and ensure a degree of coordination that maintains a whole over time.
If we take that critique seriously, we have to stop saying that corporations launch products, or that teams win matches. There’s always an underlying substrate of individual action. But we regularly model higher-level entities as agents when doing so improves prediction or explanation. From a functionalist perspective, if “Country X believes Y” helps us model diplomatic behaviour more accurately than tracking all individuals, that’s meaningful—even if we know that it is an abstraction.
Yes, but I think this is too strict a reading. The same could be said about any distributed system. When a program outputs “Hello world,” it’s really just electrons doing things. When a person speaks, it’s really just muscles and neural impulses. The distinction is in the coordination and interpretation. When a state department issues a formal diplomatic communication, it’s acting as the voice of an institution that maintains internal models, makes predictions, and responds to feedback. That is, in all the functional ways that matter, it is the country speaking.
Exactly, and we can extend the analogy to institutions that are the coordinating organs of a country’s body. They can fail, conflict, or contradict each other, which is comparable to a neurological disorder. But that doesn’t mean there is no coherence. It just means the coherence is partial and susceptible to breakdown. One could say that is also true of human consciousness in pathological states.
So yes, I take the point that coherence is crucial. But I don’t think the lack of perfect coherence disqualifies countries from being modelled as agents or even from being on some continuum toward consciousness. The better question might be: Under what conditions does it become useful or predictive to model a system as being conscious?