AI Safety person currently working on multi-agent coordination problems.
Jonas Hallgren
If you look at the Active Inference community, there's a lot of work going into PPLs (probabilistic programming languages) to do more efficient world modelling, but that shit ain't easy and, as you say, it is a lot more compute heavy.
I think there'll be a scaling break due to this, but once it is algorithmically figured out we will be back, and back with a vengeance, as I think most safety challenges have a self-vs-environment model as a necessary condition to be properly engaged with (which currently isn't the case with LLMs' world modelling).
Do you have any thoughts on what this actionably means? To me it seems like being able to influence such conversations is potentially a bit intractable, but maybe one could host forums and events for this if one has the right network?
I think it's a good point and I'm wondering what it looks like in practice. I can see it for someone with the right contacts, so is the message for people who don't have that network to go and create it, or what are your thoughts there?
Okay, so I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who has spent at least 10 deep-work hours trying to understand Wolfram's perspective on the world.
With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.
From the perspective of a physicist, morality could be seen as an emergent property of physical laws.
Wolfram likes to think of things in terms of computational reducibility. One way this can be described in the agent-foundations frame is that an agent modelling the environment will only be able to predict the world to a degree that depends on its own speed. It's like some sort of agent-environment relativity, where information-processing capacity determines the space of possible ontologies. An example: for an intelligence operating a lot closer to the speed of light, the visual field might not be a useful vector of experience to model.
Another way to say it is that there is only the modelling and the modelled. An intuition from this frame is that there are only differently good models for understanding specific things, and so the concept of general intelligence becomes weird here.
IMO this is the problem with the first two hours of the conversation: to some extent Wolfram doesn't engage much with the human perspective, nor with any ought questions. He has a very physics-flavoured floating-megamind perspective.
Now, I personally believe there's something interesting to be said for an alternative hypothesis to the individual superintelligence, one that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism, then it should outcompete the others in that system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete an individual AGI, as the outer alignment functions would constrain the inner ones enough.
But, to help me understand what people mean by the NAH, could you tell me what would (in your view) constitute strong evidence against it? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count.)
Yes sir!
So for me it is about looking at a specific type of system, or a specific type of system dynamics, that encodes the axioms required for the NAH to be true.
So it is more the claim that "there is a specific set of mathematical axioms that can be used to get convergence towards similar ontologies, and these are applicable to AI systems."
For example, if one takes the Active Inference lens on concepts in the world, we generally define the boundaries between concepts as Markov blankets. Surprisingly or not, Markov blankets are pretty great for describing not only biological systems but also AI systems and some economic systems. The key underlying invariant is that these are all optimisation systems.
In other words, the quantity I care about is p(NAH | optimisation system).
So if, for example, looking through the lens of Markov blankets or "natural latents" (which are functionals that work like Markov blankets), we don't see convergence in how different AI systems represent reality, then I would say the NAH has been disproven, or at least that this is evidence against it. (A toy version of the kind of check I mean is sketched below.)
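To make that concrete, here's a minimal sketch in Python of the kind of empirical check I have in mind, using linear CKA as a stand-in similarity measure. The "activations" below are random placeholders for whatever representations you would actually extract from two different systems on the same inputs, so treat the whole setup as my own toy assumption rather than anything from the NAH posts themselves.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X, Y: (n_samples, n_features) activations from two different models
    on the *same* inputs. Returns a value in [0, 1]; higher means the two
    systems carve up these inputs with more similar geometry.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return numerator / denominator

# Placeholder "activations" from two hypothetical models on 500 shared inputs.
rng = np.random.default_rng(0)
shared_latent = rng.normal(size=(500, 16))  # structure both models could pick up on
acts_a = shared_latent @ rng.normal(size=(16, 64)) + 0.1 * rng.normal(size=(500, 64))
acts_b = shared_latent @ rng.normal(size=(16, 32)) + 0.1 * rng.normal(size=(500, 32))
acts_unrelated = rng.normal(size=(500, 32))  # a model that shares no structure with A

print("A vs B (share a latent):", linear_cka(acts_a, acts_b))
print("A vs unrelated:         ", linear_cka(acts_a, acts_unrelated))
```

If representations from very different optimisation systems kept coming out near-orthogonal under measures like this across lots of shared inputs, that would be the kind of non-convergence I'd count against the NAH.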
I do, however, think that this exists on a spectrum and that it isn't fully true or false; it is true for a restricted set of assumptions, the question being how restricted that set is.
I see it more as a useful frame for viewing agent cognition processes than something I'm willing to bet my life on. I do think it is pointing towards a core problem similar to what ARC Theory is working on, but from a different angle: understanding the cognition of AI systems.
Yeah, that was what I was looking for, very nice.
It does seem to confirm what I was thinking, that you can't really run the same bet strategy as VCs. I also really appreciate the thoughts in there; they seem like things one should follow. I've got to make sure to do the last due-diligence part of talking to people who have worked with them in the past; it has always felt like a lot, but you're right that one should do it.
Also, I'm wondering why there isn't some sort of bet-pooling network for startup founders, where you have, say, 20 people go together and agree that they will all try out ambitious projects and support each other if they fail. It's like startup insurance, but run by the people doing the startups. Of course you have to trust the others there and so on, but I think this should work? (A toy version of the pooling logic is sketched below.)
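Just to sanity-check my own intuition on the pooling idea, here's a rough Monte Carlo sketch in Python; the success probability, the payoff, and the guaranteed floor are all numbers I made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_founders = 20     # people in the pool
p_success = 0.1     # assumed chance any one ambitious project pays off
payoff = 30.0       # assumed payoff (arbitrary units) if it does
floor = 1.0         # made-up "support floor" the pool guarantees its members
n_sims = 100_000

successes = rng.random((n_sims, n_founders)) < p_success  # who succeeded in each simulated world
solo = np.where(successes, payoff, 0.0)                   # no pooling: you keep your own result
pooled = solo.sum(axis=1, keepdims=True) / n_founders     # pooling: everyone shares the pot equally

print("P(individual ends with nothing), solo     :", (solo == 0).mean())
print("P(individual ends below the floor), pooled:", (pooled < floor).mean())
print("Mean outcome, solo vs pooled              :", solo.mean(), pooled.mean())
```

With those made-up numbers the expected value is unchanged, but the chance of walking away with nothing drops from about 90% to about 12%, which is basically the insurance intuition.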
Okay, what I'm picking up here is that you feel the natural abstractions hypothesis is quite trivial, and that it seems to be naively trying to say something about how cognition works in the same way physics works. Yet this is obviously not true, since development in humans and other animals clearly happens in different ways, so why would their mental representations converge? (Do correct me if I misunderstood.)
Firstly, there's something called the good regulator theorem in cybernetics, and our boy that you're talking about, Mr Wentworth, has a post on making it better that might be useful for understanding some of the foundations of what he's thinking about.
Okay, why is this useful preamble? Well, if there's convergence in useful ways of describing a system, then there's likely some degree of internal convergence in the mind of the agent observing it. Essentially, that's what the regulator theorem is about (imo).
So when it comes to the theory, the heavy lifting is actually not done by the convergence part, the "natural abstractions" part, but rather by the Redundant Information Hypothesis.
It is the claims about the distribution of environments, as well as power laws in reality, that form the foundation of the theory, compared to just stating that "minds will converge". (A toy illustration of the redundancy idea is sketched below.)
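As a toy illustration of the redundancy idea (my own toy setup, not John's formalism): when a macro-variable is smeared redundantly across lots of low-level variables, almost any big-enough subset of them recovers it, which is the kind of structure the natural latent is supposed to capture.

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples, n_sensors = 5_000, 100
macro = rng.normal(size=n_samples)  # the "natural latent"
# noisy low-level variables that each carry a redundant copy of the macro-variable
sensors = macro[:, None] + rng.normal(scale=2.0, size=(n_samples, n_sensors))

# Two observers that each only see a disjoint half of the low-level variables
estimate_a = sensors[:, :50].mean(axis=1)
estimate_b = sensors[:, 50:].mean(axis=1)

print("corr(observer A, macro):", np.corrcoef(estimate_a, macro)[0, 1])
print("corr(observer B, macro):", np.corrcoef(estimate_b, macro)[0, 1])
print("corr(observer A, B)    :", np.corrcoef(estimate_a, estimate_b)[0, 1])
```

Two observers that see completely disjoint chunks of the low-level state still end up with essentially the same summary variable; that redundancy is what's doing the work underneath the convergence claim.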
This is at least my understanding of NAH, does that make sense or what do you think about that?
Hmm, I find that I'm not fully following here. I think "vibes" might be the thing that is messing it up.
Let's look at a specific example: I'm talking to a new person at an EA-adjacent event and we're just chatting about how the last year has been. Part of the "vibing" here might be to home in on the difficulties experienced in the last year due to a feeling of "moral responsibility"; in my view, vibing doesn't have to be done with only positive emotions?
I think you're bringing up a good point that commitments or struggles might be things that bring people closer than positive feelings do, because you're more vulnerable and open as well as broadcasting your values more. Is this what you mean by shared commitments, or are you pointing at something else?
Generally fair, and I used to agree; I've been looking at it from a bit of a different viewpoint recently.
If we think of the "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person, then the free-association game can instead be seen as a way of finding places where your world models overlap a lot.
My absolute favourite conversations are when I can go five layers deep with someone because of shared inference. I think vibe-checking for shared priors is a skill that can be developed, and the basis lies in being curious af.
There are apparently a lot of related concepts in psychology about holding emotional space and other things that I think just come down to "find the shared prior and vibe there".
No sorry, I meant from the perspective of the person with less legible skills.
Amazing post, I really enjoyed the perspective explored here.
An extension that might be useful for me, as an illiquid-path enjoyer, is: what arbitrage or risk-reduction opportunities do you see existing out there?
VCs can get by by making a lot of smaller bets, and if you want to be anti-fragile as an illiquid bet it becomes quite hard, as you're one of the cogs in the anti-fragile system. What Taleb says about that is that such people should be praised because they dare to take on that risk. But there has to be some sort of system one could develop with peers, for example?
What is the many-bets risk-reduction strat here? Is it just to make a bunch of smaller MVPs to gain info?
I would be very curious to hear your perspective on this.
I thought this was an interesting take on the Boundaries problem in agent foundations from the perspective of IIT. It is on the amazing Michael Levin’s youtube channel: https://www.youtube.com/watch?app=desktop&v=5cXtdZ4blKM
One of the main things that makes it interesting to me is that around 25-30 minutes in, it computationally goes through the main reason why I don't think we will have agentic behaviour from AI for at least a couple of years: GPTs just don't have a high IIT Phi value. How will such a system find its own boundaries? How will it find the underlying causal structures that it is part of? Maybe this can be done through external memory, but will that be enough, or do we need it in the core stack of the scaling-based training loop?
A side note: one of the main things that I didn't understand about IIT before was how it really is about meta-substrates, or "signals" as Douglas Hofstadter would call them, optimally re-organising themselves to be as predictable to themselves in the future as possible. Yet it is, and it integrates really well with ActInf (at least to the extent that I currently understand it).
Okay, so I would say that I at least have some experience of going from being not that agentic to being more agentic, and the thing that worked best for me was to generally think of my life as a system. This has been the focus of my life over the last three years.
More specifically, the process that has helped me so far has been to:
Throw myself into high-octane projects and see what I needed to keep up.
Burn out and realise, holy shit, how do these people do it?
(Environment is honestly really important; I've tried out a bunch of different working conditions and your motivation levels can vary drastically.)
Started looking into the reasons why I couldn't do it and others could.
Went into absolutely optimising the shit out of my health by tracking stuff using Bearable and by listening to audiobooks and podcasts; Huberman is a house god of mine.
(Sleep is the most important here, crazy right?)
Supplement and technique tips for sleep:
Glycine, Ashwagandha, Magnesium Citrate
Use a SAD lamp within 30 minutes of waking
Yoga Nidras for naps and for falling asleep faster.
Also check out my favourite biohacker's in-depth guide on this at https://desmolysium.com/
He's got a PhD in medicine and is quite the experimental and smart person. (He tries a bunch of shit on himself and sees how it goes.)
Started going into my psychological background, talking to CBT therapists as well as meditating a lot.
I'm like 1.5k hours into this at this point, and it has completely changed my life, my view of myself, what productivity means, etc.
It has helped me realise that a lot of the behaviours that made me less productive were based on me being a sensitive person who had developed unhealthy coping mechanisms.
This led to me having to relive past traumas whilst having compassion and acceptance for myself.
This has now led to me having good mechanisms instead of bad ones; it made me remove my access to video games and YouTube (willingly!).
For me this has been the most important part. Waking Up and The Mind Illuminated up until stage 6-7 are my recommendations for anyone who wants to start. Also, after 3-6 months of TMI, try to go on a 10-day retreat, especially if you can find a metta retreat. (Think of metta as caring and acceptance rather than loving-kindness, btw; it helps.)
Now I generally have a strict schedule for when I can do different things during the day.
The app AppBlock lets you block apps and device settings, which means you can't actually unblock them on your phone.
Cold Turkey on the computer can do the same, and if you find a workaround through another app you can just patch that by blocking the new app too.
I’m just not allowed to be distracted from the systems that I have.
Confidence:
I feel confident in myself and what I want to do in the world not because I don’t have issues but rather because I know where my issues are and how to counteract them.
The belief is in the process rather than the outcomes. Life is poker: you just gotta optimise the way you play your hands, and the EV will come.
Think of yourself as a system and optimise the shit out of it. Weirdly enough, this has made me focus a lot more on self-care than I did before.
Of course, it’s a work in progress but I want to say that it is possible and that you can do it.
Also, randomly, here's a Civ VI analogy for you on why self-care is OP. If you want to be great at Civ, one of the main things to do is to grow your production and economy as fast as possible. This leads to an exponential curve where the more production and economy you have, the more you can produce. This is why Civ pros generally rush Commercial Hubs and markets, as internal trade routes yield more production.
Your production is based on your psychological well-being and your general energy levels. If you run a bunch of tests on this and figure out what works for you, then you get even more production. This compounds over time until you plateau at the end of that logistic growth curve. (A toy sketch of the compounding is below.)
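If you want the compounding point as a toy calculation (the reinvestment rate and the cap are completely made-up numbers), here's a minimal sketch:

```python
# Toy compounding-production model: reinvesting output into capacity gives
# exponential growth that flattens into a plateau as you approach the cap.
reinvest_rate = 0.05   # fraction of each turn's output that grows future capacity (made up)
cap = 100.0            # logistic ceiling, i.e. where you eventually plateau (made up)
production = 1.0

for checkpoint in range(0, 121, 20):
    print(f"turn {checkpoint:3d}: production ~ {production:6.1f}")
    for _ in range(20):
        production += reinvest_rate * production * (1 - production / cap)
```

Early on the curve looks exponential, later it flattens out: same shape as the Civ production graph and, I'd claim, the self-care payoff.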
Best of luck!
When it comes to formal verification, I'm curious what you think about the heuristic-arguments line of research that ARC is pursuing:
It isn't formal verification in the usual sense of the word, but rather probabilistic verification, if that makes sense?
You could then apply something like control-theory methods to ensure that the expected divergence from the heuristic argument is less than a certain percentage in different places. In the limit it seems to me that this could converge towards formal verification proofs; it's almost like swiss cheese applied at the model level? (A toy sketch of the kind of check I mean is below.)
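To be concrete about the kind of probabilistic check I mean, here's a sketch in Python. This is my own toy framing, not ARC's actual method; `model` and `heuristic_estimate` are hypothetical stand-ins, and the bound is just a standard Hoeffding-style confidence bound on the divergence rate.

```python
import math
import random

def model(x):
    """Hypothetical stand-in for the system being verified."""
    return x ** 2 + 0.01 * random.gauss(0, 1)

def heuristic_estimate(x):
    """Hypothetical stand-in for what the heuristic argument predicts."""
    return x ** 2

def divergence_rate_bound(n_samples=10_000, tolerance=0.05, delta=1e-3):
    """Estimate P(|model - heuristic| > tolerance) on sampled inputs, plus an
    upper confidence bound on that rate which holds with probability 1 - delta."""
    violations = 0
    for _ in range(n_samples):
        x = random.uniform(-1.0, 1.0)
        if abs(model(x) - heuristic_estimate(x)) > tolerance:
            violations += 1
    rate = violations / n_samples
    bound = rate + math.sqrt(math.log(1 / delta) / (2 * n_samples))
    return rate, bound

rate, bound = divergence_rate_bound()
print(f"empirical divergence rate: {rate:.4f}")
print(f"upper bound (holds w.p. 1 - 1e-3): {bound:.4f}")
```

Tighten the tolerance and broaden the input coverage and the guarantee starts to look more and more proof-like, with the remaining holes being the swiss-cheese part.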
(Yes, this comment is a bit random with respect to the rest of the context but I find it an interesting question for control in terms of formal verification and it seemed like you might have some interesting takes here.)
I use the Waking Up app, but you can search for "NSDR" on YouTube. 20 minutes is the timeframe I started with, but you can try other lengths as well.
This does seem kind of correct to me?
Maybe you could see the fixed points that OP is pointing towards as priors in the search process for frames.
Like, your search is determined by your priors, which are learnt through your upbringing. The problem is that they're often maladaptive and misleading. Therefore, working through these priors and generating new ones is a bit like retraining after overfitting, or something similar.
Another nice thing about meditation is that it sharpens your mind's perception, which makes your new priors better. It also makes you less dependent on attractor states you could have fallen into before, since you become less emotionally dependent on past behaviour. (There's obviously more complexity here; I'm referring to dependent origination, for you meditators out there.)
It's like pruning the bad data from your dataset and retraining your model; you're basically guaranteed to find better ontologies from that (or that's the hope, at least).
I'm currently in the process of releasing more of my fixed points through meditation, and man is it a weird process. It is very fascinating, and that fundamental openness to moving between views seems more prevalent. I'm not sure that I fully agree with you on the all-in part, but kudos for trying!
I think it probably makes sense to spend earlier years doing this cognition training and then using that within specific frames to gather the bits of information that you need to solve problems.
Frames are still useful to gather bits of information through so don’t poopoo the mind!
Otherwise, it was very interesting to hear about your journey!
Sleep is a banger reset point for me, so doing a nap/yoga nidra and then picking the day back up from there when I notice myself avoiding things has been really helpful for me.
Thanks for the post, it was good.
Random extra tip on naps: do a yoga nidra, or non-sleep deep rest. You don't have to fall asleep to get the benefits of a nap-plus, and it also gives some extra growth-hormone release and dopamine generation afterwards. (Huberman bro, out.)
In natural language, maybe it would be something like "given these ontological boundaries, give us the best estimate you can of CEV"?
It seems kind of related to boundaries as well: if you think of natural latents as "functional Markov blankets" that cut reality at its joints, then you could probably say that you want to preserve the part of that structure that is "human agency" or similar. I don't know if that makes sense, but I like the direction of the idea!
Any reason for the timing window being 4 hours before bed instead of 30 minutes to an hour? Most of the advice I've heard is around half an hour to an hour before bed; I'm currently doing this with roughly 0.3 mg of melatonin (I divide a 1 mg tablet into thirds).