My overall frame is it’s best to have emotional understanding and system 2 deeply integrated. How to handle local tradeoffs unfortunately depends a lot on your current state, and where your bottlenecks are.
Could you provide a specific, real-world example where the tradeoff comes up and you’re either unsure of how to navigate it, or you think I might suggest navigating it differently?
Yeah sure!
So, I’ve had a research agenda in agent foundations for a while which somewhat mirrors developmental interpretability, in that it wants to say things about what a robust development process is, rather than about post-training sampling.
The idea is to be able to predict “optimisation daemons” or inner optimisers as they arise in a system.
The problem I’ve had is that it’s very non-obvious to me what a good mathematical basis for this is. I’ve read through a lot of the existing agent foundations literature, but I’m not satisfied with finite factored sets or the existing boundaries definitions, since they don’t tell you about the dynamics.
What I would want is a dynamical-systems-inspired theory of the formation of inner misalignment. It’s been in my head in the background for almost 2 years now, and it feels really difficult to make any progress; from time to time I have a thought that brings me closer, but I don’t usually get closer just by thinking about it.
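To gesture at the shape of the thing I’m after, here’s a minimal toy sketch, assuming gradient-based training; the symbols M and φ below are placeholders of mine, not definitions from any existing framework:

```latex
% Toy sketch (illustrative assumptions throughout): parameters \theta
% evolve by gradient descent on a training loss L, so training itself
% is a discrete dynamical system on parameter space.
\[
  \theta_{t+1} \;=\; \theta_t \;-\; \eta\, \nabla_\theta L(\theta_t)
\]
% A dynamical-systems theory of inner misalignment would then ask:
% writing M for the (hypothetical) region of parameter space whose
% points implement an inner optimiser, which basins of attraction
% flow into M, and is there an observable \phi(\theta_t), measurable
% early in training, that predicts
\[
  \lim_{t \to \infty} \theta_t \in M
\]
% well before the trajectory actually gets there?
```

The hard part, of course, is that M and φ are exactly the things I don’t know how to define; the sketch just pins down what kind of object the theory would have to be.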
I guess something I’m questioning in my head is the deliberate practice versus exploration part of this. For me this is probably the hardest problem I’m working on and whilst I could think more deliberately on what I should be doing here I generally follow my curiosity, which I think has worked better than deliberate practice in this area?
I’m currently following a strategy where this theoretical foundation is on the side whilst I build real-world skills of running organisations, fundraising, product-building and networking. From time to time I find gems, such as applied category theory or Michael Levin’s work on Boundaries in cells and Active Inference, that can really help elucidate some of the deeper foundations of this problem.
I do feel like I’m floating more here, going with the interest and coming back to the problems over time to see if I’ve unlocked any new insights. This feels more like flow than deliberate practice? Like I’m building up the skill of holding loose probability clouds and seeing where they guide me?
I’m not sure if you agree that this is the right strategy, but I guess there’s this frame difference between a focus on the emotional, intuition, or research-taste side of things versus the deliberate practice side of things?
Nod. So, first of all: I don’t know. My own guesses would depend on a lot of details of the individual person (even those in a similar situation to you).
(This feels somewhat outside the scope of the main thrust of this post, but, definitely related to my broader agenda of ‘figure out a training paradigm conveying the skills and tools necessary to solve very difficult, confusing problems’)
But, riffing on it so far. First, summarizing what seemed like the key points:
You want to predict optimization daemons as they arise in a system, and want a good mathematical basis for that, and don’t feel satisfied with the existing tools.
You’re currently exploring this on the side while working on some more tractable problems.
You’ve identified two broad strategies, which are:
somehow “deliberate practice” this,
somehow explore and follow your curiosity intermittently.
Three things I’d note:
“Deliberate practice” is very open-ended. You can deliberately practice noticing and cultivating veins of curiosity, for example.
You can strategize about how to pursue curiosity, or explore (without routing through the practice angle).
There might be action-spaces other than “deliberate practice” or “explore/curiosity” that will turn out to be useful.
My current angle for deliberate practice is to find problem sets that feel somehow-analogous to the one you’re trying to tackle, but simpler/shorter. They should be difficult enough that they feel sort of impossible while you’re working on them, but also actually solvable. They should be varied enough that you aren’t overfitting to one particular sort of puzzle.
After the exercise, apply the Think It Faster meta-exercise to it.
Part of the point here is to notice strategies like “apply explicit systematic thinking” and strategies like “take a break, come back to it when you feel more inspired”, and to start developing your own sense of which strategies work best for you.