Master’s student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Agent foundations fellow with @Alex_Altair.
Some other areas I’m interested in:
- Investigate properties of general-purpose search so that we can handcraft it & simply retarget the search (a toy sketch of what I mean by retargeting is below this list)
- Investigate the type signature of world models to find properties that remain invariant under ontology shifts
- Natural latents (the defining conditions are sketched after this list)
  - How to characterize natural latents in settings like PDEs?
  - Equivalence of natural latents under transformations of variables
- Formalizing automated design
- Information-theoretic impact measures
- Scalable blockchain consensus mechanisms
- Programming language for concurrency
- Quantifying optimization power without assuming a particular utility function
- What mathematical axioms would emerge in a Solomonoff inductor?
- How things like Riemannian metrics & differential equations might emerge from discrete systems
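On retargetable search: here is a minimal toy sketch of the idea, with hypothetical names throughout (this is my illustration, not an actual proposal). The search procedure is written once, and the optimization target is just a parameter, so "retargeting the search" amounts to swapping in a different scoring function.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")

def general_search(
    initial: S,
    neighbors: Callable[[S], Iterable[S]],
    score: Callable[[S], float],  # the "target": the only piece we swap out
    steps: int = 1000,
) -> S:
    """Greedy hill-climbing as a stand-in for a general-purpose search."""
    best = initial
    for _ in range(steps):
        candidates = list(neighbors(best))
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= score(best):
            break  # local optimum reached
        best = top
    return best

# "Retargeting" = same procedure, different score function, e.g.:
#   general_search(x0, neighbors, score=human_approval_estimate)
#   general_search(x0, neighbors, score=paperclip_count)
```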
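On natural latents: for concreteness, here is a rough statement of the two defining conditions for two observables, as I understand the standard formulation; my questions above are about how these conditions behave in richer settings like PDEs and under changes of variables.

```latex
% Rough statement of the natural latent conditions for a latent $\Lambda$
% over observables $X_1, X_2$; the approximate versions bound each
% condition by some $\epsilon$ instead of requiring exact independence.
\begin{align*}
  &\text{Mediation:}  && X_1 \perp X_2 \mid \Lambda \\
  &\text{Redundancy:} && \Lambda \perp X_2 \mid X_1
      \quad\text{and}\quad \Lambda \perp X_1 \mid X_2
\end{align*}
```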
Thanks! I recall reading the steering subsystems post a while ago & it matched a lot of my thinking on the topic. The idea of using variables in the world model to determine the optimization target also seems similar to your “Goals selected from learned knowledge” approach (the targeting process is essentially a mapping from learned knowledge to goals).
Another motivation for the targeting process (which might also be an advantage of GSLK) that I forgot to mention: it lets the AI update its goals as it updates its knowledge (e.g., about what current human values are), which might help us avoid value lock-in.
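To make that concrete, here is a minimal sketch under my own assumptions (all names are hypothetical): the goal is re-derived from the current world model on every update rather than stored as a frozen object, so updated beliefs about human values automatically flow into the optimization target.

```python
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    # The agent's current best estimate of facts, including human values.
    beliefs: dict = field(default_factory=dict)

    def update(self, observation: dict) -> None:
        self.beliefs.update(observation)

def targeting_process(wm: WorldModel) -> str:
    """A mapping from learned knowledge to a goal specification."""
    return wm.beliefs.get("current_human_values", "gather more information")

wm = WorldModel()
wm.update({"current_human_values": "values as of 2024"})
print(targeting_process(wm))  # goal tracks the current knowledge,
wm.update({"current_human_values": "values as of 2030"})
print(targeting_process(wm))  # rather than locking in the old estimate
```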