Master’s student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Agent foundations fellow with @Alex_Altair.
Some other areas I’m interested in:
Investigate properties of general-purpose search so that we can handcraft it & simply retarget the search
Investigate the type signature of world models to find properties that remain invariant under ontology shifts
Natural latents
How to characterize natural latents in settings like PDEs?
Equivalence of natural latents under transformation of variables
Formalizing automated design
Information theoretic impact measures
Scalable blockchain consensus mechanisms
Programming language for concurrency
Quantifying optimization power without assuming a particular utility function
What mathematical axioms would emerge in a Solomonoff inductor?
How things like Riemannian metrics & differential equations might emerge from discrete systems
Morphogenesis
My mental model of this is something like: My concept of a squirgle is a function f(x) which maps latent variables x to observations such that likelier observations correspond to latent variables with lower description length.
Suppose that we have currently settled on a particular latent variable x0, but we receive new observations that are incompatible with f(x0). If these new observations can be most easily accounted for by modifying x0 to a new latent variable x1 that’s pretty close to x0, then we say that this change is still about the squirgle.
But if we receive new observations that can be more easily accounted for by perturbing a different latent variable y that corresponds to another concept g(y) (e.g. about a TV), then that is a change about a different thing and not the squirgle.
The main property that enables this kind of separation is modularity of the world model: when most components are independent of most other components at any given time, only a change in a few latent variables (as opposed to most latent variables) is required to accommodate new beliefs, & that allows us to attribute changes in beliefs to changes in disentangled concepts.
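A minimal toy sketch of this attribution rule, under assumptions that are mine rather than part of the argument above: two scalar latents x (squirgle) and y (TV), made-up linear maps f and g that write to disjoint observation channels (the modularity assumption), a grid search over candidate latent values, and a small penalty on the size of the perturbation standing in for "pretty close to x0". All names here are hypothetical.

```python
# Toy sketch: attribute a belief update to the concept whose latent needs
# the cheapest change to explain a new observation. Everything is illustrative.

def f(x):
    """Squirgle concept: latent x -> predictions on observation channels 0 and 1."""
    return (x, 2 * x)

def g(y):
    """TV concept: latent y -> prediction on observation channel 2."""
    return (y + 1,)

def predict(x, y):
    # Modular world model: each concept writes to its own channels.
    return f(x) + g(y)  # tuple concatenation

def sq_error(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

def best_perturbation(predict_one, current, obs, candidates, penalty=0.01):
    """Grid-search the latent value minimizing misfit plus distance from `current`."""
    def cost(v):
        return sq_error(predict_one(v), obs) + penalty * abs(v - current)
    best = min(candidates, key=cost)
    return best, cost(best)

def attribute(x0, y0, obs, candidates):
    """Return (concept_name, new_latent) for the cheaper single-latent update."""
    x1, cost_x = best_perturbation(lambda v: predict(v, y0), x0, obs, candidates)
    y1, cost_y = best_perturbation(lambda v: predict(x0, v), y0, obs, candidates)
    if cost_x <= cost_y:
        return "squirgle", x1
    return "tv", y1

if __name__ == "__main__":
    grid = [i / 10 for i in range(-50, 51)]
    # Current beliefs x0 = 1.0, y0 = 0.0 predict the observation (1.0, 2.0, 1.0).
    print(attribute(1.0, 0.0, (1.2, 2.4, 1.0), grid))  # change is about the squirgle
    print(attribute(1.0, 0.0, (1.0, 2.0, 3.0), grid))  # change is about the TV
```

Because the two concepts touch disjoint channels, each surprising observation is explained by moving exactly one latent, which is what makes the attribution unambiguous in this toy case. With an entangled model, where both latents affect every channel, the two costs would be much closer and the attribution less clean.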