OAASTA (Open Agency Architecture for Safe Transformative AI) has some overlap with my own approach outlined in LOVE in a simbox (LOVES).
Both approaches are based on using extensive hierarchical world models/simulations, but OAASTA emphasizes formal verification of physics models whereas LOVES emphasizes highly scalable/performant simulations derived more from video game tech (and the simplest useful LOVES sim could perhaps be implemented entirely with an LLM).
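To make the "entirely with an LLM" idea concrete, here is a minimal hypothetical sketch (not from the LOVES post): the LLM plays the world model by narrating observations, and the agent under evaluation acts in text. The names `llm_complete` and `run_simbox_episode` are illustrative assumptions, and `llm_complete` is stubbed so the sketch runs standalone; a real version would call an actual model and stream transcripts to alignment evaluators.

```python
# Hypothetical sketch of an LLM-as-simulator loop. `llm_complete` is a
# stub standing in for any text-completion backend (an assumption, not
# a real API).

def llm_complete(prompt: str) -> str:
    """Stub LLM call; a real backend would generate the next observation."""
    return "You are in a village square. A merchant greets you."

def run_simbox_episode(agent_policy, max_steps: int = 10) -> list:
    """Roll out one episode: the LLM narrates the world, the agent under
    evaluation replies with actions, and the transcript doubles as an
    evaluation log."""
    transcript = []
    world_state = "A small simulated village; the agent has just arrived."
    for _ in range(max_steps):
        observation = llm_complete(
            f"World state: {world_state}\n"
            "Describe what the agent observes next."
        )
        action = agent_policy(observation)
        transcript.append((observation, action))
        # Fold the agent's action back into the narrated world state.
        world_state = f"{world_state}\nAgent did: {action}"
    return transcript

# Usage: evaluate a trivial scripted agent in the text simbox.
episode = run_simbox_episode(lambda obs: "greet the merchant politely")
```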
At a surface level OAASTA and LOVES use their world models for different purposes: in LOVES the main sim world model is used to safely train and evaluate agent architectures, whereas in OAASTA the world model is used more directly for model-based RL. At a deeper level, however, these could end up very similar: if you view the entire multi-agent system in LOVES (human orgs training AI architectures in sims) as a single hierarchical agent, then the sim rollouts used to train and evaluate internal sub-agents are equivalent to planning rollouts for that aggregate agent, because its primary action of consequence is creating new, improved architectures and evaluating their alignment consequences. An earlier version of the LOVES post draft was more explicit about this hierarchical agent planning analogy; a toy sketch of it follows.
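As a hypothetical illustration (every name below is mine, not from either post), the analogy can be phrased as a one-step planning loop: the aggregate agent's "actions" are candidate sub-agent architectures, and a simbox rollout is the plan evaluation for each action.

```python
# Toy sketch of the hierarchical-agent analogy: proposing an architecture
# is an action; a simbox rollout is the planning rollout evaluating it.
import random

def propose_architecture(rng: random.Random) -> dict:
    """Stand-in for an org proposing a new agent architecture."""
    return {"hidden_dim": rng.choice([256, 512, 1024]),
            "altruism_prior": rng.random()}

def simbox_rollout(arch: dict, rng: random.Random) -> float:
    """Stand-in for training + evaluating the architecture in the sim,
    returning a combined alignment/capability score. In the analogy this
    is the aggregate agent's planning rollout for the action `arch`."""
    return arch["altruism_prior"] + 0.1 * rng.random()

def aggregate_agent_plan(n_candidates: int = 8, seed: int = 0) -> dict:
    """One planning step of the aggregate hierarchical agent: propose
    candidate sub-agent architectures (actions), score each with a sim
    rollout, and commit to the best one."""
    rng = random.Random(seed)
    candidates = [propose_architecture(rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda arch: simbox_rollout(arch, rng))

best = aggregate_agent_plan()
```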
I’ve also put some thought into what the resulting social decision system should look like assuming success (i.e. the bargaining solver), but that was beyond the scope of LOVES, as I don’t view that part as being as important as first solving the core of alignment.
LOVES implicitly assumes[1] shorter timelines than OAASTA, so in that sense you could think of LOVES as a nearer-term MVP streamlined to minimize alignment tax (but I’m already worried we don’t have enough time even for LOVES).
Beyond that, the model of AGI informing LOVES assumes brain-like AGI and alignment: the main artifact is powerful agents that love humanity and act independently to empower our future, but that probably couldn’t provide any formal proof of how their actions would fulfill formally specified goals.
[1] I would be surprised if we don’t have human-surpassing AGI in the next 5 years.