That idea has been gaining traction lately. See the Corrigibility As a Singular Target (CAST) sequence here on lesswrong. I believe there is a very fertile space to explore at the intersection between CAST and the idea that Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals. Also probably add in Self-Other Overlap: A Neglected Approach to AI Alignment to the mix. A comparative analysis of the models and proposals presented in these three pieces I just linked could turn out to be extremely useful.

Hey, Milan, I checked the posts and wrote some messages to the authors. Yep, Max Harms came up with similar ideas earlier than I did: about the freedoms (choices) and unfreedoms (and modeling them to keep the AIs in check). I wrote to him. A quote from his post:
I think that we can begin to see, here, how manipulation and empowerment are something like opposites. In fact, I might go so far as to claim that “manipulation,” as I’ve been using the term, is actually synonymous with “disempowerment.” I touched on this in the definition of “Freedom,” in the ontology section, above. Manipulation, as I’ve been examining it, is akin to blocking someone’s ability to change the world to reflect their values, while empowerment is akin to facilitating them in changing the world. A manipulative agent will thus have a hard time being genuinely empowering, and an empowering agent will struggle to be genuinely manipulative.
The authors of this post have great ideas, too: AI agents shouldn’t impose any unfreedoms on us. Here’s a quote from them:
Generalizable takeaway: unlike terminal goals, instrumental goals come with a bunch of implicit constraints about not making other instrumental subgoals much harder.
About the self-other overlap: it’s great that they’re looking into it, but I think they’ll need to dive deeper into the building blocks of ethics, agents, and time to work it out.
In talking with the authors, don’t be surprised if they bounce off when encountering terminology you use but don’t explain. I pointed you to those texts precisely so you can familiarize yourself with pre-existing terminology and ideas. It is hard but also very useful to translate between (and maybe unify) frames of thinking. Thank you for your willingness to participate in this collective effort.
Thank you for the answer and the ideas, Milan! I’ll check the links and answer again.
P.S. I suspect that, the same way we have mass–energy equivalence (E = mc²), there is an intelligence–agency equivalence: any agent is in a way time-like and can be represented in a more space-like fashion, ideally as a completely “frozen” static place, places, or tools.
In a nutshell, an LLM is a bunch of words and the vectors between them—a static geometric shape. We could probably expose all of it in some game and make it fun for people to explore and learn: let us explore the library itself (the internal structure of the model) easily, instead of only talking to a strict librarian (the AI agent) who spits out short quotes and prevents us from going inside the library.
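To make the “static shape” idea concrete, here’s a minimal sketch of what browsing the library could look like. The tiny vocabulary and random vectors are stand-ins I made up for this example, not a real model’s weights:

```python
# A minimal sketch of "exploring the library": treating a model's word
# embeddings as frozen geometry and browsing it by nearest neighbours,
# with no text generation at all. The vocabulary and random vectors are
# placeholder assumptions standing in for a real embedding matrix.
import numpy as np

vocab = ["library", "librarian", "book", "quote", "shelf", "reader"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))  # stand-in for real embeddings

def neighbours(word: str, k: int = 3) -> list[str]:
    """Return the k words closest to `word` in the static embedding space."""
    v = embeddings[vocab.index(word)]
    # Cosine similarity against every vector in the "library".
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v))
    order = np.argsort(-sims)  # most similar first
    return [vocab[i] for i in order if vocab[i] != word][:k]

# "Walking through the library": every step is a lookup in static geometry,
# not a call to a generating agent.
print(neighbours("library"))
```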
Hmm, I think I get you a bit better now. You want to build human-friendly, even fun and useful-by-themselves, interfaces for looking at the knowledge encoded in LLMs without making them generate text. Intriguing.
Yep, I want humans to be the superpowerful “ASI agents”, while the ASI itself will be the direct democratic simulated static places (with simple non-agentic algorithms doing the dirty, non-fun work, the way it works in GTA 3, 4, and 5). It’s basically hard to explain without writing a book, and it’s counterintuitive, but I’m convinced it will work if the effort is applied. All knowledge can be represented as static geometry; no agents are needed for that except us.
How can a place be useful if it is static? For reference, I’m imagining a garden where blades of grass are 100% rigid in place and water does not flow. I think you are imagining something different.
Great question. In the most elegant scenario, where you have the whole history of the planet or universe (or a multiverse, let’s go all the way) simulated, you can represent it as a bunch of geometries stacked on top of each other: giant shapes of different slices of time aligned with each other, basically many 3D Earths, each one a moment later in time, almost the same way it’s represented in long-exposure photos (I list examples below). So you have this place of all-knowing, and you—the agent—focus on a particular moment (by “forgetting” everything else) and on a particular 3D shape (maybe your childhood home); you can choose to slice through the frozen 3D shapes of the world of your choosing, like through the frames of a movie. This way it’s both static and dynamic.
It’s a little bit like looking at this almost infinite static shape through some “magical cardboard with a hole in it” (your focusing/forgetting ability, which creates the illusion of dynamism). I hope I didn’t make it more confusing.
You can see the whole multiversal thing as a fluffy light, or zoom in (by forgetting almost the whole multiverse except the part you zoomed in on) to land on Earth and see 14 billion years as a hazy ocean, with bright curves in the sky tracing the Sun’s journey over our planet’s lifetime. Forget even more and see your hometown street, with you appearing as a hazy ghost and a trace behind you showing the paths you once walked—you’ll be more opaque where you were stationary (say, sitting on a bench) and more translucent where you were in motion.
And in the garden you’ll see the 3D “long-exposure photo” of the fluffy blades of grass, which look like a frothy river, near the real pale-blue frothy river; you focus on a particular moment and the picture becomes crisp. You choose to relive your childhood and it comes alive, as you slice through the 3D moments of time once again.
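Here’s a toy sketch of how such a “static but dynamic” block could be stored, under obviously simplified assumptions (a tiny voxel world with arbitrary sizes): focusing is just slicing, and the long-exposure view is an average over the time axis:

```python
# Toy sketch of the "static 4D shape" idea: the whole history is one
# immutable array of 3D frames; nothing in it ever moves. "Focusing" on a
# moment is selecting a slice, and the long-exposure view is an average
# over the time axis. All sizes here are arbitrary assumptions.
import numpy as np

T, X, Y, Z = 100, 16, 16, 16      # 100 moments of a 16^3 voxel world
history = np.zeros((T, X, Y, Z))  # the static 4D "shape of spacetime"

# A "blade of grass" swaying along x over time, baked into the block.
for t in range(T):
    history[t, 3 + (t % 3), 5, 0] = 1.0

def focus(t: int) -> np.ndarray:
    """Forget everything except one moment: a crisp, frozen 3D frame."""
    return history[t]

def long_exposure() -> np.ndarray:
    """Average over the time axis: the hazy 'frothy river' view."""
    return history.mean(axis=0)

# Slicing through consecutive frames recreates the illusion of motion,
# even though the 4D block itself never changes.
frames = [focus(t) for t in range(T)]
```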
A less elegant scenario is to make a high-quality game better than The Sims or GTA 3, 4, and 5, without any agentic AIs, but with advanced non-agentic algorithms.
Basically, I want people to remain the fastest time-like agents, the ever more all-powerful ones, and the AGI/ASI to be the space-like places of all-knowing. It’s a bit counterintuitive, but if you have billions of humans in simulations (they can always choose to stop “playing” and go out; no enforcement of any rules/unfreedoms on you is the most important principle of the future), you’ll have a lot of progress.
I think AI and non-AI place simulations are a much more conservative thing than agentic AIs; they are relatively static, still, and frozen compared to the time-like agents. So it’s counterintuitive, but it’s possible to get all the progress we want with non-agentic tool AIs and place AIs. And I think any good ASI agent would be building the direct democratic simulated multiverse (the static place superintelligence) for us anyway.
There is a bit of simple physics behind agentic safety (a toy sketch follows the list):
Time of Agentic Operation: Ideally, we should avoid creating perpetual agentic AIs, or at least limit their operation to very short bursts initiated by humans—something akin to a self-destruct timer that fires after a short interval.
Agentic Volume of Operation: It’s better to have international cooperation, GPU-level guarantees, and persistent training to prevent agentic AIs from operating in uninhabited areas like remote islands, Antarctica, underground, or outer space. Ideally, the volume of operation is zero, as in our static place AI.
Agentic Speed, or Volumetric Rate: The volume of operation divided by the time of operation. We want AIs to be as slow as possible; ideally, they should be static. The worst-case scenario—though probably unphysical (in the multiversal UI, though, we can allow ourselves to do it)—is an agentic AI that could alter every atom in the universe instantaneously.
Number of Agents: Humanity’s population, according to the UN, will not exceed 10 billion, whereas AIs can replicate rapidly. A human child is in a way a “clone” of 2 people and takes about 18 years to raise. In a multiversal UI we can one day choose to allow people to make clones of themselves (they’ll know that they are copies, but they’ll be completely free adults with the same multiversal powers and their own independent fates); this way we’ll be able to match the speed of agentic AI replication.
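To make these four quantities concrete, here’s a toy encoding in Python; the class name and every number in it are illustrative assumptions of mine, not measurements:

```python
# Toy encoding of the agentic-safety quantities above. All names and
# numbers are illustrative assumptions, not measurements.
from dataclasses import dataclass

@dataclass
class AgenticFootprint:
    operation_time_s: float     # how long the agent is allowed to run
    operation_volume_m3: float  # spatial volume it is allowed to act in
    n_agents: int               # how many copies of it exist

    @property
    def volumetric_rate(self) -> float:
        """Volume of operation divided by time of operation (m^3/s).
        Lower is safer; a static place AI has a rate of 0 by definition."""
        if self.operation_time_s == 0:
            return float("inf")  # instantaneous action: the worst case
        return self.operation_volume_m3 / self.operation_time_s

place_ai = AgenticFootprint(operation_time_s=1.0, operation_volume_m3=0.0, n_agents=0)
burst_agent = AgenticFootprint(operation_time_s=5.0, operation_volume_m3=50.0, n_agents=1)
print(place_ai.volumetric_rate, burst_agent.volumetric_rate)  # 0.0 10.0
```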
Examples of long-exposure photos that represent long stretches of time. Imagine that the photos are in 3D and you can walk in them; the long stretches of time are just a giant static geometric shape. By focusing on a particular moment in it, you can choose to become the moment and some person in it. This can be the multiversal UI (though the photos focus on our universe, not on multiple versions/verses of it all at once): Germany, car lights and the Sun (gray lines represent the cloudy days with no Sun)—1 year of long exposure. Demonstration in Berlin—5 minutes. Construction of a building. Another one. Parade and other New York photos. Central Park. Oktoberfest for 5 hours. Death of flowers. Burning of candles. Bathing for 5 minutes. 2 children for 6 minutes. People sitting on the grass for 5 minutes. A simple example of 2 photos combined—how stretches of 100+ years can possibly look: 1906/2023.

Let me summarize so I can see whether I got it: you see “place AI” as a body of knowledge that can be used to make a good-enough simulation of arbitrary sections of spacetime, where all events are precomputed. That precomputed (thus deterministic) aspect is what you call “staticness”.
Yes. I decided to start writing a book in posts here and on Substack, starting from the Big Bang and ethics, because otherwise my explanations are confusing :) The ideas themselves are counterintuitive, too. I try to physicalize, work from first principles, and use TRIZ to try to come up with ideal solutions. I also ran a 3-year-long thought experiment in which I modeled the ideal ultimate future: basically how everything will work and look if we have infinite compute and no physical limitations. That’s why some of the things I mention will probably take some time to implement in their full glory.
Right now an agentic AI is a librarian who has almost all the output of humanity stolen and hidden in its library, which it doesn’t allow us to visit; it just spits short quotes at us instead. But the AI librarian visits (and even changes) our own human library (our physical world) and has already stolen copies of the whole output of humanity from it. That feels unfair. Why can’t we visit (like in a 3D open-world game) and change (direct-democratically) the AI librarian’s library?
I basically want to give people everything except the agentic AIs, because I think people should remain the most capable “agentic AIs”; otherwise we’ll pretty much guarantee uncomfortable and fast changes to our world.
There are ways to represent the whole simulated universe as a giant static geometric shape:
Each moment of time is a giant 3D geometric shape of the universe. If you align these shapes on top of each other, you effectively get a 4D shape of spacetime that is static but contains all the information about the dynamics/movements within it. So the 4D shape is static, but you choose some smaller 3D shape inside it (probably of a human agent) and “choose the passage” from one human-like-you shape to another, making the static 4D shape feel like the dynamic 3D world you experience.
The whole 4D thing looks very similar to the long-exposure photos I shared above.
It’s similar to the way a language model is a static geometric shape (a pile of words and the vectors between them), but “the prompt/agent/GPU makes it dynamic” by computing the passage from word to word.
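Here’s a toy illustration of that split between frozen geometry and computed passage; the three-word “model” and its probabilities are made up for the example:

```python
# Toy illustration of "static shape + computed passage = dynamics": the
# transition table below is frozen data (the "geometry"), and only the
# walk over it is dynamic. The tiny vocabulary and probabilities are
# made-up assumptions, not a real model.
import random

transitions = {  # static: never modified after creation
    "the": [("library", 0.6), ("librarian", 0.4)],
    "library": [("the", 1.0)],
    "librarian": [("the", 1.0)],
}

def walk(start: str, steps: int, seed: int = 0) -> list[str]:
    """Compute a dynamic 'passage from word to word' over the static table."""
    rng = random.Random(seed)
    path, word = [start], start
    for _ in range(steps):
        words, weights = zip(*transitions[word])
        word = rng.choices(words, weights=weights)[0]
        path.append(word)
    return path

print(walk("the", 6))  # the geometry stayed frozen; only the walk moved
```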
This approach is useful because it lets us keep “a blockchain” of moments of time and preserve our history for posterity. Instead of having tons of GPUs (for computing the AI agents’ choices and freedoms, the time-like, energy-like dynamic stuff), we can have tons of hard drives (for keeping the intelligence, the space-like, matter-like static stuff). That’s much safer; as safe as it gets.
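A minimal sketch of the “blockchain of moments” idea, assuming the world states are just placeholder byte strings: each stored moment is hashed together with its predecessor, so the past cannot be silently rewritten:

```python
# Toy sketch of a "blockchain of moments": each frozen world-state snapshot
# is stored with a hash of its predecessor, so history becomes an
# append-only chain on hard drives rather than something recomputed on
# GPUs. The "world states" here are placeholder byte strings.
import hashlib

def chain(moments: list[bytes]) -> list[str]:
    """Link each moment to the previous one by hashing (prev_hash + state)."""
    hashes, prev = [], b""
    for state in moments:
        digest = hashlib.sha256(prev + state).hexdigest()
        hashes.append(digest)
        prev = digest.encode()
    return hashes

history = [b"moment-0: garden", b"moment-1: grass sways", b"moment-2: river flows"]
print(chain(history))  # tampering with any past moment breaks every later hash
```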
Or we can just go the familiar road and make it more like an open-world computer game: without any AI agents, of course, just sophisticated algorithms like in modern games. In this case it’s not completely static.
And find ways to expose the whole multimodal LLM to casual gamers/Internet users as a 3D world, but with some familiar UI.
I think if we ever have

the sum of freedoms of agentic AIs > the sum of freedoms of all humans,

we’ll get into trouble. And I’m afraid we’re already ~10–50% of the way there (a wild guess; I should probably try to quantify it). Some freedoms are more important than others, like the freedom to keep all the knowledge in your head: AI agents have it, we don’t.
We can get everything with tool AIs and place AIs. Agentic AIs don’t do any magic that non-agentic AIs can’t; they just replace us :)
The ideal ASI just delivers you everything instantly: a car, a world, 100 years as a billionaire. We can get all of that in the multiversal static place ASI, with the additional benefit of being able to walk there and see all the consequences of our choices. The library is better than the strict librarian. The artificial heavens are better than the artificial “god”. In fact, you don’t need the strict librarians or the artificial “god” at all to get everything.
The ideal ASI agent would be building the multiversal static place ASI for us anyway, but it would do it too quickly, without listening to and understanding us as much as we want (it’s an unnecessary middleman; we can do it all direct-democratically, somewhat like people build worlds in Minecraft), and with spooky mistakes.
Thank you for your questions and thoughts; they’re always helpful!
P.S. If you think that it’s possible to deduce something about our ultimate future, you may find this tag interesting :) And I think the story is not bad:
https://www.lesswrong.com/w/rational-utopia