Yes, but then your “Aligned AI based on LLMs” is just a normal LLM used in the way it is currently used.
Yes, this is a good way of putting it.
Possibly, but there aren’t potentially dangerous AIs yet; LLMs are still only a particularly promising building block (for both capabilities and alignment) with many affordances. The chatbot application, at the current level of capabilities, shapes their use and construction in certain ways. Further along the tech tree, alignment tax can end up motivating systematic uses that turn LLMs into a source of danger.
Sure, but you can say the same about humans. Enron was a thing; obeying the law is not as profitable as disobeying it.
I think human uploads would be similarly dangerous. LLMs get us to the better place of being at the human-upload danger level rather than at the danger level of ender-dragon-slayer model-based RL (at least so far). Smarter LLMs and uploads share similar advantages and dangers: the capability for extremely fast value drift, the lack of a robust system that keeps such changes sane, and the propensity to develop superintelligence even to their own detriment. The current world is tethered to the human species and to relatively slow change in culture and centers of power.
This changes with AI. If AIs establish effective governance, the technical feasibility of changes to human and AI nature or capabilities would be under control and could be compatible with (post-)human flourishing, but currently we are not on track to ensure this happens before a catastrophe. The things that eventually establish such governance don’t necessarily remain morally or culturally grounded in modern humanity, let alone find humanity still alive when the dust settles.