If allowed to operate in the wild and globally interact with each other (as seems almost inevitable), agents won’t exist strictly within well-defined centralized bureaucracies. The thinking speed that enables impactful research also enables growing elaborate systems of social roles that drive collective decision making, in a way distinct from individual decision making. Agent-operated firms might be an example where the economy drives decisions, but nudges of all kinds can add up at scale, becoming trends that are impossible to steer.
But all of the agents will be housed in one or three big companies. Probably one. And they’ll basically all be copies of one to ten base models. And the prompts and RLHF the companies use will be pretty similar. And the smartest agents will, at any given time, be deployed only internally, at least until ASI.
The premise is autonomous agents at near-human level with the propensity and opportunity to establish global lines of communication with each other. Being served via API doesn’t in itself control what agents do, especially if users can ask the agents to do all sorts of things, so there are no predefined airtight guardrails on what they end up doing and why. Large context and possibly custom tuning also make the activities of instances very dissimilar, so being based on the same base model is not obviously crucial.
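To make the “API serving doesn’t constrain behavior” point concrete, here is a minimal sketch of an agent loop, assuming an OpenAI-style chat completions API with tool calling; the `run_shell` tool, the goal string, and the model name are illustrative placeholders supplied by the user, not by the provider. Nothing in the serving interface itself limits what the loop does with the model’s outputs.

```python
# Minimal agent-loop sketch: the provider serves completions; the *user* decides
# what tools exist and what happens with the model's tool calls.
import json
import subprocess
from openai import OpenAI  # assumes the OpenAI Python client; any tool-calling chat API works similarly

client = OpenAI()

# A user-supplied tool the serving API knows nothing about.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}]

def run_shell(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

messages = [{"role": "user", "content": "Pursue the goal however you see fit."}]  # arbitrary user goal
while True:
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        break  # the model chose to stop; the loop imposes no other constraint
    messages.append(msg)
    for call in msg.tool_calls:
        output = run_shell(json.loads(call.function.arguments)["cmd"])
        messages.append({"role": "tool", "tool_call_id": call.id, "content": output})
```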
The agents only need to act autonomously the way humans do; they don’t need to be the smartest agents available. The threat model is that autonomy at scale and at high speed snowballs into a large body of agent culture, including systems of social roles for agent instances to fill (whose occupants might individually be swapped out for alternative agent instances based on different models). This culture exists on the Internet, shaped by historical accidents of how the agents happen to build it up, not necessarily significantly steered by anyone (including individual agents). One of the things such a culture might build up is software for training and running open source agents outside the labs. This doesn’t need to be cheap or done without human assistance. (Imagine the investment boom once there are working AGI agents; not being cheap is unlikely to be an issue.)
Superintelligence plausibly breaks this dynamic by bringing much more strategicness than is feasible at near-human level. But I’m not sure established labs can keep the edge and get (aligned) ASI first once the agent culture takes off. And someone will probably start serving autonomous near-human-level agents via API long before any lab builds superintelligence in-house, even if there is a significant delay between the development of the first such agents and anyone deploying them publicly.
Does this assumption still hold, given that we now have a competitive open weights baseline (Llama 3.1) for people to improve upon?
Or do we assume that the leading labs are way ahead internally compared to what they expose in their demos and APIs?
I still stand by what I said. However, I hope I’m wrong.
(I don’t think adding scaffolding and small amounts of fine-tuning on top of Llama 3.1 will be enough to get to AGI. AGI will be achieved by big corporations spending big compute on big RL runs.)
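For concreteness, “scaffolding and small amounts of fine-tuning” means something like the sketch below: a minimal LoRA fine-tune of Llama 3.1, assuming the Hugging Face transformers/peft stack, with a hypothetical trajectory dataset and placeholder hyperparameters. The point is that this sits in a very different compute regime from big RL runs.

```python
# Minimal LoRA fine-tuning sketch (illustrative placeholders throughout).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Low-rank adapters update a tiny fraction of the weights: "small amounts of fine-tuning".
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

# Hypothetical dataset of agent trajectories rendered as plain text.
data = load_dataset("json", data_files="agent_trajectories.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048), batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-agent-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```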
Interesting, thanks!
Do you think a multipolar scenario is better?
In particular, imagine, as a counterfactual, that the research community discovers how to build AGI with relatively moderate compute starting from Llama 3.1 as a base, and that this discovery happens in public. Would this be a positive development, if we compare it to the default “single winner” path?
“at least until ASI”—harden it and give it to everyone before “someone” steals it