> Is there a particular reason you expect there to be exactly one hard part of the problem,
Have you stopped beating your wife? I say “the” here in the sense of like “the problem of climbing that mountain over there”. If you’re far away, it makes sense to talk about “the (thing over there)”, even if, when you’re up close, there’s multiple routes, multiple summits, multiple sorts of needed equipment, multiple sources of risk, etc.
I think the appropriate analogy is someone trying to strategize about “the hard part of climbing that mountain over there” before they have even reached base camp or seriously attempted to summit any other mountains. There are a bunch of parts that might end up being hard, and one can come up with some reasonable guesses as to what those parts might be, but the bits that look hard from a distance and the bits that end up being hard when you’re on the face of the mountain may be different parts.
> We make an argument like “any solution would have to address X” or “anything with feature Y does not do Z” or “property W is impossible”, and then we can see what a given piece of research is and is not doing / how it is doomed to irrelevance
Doomed to irrelevance, or doomed to not being a complete solution in and of itself? The point of a lot of research is to look at a piece of the world and figure out how it ticks. Research to figure out how a piece of the world ticks won’t usually directly allow you to make it tock instead, but can be a useful stepping stone. Concrete example: dictionary learning vs Golden Gate Claude.
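For concreteness, here is a rough sketch of the tick-vs-tock distinction being pointed at: dictionary learning (e.g. a sparse autoencoder over model activations) is the “figure out how it ticks” step, and steering along a learned feature direction, which is roughly what produced Golden Gate Claude, is the “make it tock” step. This is an illustrative sketch only, not the actual pipeline; all names, shapes, and coefficients below are made up.

```python
# Illustrative sketch only -- not the actual setup behind Golden Gate Claude.
# "Tick": dictionary learning via a sparse autoencoder (SAE) that decomposes
#         a model's activations into sparsely-active features.
# "Tock": steering, i.e. adding a chosen feature's decoder direction back
#         into the activations to change behavior.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.enc(acts))   # sparse feature activations
        recon = self.dec(feats)              # reconstruction of the activations
        return recon, feats

def sae_loss(acts, recon, feats, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that induces sparsity.
    return ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()

def steer(acts: torch.Tensor, sae: SparseAutoencoder,
          feature_idx: int, strength: float) -> torch.Tensor:
    # Push the activations along one learned dictionary direction.
    direction = sae.dec.weight[:, feature_idx]   # shape: (d_model,)
    return acts + strength * direction
```

The point of the sketch: everything except the last function is pure “how does it tick” analysis, and only `steer` touches behavior, but the analysis is what makes the intervention possible.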
I agree IF we are looking at the objects in question. If LLMs were minds, the research would be much more relevant.
I think one significant crux is “to what extent are LLMs doing the same sort of thing that human brains do / the same sorts of things that future, more powerful AIs will do?” It sounds like you think the answer is “they’re completely different and you won’t learn much about one by studying the other”. Is that an accurate characterization?
> To answer your question: by looking at and thinking about minds. The only minds that currently exist are humans
Agreed, though with quibbles
> and the best access you have to minds is introspection.
In my experience, my brain is a dirty lying liar that lies to me at every opportunity—another crux might be how faithful one expects their memory of their thought processes to be to the actual reality of those thought processes.
> Doomed to irrelevance, or doomed to not being a complete solution in and of itself?
Doomed to not be trying to go to and then climb the mountain.
> my brain is a dirty lying liar that lies to me at every opportunity
So then it isn’t easy. But it’s feedback. Also there’s not that much distinction between making a philosophically rigorous argument and “doing introspection” in the sense I mean, so if you think the former is feasible, work from there.
> Doomed to irrelevance, or doomed to not being a complete solution in and of itself?

> Doomed to not be trying to go to and then climb the mountain.
If you think that current mech interp work is trying to directly climb the mountain, rather than trying to build and test a set of techniques that might be helpful on a summit attempt, I can see why you’d be frustrated and discouraged at the lack of progress.
> Also there’s not that much distinction between making a philosophically rigorous argument and “doing introspection” in the sense I mean, so if you think the former is feasible, work from there.
I don’t have much hope in the former being feasible, though I do support having a nonzero number of people try it because sometimes things I don’t think are feasible end up working.
Look… Consider the hypothetically possible situation that in fact everyone is very far from being on the right track, and everything everyone is doing doesn’t help with the right track and isn’t on track to get on the right track or to help with the right track.
Ok, so I’m telling you that this hypothetically possible situation seems to me like the reality. And then you’re, I don’t know, trying to retreat to some sort of agreeable live-and-let-live stance, or something, where we all just agree that due to model uncertainty and the fact that people have vaguely plausible stories for how their thing might possibly be helpful, everyone should do their own thing and it’s not helpful to try to say that some big swath of research is doomed? If this is what’s happening, then I think that what you in particular are doing here is a bad thing to do.
Maybe we can have a phone call if you’d like to discuss further.
> Maybe we can have a phone call if you’d like to discuss further.
I doubt it’s worth it—I’m not a major funder in this space and don’t expect to become one in the near future, and my impression is that there is no imminent danger of you shutting down research that looks promising to me and unpromising to you. As such, I think the discussion ended up getting into the weeds in a way that probably wasn’t a great use of either of our time, and I doubt spending more time on it would change that.
That said, I appreciated your clarity of thought, and in particular your restatement of how the conversation looked to you. I will probably be stealing that technique.