Cleo Nardo comments on Shortform

Cleo Nardo 30 Sep 2024 16:41 UTC
4 points
0
I’ve added a fourth section to my post. It operationalises “innovation” as “non-transient novelty”. Some representative examples of an innovation would be:
I think these articles were non-transient and novel.
- Mateusz Bagiński 30 Sep 2024 17:37 UTC
  1 point
  −3
  Parent
  My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.
  
  Re the three you pointed out: simulators I consider a useful insight, gradient hacking probably not (10% < p < 20%), and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.
  - Cleo Nardo 30 Sep 2024 18:01 UTC
    4 points
    0
    Parent
    thanks for the thoughts. i’m still trying to disentangle what exactly I’m pointing at.
    I don’t intend “innovation” to mean something normative like “this is impressive” or “this is research I’m glad happened” or anything. i mean something more low-level, almost syntactic. more like “here’s a new idea everyone is talking about”. this idea might be a threat model, or a technique, or a phenomenon, or a research agenda, or a definition, or whatever.
    like, imagine your job was to maintain a glossary of terms in AI safety. i feel like new terms used to emerge quite often, but not any more (i.e. not for the past 6-12 months). do you think this is fair? i’m not sure how worrying this is, but i haven’t noticed others mentioning it.
    NB: here’s 20 random terms I’m imagining included in the dictionary:
    Evals
    Mechanistic anomaly detection
    Steganography
    Glitch token
    Jailbreaking
    RSPs
    Model organisms
    Trojans
    Superposition
    Activation engineering
    CCS
    Singular Learning Theory
    Grokking
    Constitutional AI
    Translucent thoughts
    Quantilization
    Cyborgism
    Factored cognition
    Infrabayesianism
    Obfuscated arguments