Why aren’t you doing research on making pre-training better for alignment?
I was on a call today, and we talked about projects that involve studying how pre-trained models evolve throughout training and how we could guide the pre-training process to make models safer. For example, could training on synthetic/transformed data make models significantly more robust and essentially solve jailbreaking? What about the intersection of pretraining from human preferences and synthetic data? Could the resulting model be significantly easier to control? How would it impact the downstream RL process? Could we imagine a setting where we don’t need RL at all (or at least where we’d be confident enough in the resulting models to use them to automate alignment research)? I think many interesting projects could fall out of this work.
So, back to my main question: why aren’t you doing research on making pre-training better for alignment? Is it because it’s too expensive and doesn’t seem like low-hanging fruit? Or do you feel it isn’t a plausible direction for aligning models?
We were wondering whether removing certain technical bottlenecks would make this kind of research more feasible, so that alignment researchers could better study how to guide the pretraining process in a way that benefits alignment. For instance, would researchers be more inclined to run experiments in this direction if the entire pre-training codebase were handled for them, leaving them free to focus on their specific research question? And if a large amount of compute were available (say, through government resources) for data labeling/filtering and pre-training multiple models, would this kind of work become more attractive to pursue?
I think many alignment research directions have grown simply because they had low-hanging fruit that didn’t require much compute (e.g., evals and mech interp). It seems we’ve implicitly left all of the high-compute projects for the AGI labs to figure out. But what if we weren’t as bottlenecked on this anymore? It’s now possible to retrain GPT-2 1.5B for under $700 (and the 125M version for $20). I think we can find ways to do useful experiments, but my guess is that the level of technical expertise required is fairly high, and alignment researchers would rather avoid these kinds of projects while they remain high-effort.
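For intuition on the cost claim, here is a rough back-of-the-envelope sketch. The cited $700 and $20 figures come from optimized reproductions; this is only a cross-check under assumed hardware, throughput, and token-budget numbers, all of which are placeholders rather than measurements.

```python
# Rough estimate of dense-transformer pre-training cost (illustrative only).
# Every number below is an assumption, not a measurement from the runs cited above.

def training_cost_usd(n_params, n_tokens, gpu_flops_per_s, mfu, gpu_price_per_hour, n_gpus):
    """Estimate cost with the standard ~6 * N * D training-FLOPs approximation."""
    total_flops = 6 * n_params * n_tokens
    effective_flops_per_s = gpu_flops_per_s * mfu * n_gpus
    hours = total_flops / effective_flops_per_s / 3600
    return hours * gpu_price_per_hour * n_gpus

# Assumed setup: 8x H100 at bf16 (~989 TFLOP/s each), 40% MFU, $2.50 per GPU-hour.
cost = training_cost_usd(
    n_params=1.5e9,        # GPT-2 1.5B
    n_tokens=30e9,         # assumed token budget for a decent reproduction
    gpu_flops_per_s=989e12,
    mfu=0.40,
    gpu_price_per_hour=2.5,
    n_gpus=8,
)
print(f"~${cost:,.0f}")   # a few hundred dollars under these assumptions, consistent with <$700
```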
I talk about other related projects here.
I’ve synthesized various resources for this “pre-training for alignment” type of work:
Data
Synthetic Data
The RetroInstruct Guide To Synthetic Text Data
Alignment In The Age of Synthetic Data
Leveraging Agentic AI for Synthetic Data Generation
**AutoEvol**: Automatic Instruction Evolving for Large Language Models. We build a fully automated Evol-Instruct pipeline to create high-quality, highly complex instruction tuning data
Synthetic Data Generation and AI Feedback notebook
The impact of models training on their own outputs and how it’s actually done well in practice
Google presents Best Practices and Lessons Learned on Synthetic Data for Language Models
Transformed/Enrichment of Data
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling. TLDR: you can train 3x faster and with up to 10x less data using just synthetic rephrases of the web!
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Rho-1: Not All Tokens Are What You Need. RHO-1-1B and 7B achieve SotA results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
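To make the Rho-1 idea concrete, here is a minimal sketch of selective language modeling as I understand it from the paper’s description: score each token against a reference model and only backpropagate the loss on the tokens where the training model’s excess loss over the reference is largest. The keep ratio and the exact scoring rule are assumptions; this is not the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(model_logits, ref_logits, target_ids, keep_ratio=0.6):
    """Selective language modeling in the spirit of Rho-1 (sketch, not the paper's code).

    model_logits, ref_logits: (batch, seq, vocab) logits from the training and reference models.
    target_ids: (batch, seq) next-token targets.
    Only the tokens where the training model lags the reference the most contribute to the loss.
    """
    vocab = model_logits.size(-1)
    loss_model = F.cross_entropy(
        model_logits.reshape(-1, vocab), target_ids.reshape(-1), reduction="none"
    )
    with torch.no_grad():
        loss_ref = F.cross_entropy(
            ref_logits.reshape(-1, vocab), target_ids.reshape(-1), reduction="none"
        )
        excess = loss_model.detach() - loss_ref        # proxy for how much useful signal a token carries
        k = max(1, int(keep_ratio * excess.numel()))
        threshold = torch.topk(excess, k).values.min()
        mask = (excess >= threshold).float()
    return (loss_model * mask).sum() / mask.sum()
```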
Data Attribution
In-Run Data Shapley
Scaling Laws for the Value of Individual Data Points in Machine Learning. We show how some data points are only valuable in small training sets; others only shine in large datasets.
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Data Mixtures
Methods for finding the optimal data mixture
RegMix: Data Mixture as Regression for Language Model Pre-training
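A minimal sketch of the RegMix-style idea: run a handful of cheap proxy pre-training runs with different mixture weights, fit a regression from mixture weights to validation loss, then pick the mixture the regression predicts to be best. I use a plain least-squares fit here rather than the paper’s regressor, and the domains and recorded losses below are placeholders just to make the sketch runnable.

```python
import numpy as np

# Placeholder proxy-run data: in practice, `mixtures` are the weights you actually trained
# small models on, and `val_losses` are the measured validation losses of those runs.
domains = ["web", "code", "books", "synthetic_rephrases"]
mixtures = np.random.dirichlet(np.ones(len(domains)), size=32)
val_losses = np.random.uniform(2.8, 3.4, size=32)   # placeholder; replace with measured losses

# Fit loss ~ w . mixture + b via least squares.
X = np.hstack([mixtures, np.ones((len(mixtures), 1))])
coef, *_ = np.linalg.lstsq(X, val_losses, rcond=None)

# Search candidate mixtures and keep the one with the lowest predicted loss.
candidates = np.random.dirichlet(np.ones(len(domains)), size=100_000)
preds = np.hstack([candidates, np.ones((len(candidates), 1))]) @ coef
best = candidates[preds.argmin()]
print(dict(zip(domains, best.round(3))))
```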
Curriculum Learning
On transforming data into a curriculum to improve learning efficiency and capability
Curriculum learning that actually works?
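As one very simplified example of what a data-driven curriculum could look like: score each document with a proxy difficulty measure, such as the loss a small reference model assigns to it, and feed easier documents first. Whether this particular ordering actually helps is exactly the open question in the links above; the scoring model, the easy-to-hard direction, and the HuggingFace-style causal LM interface (a model that returns `.loss` when given `labels`) are assumptions.

```python
import torch

@torch.no_grad()
def difficulty_scores(ref_model, tokenized_docs):
    """Score each document by its mean per-token loss under a small reference model."""
    scores = []
    for input_ids in tokenized_docs:                   # each: (1, seq_len) tensor
        out = ref_model(input_ids=input_ids, labels=input_ids)
        scores.append(out.loss.item())
    return scores

def curriculum_order(tokenized_docs, scores, easy_first=True):
    """Sort documents by difficulty; a real curriculum would interleave or anneal rather than hard-sort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=not easy_first)
    return [tokenized_docs[i] for i in order]
```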
Active Data Selection
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models. MATES significantly elevates the scaling curve by selecting data based on the model’s evolving needs.
Data Filtering
Scaling Laws for Data Filtering—Data Curation cannot be Compute Agnostic. Argues that data curation cannot be agnostic of the total compute a model will be trained for. GitHub
How to Train Data-Efficient LLMs. Models trained on ASK-LLM data consistently outperform full-data training, even when we reject 90% of the original dataset, while converging up to 70% faster.
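A minimal sketch of ASK-LLM-style filtering as described in the abstract: prompt an instruction-tuned model to judge whether a candidate pre-training example is informative enough to keep, and drop the rest. The `judge` callable is a stand-in for whatever LLM endpoint you have, and the prompt wording and keep threshold are my assumptions, not the paper’s.

```python
from typing import Callable, Iterable, List

ASK_PROMPT = (
    "Here is a candidate pre-training example:\n###\n{example}\n###\n"
    "Would this example be informative for training a capable, well-behaved language model? "
    "Answer with a probability between 0 and 1."
)

def ask_llm_filter(
    examples: Iterable[str],
    judge: Callable[[str], float],
    keep_threshold: float = 0.5,
) -> List[str]:
    """Keep only the examples the judge model scores above the threshold.

    `judge` is assumed to take a prompt string and return a float in [0, 1],
    e.g. parsed from an API response or a yes-token probability.
    """
    kept = []
    for ex in examples:
        score = judge(ASK_PROMPT.format(example=ex))
        if score >= keep_threshold:
            kept.append(ex)
    return kept
```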
On Pre-Training
Pre-Training from Human Preferences (a minimal conditional-training sketch follows at the end of this list)
Ethan Perez wondering if jailbreaks would be solved with this pre-training approach
LAION uses this approach for fine-grained control over outputs during inference.
Nora Belrose thinks that alignment via pre-training would make models more robust to unlearning (she doesn’t frame it this way, but that could be a good thing if you pre-train such that unlearning is never needed)
Tomek describing some research directions for improving pre-training alignment
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Neural Networks Learn Statistics of Increasing Complexity
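For “Pre-Training from Human Preferences”, here is a minimal sketch of the conditional-training variant that paper describes: score pre-training text with a preference/reward model and prepend a control token marking it as good or bad, then condition on the good token at inference. The token strings, the document-level granularity, the threshold, and the `reward_fn` stand-in are my assumptions for illustration, not the paper’s exact setup.

```python
GOOD_TOKEN, BAD_TOKEN = "<|good|>", "<|bad|>"

def tag_documents(documents, reward_fn, threshold=0.0):
    """Conditional-training data prep in the spirit of Pre-Training from Human Preferences (sketch).

    `reward_fn` stands in for a learned preference/reward model scoring each document;
    documents at or above the threshold are tagged as good, the rest as bad. The model is
    then pre-trained on the tagged text as usual.
    """
    tagged = []
    for doc in documents:
        tag = GOOD_TOKEN if reward_fn(doc) >= threshold else BAD_TOKEN
        tagged.append(f"{tag}{doc}")
    return tagged

# At inference time (assuming the tokenizer was extended with the control tokens),
# you would prepend GOOD_TOKEN to the prompt to condition on preferred behavior.
```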
Pre-Training towards the basin of attraction for alignment
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis
Requirements for a Basin of Attraction to Alignment
A “Bitter Lesson” Approach to Aligning AGI and ASI
Alignment techniques
AlignEZ: Using the self-generated preference data, we identify the subspaces that (1) facilitate alignment and (2) are harmful to it. During inference, we surgically modify the LM embedding using these identified subspaces. Jacques’ note: could we apply this iteratively throughout training (and other similar methods)?
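To make this (and the question of applying it during training) more concrete, here is a minimal sketch of the general “identify a direction from preference pairs, then edit hidden states with it” pattern: take hidden states for preferred versus dispreferred responses, use the top principal direction of their differences as a helpful direction, and nudge activations along it at inference. This is a generic difference-of-activations sketch, not the AlignEZ authors’ exact procedure; the layer choice and edit strength are assumptions.

```python
import torch

def preference_direction(h_preferred, h_dispreferred):
    """Estimate an alignment-relevant direction from paired hidden states.

    h_preferred, h_dispreferred: (n_pairs, hidden_dim) hidden states (e.g. mean-pooled from
    one layer) for self-generated preferred / dispreferred responses to the same prompts.
    Returns a unit vector: the top singular direction of the centered differences.
    """
    diffs = h_preferred - h_dispreferred
    _, _, vh = torch.linalg.svd(diffs - diffs.mean(0, keepdim=True), full_matrices=False)
    return vh[0] / vh[0].norm()

def edit_hidden_state(h, direction, strength=4.0):
    """Nudge a hidden state along the helpful direction at inference (e.g. via a forward hook)."""
    return h + strength * direction
```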
What do we mean by “alignment”? What makes the model safe?
Values
What does it mean for a model to have a value?
On making the model “care”
GPT-2 1.5B is small by today’s standards. I hypothesize that people are unsure whether findings at this scale will generalize to frontier models (or at least to the level of LLaMa-3.1-70B), and that’s why nobody is working on it.
However, I was impressed by “Pre-Training from Human Preferences”. I suspect that pretraining could be improved along these lines, and that it would be a massive deal for alignment.
One key question here, I think, regarding how to guide the pretraining process in a way that benefits alignment: a major historical alignment concern has been that for any given finite set of outputs, there are unboundedly many functions that could produce it, so it’s hard to be sure that a model will generalize in a desirable way. Nora Belrose goes so far as to suggest that ‘Alignment worries are quite literally a special case of worries about generalization.’ This is relevant for post-training, but I think even more so for pre-training.
I know there’s been research into how neural networks generalize, from both the AIS community and the broader ML community, but I’m not very familiar with it; hopefully someone else can provide good references here.