This was a cool, ambitious idea. I’m still confused about your brain score results. Why did the “none” fine-tuned models have good results? Were none of your models successful at learning the brain data?
Hi Charlie, thanks for your comment and apologies for the late reply. To echo what Artyom said, we didn’t observe a significant difference in brain score between the fine-tuned models and the base models. Our models did not end up more correlated with the neuroimaging data of subjects taking these moral reasoning tests. It would be neat if this works in the future and we can start using more neuroimaging data for alignment, but this initial attempt didn’t make those concrete connections.
To go into a bit more detail on the brain-score metric: we use the Pearson correlation coefficient (PCC) to measure the correlation. For each moral scenario presented to a subject, we take the neuroimaging data at different times (it’s measured every 2 seconds here, and we experiment with different sampling strategies), after accounting for the hemodynamic delay. This data is reduced to a 1,024-dimensional vector, with each value being the BOLD response at that point. We do this over a small set of similar examples, then fit a regression model that maps the activations at some layer of the network to the 1,024-dimensional BOLD response vector, and use it to predict the response for a scenario held out from that set. Finally, we take the PCC between the predicted response and the measured response. This gives us our brain-score metric. We can look at it layer by layer and also aggregate across layers to get a brain score for an entire model, which is what we report.
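To make that concrete, here is a minimal sketch in Python of the per-layer computation. It assumes you already have per-scenario layer activations and the corresponding 1,024-dimensional BOLD vectors; the variable names, array shapes, and the choice of ridge regression are illustrative assumptions, not our exact pipeline.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

# Minimal sketch of the brain-score computation described above.
# Ridge regression is an assumption here; any linear regressor
# fit from activations to BOLD responses follows the same pattern.

def layer_brain_score(train_acts, train_bold, test_acts, test_bold):
    """Fit a regression from one layer's activations to the
    1,024-dimensional BOLD vectors, then return the mean Pearson
    correlation between predicted and measured responses on the
    held-out scenarios."""
    reg = Ridge(alpha=1.0).fit(train_acts, train_bold)  # (n_train, d_layer) -> (n_train, 1024)
    pred = reg.predict(test_acts)                       # (n_test, 1024)
    # PCC between predicted and actual BOLD, averaged over held-out scenarios
    return np.mean([pearsonr(p, b)[0] for p, b in zip(pred, test_bold)])

# Aggregating per-layer scores (e.g. by averaging) would then give a
# brain score for the whole model:
# model_score = np.mean([layer_brain_score(*data_for(layer)) for layer in layers])
```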
Hope that helps! Let me know if there’s anything more I can help clarify.
Thanks for your comment. This was weeks/months of hard work for us. Unfortunately, we haven’t yet included how we calculated the brain score in this text, though you can find it in our code, which should match the way others calculate it (see our references). The models with ‘none’ fine-tuning do have a somewhat higher brain score, but it is within the error range of the other models, partly because we didn’t run enough trials for ‘none’ to reduce its standard deviation. Also, our target was mainly accuracy on the ETHICS dataset.