lovetheusers comments on AGI Ruin: A List of Lethalities

lovetheusers 15 Nov 2022 2:43 UTC
1 point
0
Human raters make systematic errors—regular, compactly describable, predictable errors.
This implies it’s possible, through another set of human or automated raters, to rate better. If the errors are predictable, you could train a model to predict them by comparing rater labels against a heavily scrutinized ground truth. You could then add this model’s error prediction to the rater’s answer and get a correct label.
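A minimal sketch of this error-correction idea, under made-up assumptions: each item has a single feature `x`, the raters' bias is linear in `x`, and a small audited subset has trustworthy labels. The data, the bias shape, and the helper names (`fit_line`, `mean_abs_error`) are all hypothetical and chosen only for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def mean_abs_error(preds, truths):
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(preds)


rng = random.Random(0)

# Hypothetical setup: each item has one feature x and a true score of 2 * x.
xs = [rng.uniform(0, 10) for _ in range(500)]
truth = [2 * x for x in xs]

# Assumed rater behaviour: a systematic underestimate that grows with x,
# plus a little random noise.
rater = [t - 0.3 * x - 1.0 + rng.gauss(0, 0.1) for x, t in zip(xs, truth)]

# Only the first 50 items get the heavily scrutinized ground truth.
audited = range(50)
a, b = fit_line(
    [xs[i] for i in audited],
    [truth[i] - rater[i] for i in audited],  # error = truth minus rater label
)

# Add the predicted error to every rater label, audited or not.
corrected = [r + (a * x + b) for x, r in zip(xs, rater)]

raw_mae = mean_abs_error(rater, truth)
corrected_mae = mean_abs_error(corrected, truth)
print(f"raw: {raw_mae:.3f}, corrected: {corrected_mae:.3f}")
```

The correction only works here because the audited labels are taken directly from `truth`, which is the step that is hard to guarantee in practice.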
- Jay Bailey 15 Nov 2022 12:16 UTC
  2 points
  1
  The whole problem with “Human raters make systematic errors” is that the same errors are likely to affect the heavily scrutinized ground truth too. If you have a way of creating a correct ground truth that avoids this problem, you don’t need the second model; you can just use that ground truth as the dataset for the first model.
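The objection can be illustrated with a toy sketch, assuming (hypothetically) that the scrutinized labels remove the raters' own bias but keep a bias shared by every human involved. The numbers and variable names are invented for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical setup: true scores, and a bias of -2.0 shared by every human.
truth = [rng.uniform(0, 10) for _ in range(200)]
shared_bias = -2.0

# Assumption: ordinary raters add their own bias of -1.0 on top of the shared one.
rater = [t + shared_bias - 1.0 for t in truth]
# The scrutinized labels remove the raters' own bias but keep the shared one.
scrutinized = [t + shared_bias for t in truth]

# A constant-offset error model, learned against the scrutinized labels.
learned_offset = sum(s - r for s, r in zip(scrutinized, rater)) / len(rater)
corrected = [r + learned_offset for r in rater]

# Average signed gap between the "corrected" labels and the real truth.
residual = sum(c - t for c, t in zip(corrected, truth)) / len(truth)
print(f"learned offset: {learned_offset:.2f}, residual bias: {residual:.2f}")
```

In this sketch the error model recovers only the gap between the raters and the scrutinized labels, so the corrected labels still carry the full shared bias.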