We don’t yet have examples of high-level belief systematization in AIs. Perhaps the closest thing we have is grokking.
It should be easy to find examples of belief systematization across successive generations of AI systems (e.g. GPT-3 to GPT-4), such as:
Early GPT models can add some but not all pairs of two-digit numbers; GPT-3 can add all of them.
Models knowing that some mammals don’t lay eggs without knowing all mammals don’t lay eggs.
Being able to determine valid chess moves in some, but not all, cases (a rough way to check this and the addition example is sketched below).
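For concreteness, here is a minimal sketch of how one might probe the first and third examples empirically. The `ask_model` function is a hypothetical placeholder for whatever API you'd use to query a given model, and the chess check assumes the python-chess package; none of this is from the original comment.

```python
import chess  # pip install python-chess


def ask_model(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real call to the model being tested."""
    raise NotImplementedError


def two_digit_addition_coverage() -> float:
    """Fraction of all two-digit addition problems the model answers correctly."""
    correct, total = 0, 0
    for a in range(10, 100):
        for b in range(10, 100):
            total += 1
            answer = ask_model(f"What is {a} + {b}? Answer with just the number.")
            if answer.strip() == str(a + b):
                correct += 1
    return correct / total


def model_move_is_legal(fen: str, proposed_move_uci: str) -> bool:
    """Check whether a model-proposed move (UCI notation) is legal in the given position."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(proposed_move_uci)
    except ValueError:  # malformed UCI string
        return False
    return move in board.legal_moves


# Example: check a proposed opening move from the starting position.
# model_move_is_legal(chess.STARTING_FEN, "e2e4")  -> True
```

A model that only partially systematizes its arithmetic or chess knowledge would score well on some of these probes and fail others; a later generation that has systematized the rule should pass them uniformly.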
Being nitpicky[1]: Some mammals do lay eggs.
[1] I’m actually not sure how much people on LessWrong are annoyed by/happy about nitpicks like these?