Amalthea comments on Abhimanyu Pallavi Sudhir’s Shortform

Amalthea 11 Aug 2024 15:24 UTC
2 points
0
Ideas come from unsupervised training, answers from supervised training and proofs from RL on a specified reward function.
- Abhimanyu Pallavi Sudhir 11 Aug 2024 15:57 UTC
  1 point
  0
  I think that holds only for particular reward functions, such as those in multi-agent/co-operative environments (where the agents can include humans, as in RLHF) or in genuinely interactive proving environments?