Incoming postdoc at CHAI, UC Berkeley, working under Prof. Stuart Russell’s supervision
Nice work! I agree with many of the opinions. Nevertheless, the survey is still missing several key citations.
Personally, I have made several contributions to RLHF:
Reinforcement learning for bandit neural machine translation with simulated human feedback (Nguyen et al., 2017) is the first paper that shows the potential of using noisy rating feedback to train text generators.
Interactive learning from activity description (Nguyen et al., 2021) is one of the first frameworks for learning from descriptive language feedback with theoretical guarantees.
Language models are bounded pragmatic speakers: Understanding RLHF from a Bayesian cognitive modeling perspective (Nguyen, 2023) proposes a theoretical framework that makes it easy to understand the limitations of RLHF and derive directions for improvement.
There are also several recent interesting papers on fine-tuning LLMs that were not cited:
Ruibo Liu has authored a series of papers on this topic. In particular, Training Socially Aligned Language Models in Simulated Human Society allows the generation and incorporation of rich language feedback.
Hao Liu has also proposed promising ideas (e.g., Chain of Hindsight Aligns Language Models with Feedback).
I hope that you will find this post useful. Happy to chat more :)