
janos

Karma: 239

On scalable oversight with weak LLMs judging strong LLMs

Jul 8, 2024, 8:59 AM
49 points
18 comments
7 min read
LW link
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

Feb 28, 2023, 9:10 PM
56 points
22 comments
9 min read
LW link
(arxiv.org)