ChengCheng

Karma: 103

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

1 Nov 2024 0:10 UTC
17 points
0 comments · 6 min read · LW link
(far.ai)

Pacing Outside the Box: RNNs Learn to Plan in Sokoban

25 Jul 2024 22:00 UTC
59 points
8 comments · 2 min read · LW link
(arxiv.org)

Does robustness improve with scale?

25 Jul 2024 20:55 UTC
14 points
0 comments · 1 min read · LW link
(far.ai)

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Uncovering Latent Human Wellbeing in LLM Embeddings

14 Sep 2023 1:40 UTC
32 points
7 comments · 8 min read · LW link
(far.ai)