Oscar Obeso

Karma: 292

Math undergrad at ETH Zurich.

More info: oscarbalcells.com

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
228 points
93 comments
10 min read

Refusal mechanisms: initial experiments with Llama-2-7b-chat

8 Dec 2023 17:08 UTC
81 points
7 comments
7 min read