Roland Pihlakas

Karma: 113

Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format

Roland Pihlakas · Mar 16, 2025, 11:23 PM
36 points
6 comments · 7 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
43 points
7 comments · 10 min read · LW link

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas · Jan 3, 2025, 4:24 AM
18 points
3 comments · 8 min read · LW link
(docs.google.com)

Sets of objectives for a multi-objective RL agent to optimize

Roland Pihlakas · Nov 23, 2022, 6:49 AM
13 points
0 comments · 8 min read · LW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Roland Pihlakas · Sep 29, 2021, 5:09 PM
30 points
7 comments · 10 min read · LW link