Can AI agents learn to be good?
Link post
Hi everyone!
My name is Ram Rachum and I’m working on AI Safety research. I want to elicit social behavior in RL agents and use it to achieve AI Safety goals such as alignment, interpretability and corrigibility.
I wrote a guest post for the Future of Life Institute’s blog: https://futureoflife.org/ai-research/can-ai-agents-learn-to-be-good/
The post isn’t specifically about my research; it’s geared towards a general audience, so it’s fairly basic. I do plug my latest paper at the bottom. This is my first piece of public writing on AI Safety, so I’d appreciate any comments or corrections.
I’m currently raising funding for my research. If you know of relevant funders, I’d appreciate an introduction.