Feel free to reach out, I like internet friends :)
I’m a Secular Humanist (but, like, for animals and possibly future digital minds too).
I like thinkers like Peter Singer, Nick Bostrom, Eliezer Yudkowsky, Scott Alexander, you know the drill. More recently also Scott Aaronson and Paul Christiano.
Outside of the Bayesian rat sphere, I am into thinkers like RMS, Cory Doctorow, Simone de Beauvoir, and David Graeber.
I like rationality and EA for the quirky, sincere, passion-driven movements that they are. I care a lot about values like truth, sanity, and the flourishing of all sentient beings.
For personal context: I can understand why a superintelligent system having any goals that aren’t my goals would be very bad for me. I can also understand some of the reasons it is difficult to actually specify my goals or train a system to share my goals. There are a few parts of the basic argument that I don’t understand as well though.
For one, I think I have trouble imagining an AGI that actually has “goals” and acts like an agent; I might just be anthropomorphizing too much.
1. Would it make sense to talk about modern large language models as "having goals," or is that something we expect to emerge later as AI systems become more general?
2. Is there a reason to believe that sufficiently advanced AGI would have goals "by default"?
3. Are "goal-directed" systems inherently more concerning than "tool-like" systems when it comes to alignment (or is that an incoherent distinction in this context)?
I will try to answer those questions myself, to help people see where my reasoning might be going wrong, or what questions I should actually be asking instead.
Thanks!