Ph.D. student in a computational data science department. Building the AI Safety Field in India. CBG Grant from CEA.
I am interested in Agent Foundations. Doing SERI MATS research sprint remotely with John this summer.
Participating in Key Phenomena in AI Risk PIBBSS reading group.
Attending SLT in-person workshop end of this month.
https://twitter.com/adityaarpitha
Aditya
maybe it’s somewhat easier on account of how we have more introspective access to a working mind than we have to the low-level physical fields;
We have biased access, which makes things trickier: our introspection skills weren’t selected to be high fidelity or to correspond with reality, but rather for their utility to survival.
It doesn’t have to be the result of explicit metaphysical beliefs...it could be the result of vague guesswork, and analogical thinking.
Yeah, I could be wrong, but my claim is that implicit metaphysical beliefs have a big role here.
defining “agentic” as “possessing spooky metaphysical free will” rather than “not passive”. It’s perfectly possible to build an agent-in-the-sense-of-active out of mechanical parts.
I was just noting that people who are aware of the internal workings of AI will have to acutely face cognitive dissonance if they admit it can have “spooky” agency. They can’t compartmentalize it the way others can.
“topics about which philosophy is still concerned because we don’t or can’t get information that would enable us to have sufficient certainty of answers to allow those topics to transition into science”.
I think that is quite close. I mean the implicit assumptions behind all these discussions, which go unquestioned: moral realism, computationalism, empiricism, and reductionism all come to mind. These topics cannot be tested or falsified with the scientific method.
but there’s not really anything here that seems like an argument that would convince anyone who didn’t already agree
I thought it would be best to try even if I am not confident it will make any impact on people reading it. My attempt is, like you rightly said, to get AI safety researchers to take philosophy more seriously. Most people see it as a pastime they can enjoy for intrinsic pleasure. In my opinion there is a lot of utility in practicing going more meta until we can see the underpinnings of both the problem of x-risk and its solution.
Some of the utility comes from being able to communicate it to more diverse people at higher fidelity. The rest comes from empowering existing researchers to maybe make a breakthrough in alignment itself.
A lot of these objects, like values and goals, seem to exist strongly in our ontology. I would like to see people try to question these things and consider other possibilities.
This exchange between Connor and Joscha seems to be an example. Connor is clearly irritated at the question, because it uses philosophy to ask whether we should even bother saving humanity, whether humans are bad by our own standards. I can completely understand how he feels. But notice how Joscha seems to seriously think that the philosophy of what values we have, and how they are justified, is very important.

In this community it seems to be taken as fact that the direction we align the AI towards is something to be considered only after figuring out how to set the direction in any way whatsoever. We have decoupled these two things. I would like to question these assumptions, and since I am not smart enough, maybe others can also try. This requires us to unsee the boundaries we are so used to, and to be very careful about which ones we put down.
In particular, they might unlearn it in narrow contexts related to their immediate work, but then get confused and fail to unlearn it in general, resulting in them getting confused about things like agency and free will.
Yeah, I was hoping to draw attention to this problem with my post. I love the embedded agency comic series. The Cartesian boundary is one such boundary that most of us have, but if we want to think about alignment honestly, I think it is worthwhile to train ourselves to unsee that too.
I will check out your book. I hope to also write something that can help people grok monism and the other philosophical ideas they might want to consider in their entirety.
Aren’t non-academics and non-experts the majority,
I was talking about people who have not grokked materialism, which is the majority. People who are not aware of the technical details model AI as a black box, and therefore seem more open to considering that it might be agentic. But that is them deferring to an outside view that sounds convincing, rather than building their own model.
so maybe people there, who are working on AI and machine learning, more often have a religious or spiritual concept of human nature, compared to their counterparts in the secularized West?
Most people I talked to were from India, and it is possible there is a pattern there. But I see similar arguments come up even among people in the West. When people say “it is just statistics”, they seem to be pointing to the idea that deterministic processes can never be agentic.
I am not necessarily trying to bring consciousness into the discussion, but I think there is value in helping people make their existing philosophical beliefs more explicit so that they can follow them to their natural conclusion.
Thanks for the constructive criticism. I thought about it, and I guess I need to increase the legibility of what I wrote.
I will add a TLDR and update the post soon.
Avoiding metaphysics means giving bad philosophy a free pass
Some things don’t make sense unless you really experience them. Personally I have no words for the warping effects such emotions have on you. It’s comparable to having kids or getting a brain injury.
It’s a socially acceptable mental disorder.
The only thing you can do is notice when you are in that state and put very low credence on all positive opinions you have about your Limerent Object. You cannot know anything about them to high confidence in that state. Give it a few years.
Don’t make decisions you can’t undo, or entangle parts of your life that will be painful to detach later.
But it’s a ride worth going on. No point in living life too safely. Have fun but stay safe out there.
Evolution failed at imparting its goal into humans, since humans have their own goals that they shoot for instead when given a chance.
To me, your framing of inner misalignment sounds like Goodharting itself: we evolved intrinsic motivations towards these measures because they were good measures in the ancestral environment. But when we got access to advanced technology, we kept optimizing on the measures (sex, sugar, beauty, etc.), so they stopped tracking the actual targets (kids, calories, health, etc.).
I think outer alignment is better thought of as a property of the objective function i.e. “an objective function is outer aligned if it incentivizes or produces the behavior we actually want on the training distribution.”
You should come to the Bangalore meetup this Sunday, if you are near this part of India.
I asked out my crushes. Worked out well for me.
I used to be really inhibited; now I have tried weed and alcohol, and am really enjoying the moment.
Bangalore LW/ACX Meetup in person
Feels nice to see my name in a story. This fact about Romans is just so tasty.
It was hard to really imagine someone getting so emotionally caught up about a fact. I didn’t expect to find it so hard.
Most fights are never about the underlying facts; they are tribal, about winning. If people cared about knowing the truth, they would be discussions, not debates.
This is totally possible and valid. I would love for this to be true. It’s just that we can plan for the worst case scenario.
I think it can help to believe that things will turn out ok: we are training the AI on human data, so it might adopt some values. Once you believe that, working on alignment can just be a matter of planning for the worst-case scenario.
Just in case. Seems like that would be better for mental health.
Oh ok, I had heard this theory from a friend. Looks like I was misinformed. Rather than evolution causing cancer, I think it is more accurate to say evolution doesn’t care if older individuals die off.
evolutionary investments in tumor suppression may have waned in older age.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660034/
Moreover, some processes which are important for organismal fitness in youth may actually contribute to tissue decline and increased cancer in old age, a concept known as antagonistic pleiotropy
So thanks for clearing that up. I understand cancer better now.
When I talk to my friends, I start with the alignment problem. I found this analogy to human evolution really drives home the point that it’s a hard problem. We aren’t close to solving it.
https://youtu.be/bJLcIBixGj8
At this point, questions come up about whether intelligence necessarily implies morality, so I talk about the orthogonality thesis. Then, as for why the AI would care about anything other than what it was explicitly told to do, the danger comes from instrumental convergence.
Finally, people tend to say we can never do it; they talk about spirituality and the uniqueness of human intelligence. So I need to talk about evolution hill-climbing its way to animal intelligence, and how narrow AI has small models while we just need AGI to have a generalised world model. Brains are just complex electrochemical systems. It’s not magic.
Talk about Pathways, Imagen, GPT-3 and what it can do, and talk about how scaling seems to be working.
https://www.gwern.net/Scaling-hypothesis#why-does-pretraining-work
So it makes sense that we might have AGI in our lifetime, and we have tons of money and brains working on building AI capability, but far fewer on safety.
Try practising on other smart friends and develop your skill. You need to ensure people don’t get bored, so you can’t use too much time. Use nice analogies. Have answers to frequent questions ready.
I think this is how evolution selected for cancer: to ensure humans don’t live for too long, competing for resources with their descendants.
Internal time bombs are important to code in. But it’s hard to integrate that into the AI in a way that the AI doesn’t just remove it the first chance it gets. Humans don’t like having to die, you know. AGI would also not like the suicide bomb tied onto it.
The problem of coding this (as part of training) into an optimiser such that it adopts it as a mesa objective is an unsolved problem.
Same, this post is what made me decide I can’t leave it to the experts. It is just a matter of spending the required time to catch up on what we know and have tried. As Keltham said, diversity is in itself an asset. If we can get enough humans to think about this problem, we can get some breakthroughs, maybe from angles others have not thought of yet.
For me, it was not demotivating. He is not a god, and it ain’t over until the fat lady sings. Things are serious and it just means we should all try our best. In fact, I am kinda happy to imagine we might see a utopia happen in my lifetime. Most humans don’t get a chance to literally save the world. It would be really sad if I died a few years before some AGI turned into a superintelligence.
Eliezer’s latest fanfic is pretty fun to read; if any of you guys are reading it, I would love to discuss it.
I found this very informative, but I think I can contribute to this discussion from the opposite direction. The problem of having too little frame control also exists. Both extremes are bad.
On one end, you are pushing your frame onto a person without trying to account for their current value system. In fact, if you do it gently and slowly, and find a pathway they would want to take, then it becomes moral. If I know the right buttons to push, the right arguments, the evidence, the life experiences that could get a friend to adopt the values and beliefs that I hold, I can “guide” him to the state I want him to inhabit. A lot of this can be legitimate communication.
You clearly marked out the boundaries where it becomes immoral, hurtful, and wrong. But imagine a person who respects other people’s frames to the extent that he takes up the frame of whomever he talks to; he finds it easy to relate to people. For example, even if he is an atheist, when talking to a religious person he will assume god exists and proceed with that assumption.
People like that can be seen as too flexible, as not having any character, and it can affect how attractive they are. They tend not to climb social hierarchies or accumulate power and influence. They can have trouble recommending software, movies, or lifestyles, because while they love some aspect of these things, they wonder if it is their place to decide for others. They are careful to provide the facts and let the other person come to their own decision.
I think when discussing frame control it is useful to also look at the consequences for a community where it carries a lot of stigma. Since you clearly hate people who abuse it, you are sensitive to people who misuse it, and you might be blind to the other extreme.
I highly recommend people watch Connor talk about his interpretation of this post.
He talks about how Eliezer is a person who managed to access many anti-memes that slide right off our heads.
What is an anti-meme, you might ask?
Anti meme
By their very nature they resist being known or integrated into your world model. You struggle to remember them. Just as memes are sticky and go viral, anti-memes are slippery and struggle to gain traction.
They could be extraordinarily boring. They could be facts about yourself that your ego protects you from really grasping, or facts about human nature that cause cognitive dissonance because they contradict existing beliefs you hold dear.
Insights that are anti-memes are hard to communicate. You need to use fables and narratives, embed the idea in a story, and convey the vibe. You use metaphors in the story.
Jokes and humor are a great way to communicate anti-memes, especially ones that are socially awkward and outside people’s Overton window.
---
Then Connor gives this post, Death with Dignity, as an example of an anti-meme. Most of those reading the post seem to completely miss the actual point, even when it was clearly spelled out.