It would be great to have automated feedback on the epistemics of a piece of text: an LLM that can read text and identify reasoning errors or add appropriate qualifiers. As a browser plugin, it would also be helpful when reading news articles. Perhaps it could be done by using the Constitutional AI methodology with Rationality: From A-Z (or something similar) as the constitution.
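A minimal sketch of what such a checker might look like, assuming the OpenAI Python SDK; the model name, the constitution file, and the prompt wording are placeholders I made up, not a tested design:

```python
# Hypothetical epistemics checker: asks an LLM to flag reasoning errors
# and missing qualifiers in a text, guided by a "constitution" of principles
# (e.g. distilled from Rationality: From A-Z).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder file containing the constitution's principles, one per line.
CONSTITUTION = open("rationality_principles.txt").read()

def epistemic_feedback(text: str) -> str:
    """Return the model's critique of the reasoning in `text`."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are an epistemics reviewer. Using these principles:\n"
                        + CONSTITUTION
                        + "\nIdentify reasoning errors and suggest qualifiers to add."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(epistemic_feedback("Crime rose this year, so the new mayor must be to blame."))
```

A browser plugin would just wrap this call around the text of the current page and display the critique inline.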
I only skimmed the post I’m linking to, but I’m curious whether the method of self-other overlap could help “keep AI meta-ethical evolution grounded to human preferences”:
My own high-level, vaguely defined guess at a method would be something so central to the functioning of the AI that, if the AI goes against it, it can no longer make sense of the world. But that seems to carry the risk of the AI just messing everything up as it goes crazy. So the method should also include a way of limiting the capabilities of the AI while it’s in that confused state.
“it’s the distinction between learning from human data versus learning from a reward signal.” That’s an interesting distinction. The difference I currently see between the two is that a reward signal can be hacked by the AI, while human data cannot. Is that an accurate thing to say?
Are there any resources you could recommend for alignment methods that take into account the distinction you mentioned?
Satire: Sam Altman gets grilled by the Financial Times for his kitchen and his cooking skills + what this might say about him
I think editing should be possible. Not sure about deleting it entirely.
I think that from an AI Alignment perspective, giving AI so much control over its training seems to be very problematic. What we are mostly left with is to control the interface that AI has to physical reality i.e. sensors and actuators.
For now, it seems to me that AI is mostly affecting the virtual world. I think the moment when AI can competently and more directly influence physical reality would be a tipping point, because then it can cause a lot more changes to the world.
I would say that the ability to do continuous learning is required to adapt well to the complexity of physical reality. So a big improvement in continuous learning might be an important next goalpost to watch for.
Yes, and there’s an interesting question in that post and an interesting answer in the comments there. It would be great to have everything in one place. @Matrice Jacobine and @alapmi, maybe you could try to come to some sort of an agreement?
An artistic illustration of Scalable Oversight—“A world apart, neither gods nor mortals”
This seems like an introduction to the topic, enough to get your curiosity going.
Hippotherapy
Looking at the convergent instrumental goals:
- self-preservation
- goal preservation
- resource acquisition
- self-improvement
I think some are more important than others.
There is the argument that in order to predict the actions of a superintelligent agent you need to be as intelligent as it is. It would follow that an AI might not be able to predict whether its goal will be preserved through self-improvement.
But I think it can have high confidence that self-improvement will help with self-preservation and resource acquisition. And those gains will be helpful for any new goal it might decide to have. So self-improvement would not seem to be such a bad idea.
>Are those your musings about agents questioning their assumptions and word-views?
- Yes, these are my musings about agents questioning their assumptions and world-views.
>And like, do you wish to improve your fallacies?
- I want to get better at avoiding fallacies. What I desire for myself I also desire for AI. As Marvin Minsky put it: “Will robots inherit the Earth? Yes, but they will be our children.”
>higher threshold than ability, like inherent desire/optimisation?
>What kind of stability? Any from https://en.wikipedia.org/wiki/Stable_algorithm? I’d focus more on sort of non-fatal influence. Should the property be more about the alg being careful/cautious?
- I was thinking of stability in terms of avoiding infinite regress, as illustrated by Jonas noticing the endless sequence of metaphorical whale bellies.
Philosopher Gabriel Liiceanu, in his book “Despre limită” (English: Concerning Limit—unfortunately, no English version seems to be available), argues that we feel lost when we lose our landmark-limit, e.g. in the desert or in the middle of the ocean on a cloudy night with no navigational tools. I would say that we can also get lost in our mental landscape and thus be unable to decide which goal to pursue.
Consider the paperclip-maximizing algorithm: once it has turned all available matter in the Universe into paperclips, what will it do? And if the algorithm can predict that it will reach this confusing state, does it decide to continue the paperclip optimization? As a Buddhist saying goes: “When you get what you desire, you become a different person. Consider becoming that version of yourself first and you might find that you no longer need the object of your desires.”
Waited for 20 minutes and no one showed up. Better luck next time, I guess.
Some desirable properties of automated wisdom
Because no one has signed up, this event will not be held.
Nobody signed up, so the event will not take place.
“If the human wants coffee, we want the AI to get the human a coffee. We don’t want the AI to get itself a coffee.”
It’s not clear to me that this is the only possible outcome. It’s not a mistake that we humans routinely make. In fact, there is some evidence that if someone asks us to do them a favor, we might end up liking them more and continue to do more favors for that person. Granted, there seem to have been no large-scale studies analyzing this so-called Ben Franklin effect. Even if the effect does turn out to be robust, it’s not clear to me how it could transfer to an AI. And then there’s the issue of making sure the AI won’t somehow get rid of this constraint that we imposed on it.
“The problem is that we don’t know what we want the AI to do, certainly not with enough precision to turn it into code.”
I agree; that’s backed up by the findings from the Moral Machine experiment about what we think autonomous cars should do.