I try to practice independent reasoning and critical thinking, and to challenge current solutions so they become more considerate and complete. I do not reply to DMs for non-personal discussions (with respect to the user who reached out directly); instead, I will post here with a reference to the user and my reply.
ZY
Yeah, that makes sense; the knowledge should still be there, we just need to shift the distribution “back”.
Haven’t looked too closely at this, but my initial two thoughts:
Child consent is tricky.
Likely many are foreign children, who may or may not be included in the 75 million statistic.
It is good to think critically, but I think it would be beneficial to present more evidence before making the claim or conclusion.
This is very interesting, and thanks for sharing.
One thing that jumps out at me is that they used an instruction format to prompt the base models, which isn’t typically the way base models are evaluated; it should be reformatted into a completion type of task (a quick sketch of what I mean is below). If this is redone, I wonder if the performance of the base model will also increase, and maybe that could further isolate the effect to just RLHF.
I also wonder if this has anything to do with the number of datasets added on by RLHF (assuming a model goes through supervised/instruction finetuning first, and then RLHF), besides the algorithms themselves.
Another good model to test on is https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3, which it seems only has instruction finetuning as well.
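To make the completion-style reformatting concrete, here is a minimal sketch using the Hugging Face transformers library. The prompts are hypothetical examples of mine, and I am assuming the base sibling (mistralai/Mistral-7B-v0.3) of the instruct model above; this is just an illustration of the idea, not the paper’s actual setup.

```python
# Minimal sketch: instruction-style vs. completion-style prompting of a base model.
# Model name and prompts are hypothetical examples, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.3"  # base sibling of the instruct model linked above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instruction-style prompt (what the paper used; base models are usually NOT trained on this):
instruction_prompt = "Please answer the following question. What is the capital of France?"

# Completion-style prompt: few-shot examples, so the base model can just continue the pattern.
completion_prompt = (
    "Q: What is the capital of Germany?\n"
    "A: Berlin\n"
    "Q: What is the capital of France?\n"
    "A:"
)

inputs = tokenizer(completion_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Print only the newly generated tokens (the model's continuation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the base model’s scores improve under this kind of reformatting, that would support the concern above that part of the measured gap is a prompt-format artifact rather than an RLHF effect.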
The author seems to say that they figured it out at the end of the article, and I am excited to see their exploration in the next post.
I sometimes find it useful to think about “how to differentiate this term” when defining a term. In this case, in my mind it would be thinking about “reasoning” vs “general reasoning” vs “generalization”.
Reasoning: narrower than general reasoning; probably your first two bullet points combined, in my opinion.
Generalization: even more general than general reasoning (does not need to be focused on reasoning). This could be the last two bullet points you have, particularly the third.
General reasoning (this is not fully thought through): now that we have talked about “reasoning” and “generalization”, I see two types of definition:
1. A bit closer to “reasoning”: your first two bullet points, plus operating in multiple domains/multiple ways, though not necessarily unseen domains. In simpler words, “reasoning in multiple domains and ways”.
2. A bit closer to “general” (my guess is this is closer to what you intended?): generalization ability, but focused on reasoning.
In my observation (trying to avoid “I think”!), “I think” is intended to (or actually should be used to) point out perspective differences, which helps lead to more accurate conclusions as well as collaborative and effective communication, rather than to signal confidence. In the latter case of misuse, it would be good if people clarified, “in my sentence, this term is about confidence, not perspective”.
True. I wonder, for average people, if being self-aware would at least unconsciously be a partial “blocker” on the next malevolent action they might take, and whether that may evolve across time too (even if it may take a bit longer than for a mostly-good person).
I highly agree with almost all of these points, and they are very consistent with my observations. As I am still relatively new to LessWrong, one big observation (based on my experience) I still see today is concepts, definitions, and/or terminology that are disconnected from academic language. Sometimes I see terminology that already exists in academia, and introducing new concepts under the same name may be confusing, especially without using the channels academics are used to. There are some terms I try to search on Google, for example, but the only relevant results are from LessWrong or blog posts (which I then still read personally). I think this is getting better: in one of the recent conference reviews, I saw a significant increase in AI safety submissions working on x-risks.
Another point, as you have mentioned, is the reverse ingestion of papers from academia; there are rich papers in interpretability, for example, and some concrete confusion I have seen from professors and people already in that field is why there feels like a lack of connection with these papers and concepts, even though they seem pretty related.
About actions: many people I see in my usual professional group who are concerned about AI safety risks are people concerned about, or working on, current intentional risks like misuse. Those are also real risks that have already materialized (CSAM, deepfake porn with real people’s faces, privacy, potential bio/chem weapons) and need to be worked on as well. It is hard to stop working on them and transition directly to x-risks.
However, I do think it is beneficial to keep merging the academic and AI safety communities, which I see is already underway, with examples like more papers, some PhD positions on AI safety, industry positions, etc. This will increase awareness of AI safety, and as you have mentioned, the interest in the technical parts is shared, as they could potentially be applied to many kinds of safety, and hopefully not that much to capabilities (though the two are sometimes not separable).
What would be some concrete examples/areas to work on for human flourishing? (I just saw a similar question on the definition; I wonder what some concrete areas or examples could be.)
True; and they would only need to merge up to the point where they reach a “swing state” type of voting distribution.
That would be interesting; on the other hand, why not just merge all the states? I guess that would be a more dramatic change, and may be harder to execute and unnecessary in this case.
Yes, what I meant is exactly “there is no must, but only want”. But it feels like a “must” in some contexts I have seen, though I do not recall exactly where. And yeah, true, there may be some survivorship bias.
I agree it is a tragedy from the human race’s perspective, but what I meant is to view this problem from a non-human perspective. For example, as a thought experiment: to an alien observing Earth, humans are just another species that rose to dominance.
(On humans preferring to be childless: this has actually already slowed down in many countries due to the cost of raising a child, etc., but yeah, this is a digression on my part.)
My two cents:
The system has a fixed goal that it capably works towards across all contexts.
The system is able to capably work towards goals, but which goals it pursues, if any, may depend on the context.
From these two above, it seems it would be good for you to define/clarify what exactly you mean by “goals”. I can see two definitions: 1. goals as in a loss function or objective that the algorithm is optimizing towards; 2. task-specific goals like summarizing an article or planning. There may be other kinds of goals I am unaware of, or this may be obvious elsewhere in some context I am not aware of. (From the shortform in the context shared, it seems to be 1, but I have a vague feeling that readers assuming 2 may not be aligned on this.)
For the example with dQw4w9WgXcQ in your initial operationalization, when you were wondering if it always generates Q: it just depends on the frequency. If you were wondering whether it is always generated (given the same context as in the training data, not a different context/instruction), a good paper on the frequency of such data and its rate of memorization is https://arxiv.org/pdf/2202.07646.
I think that is probably not a good reason to be libertarian, in my opinion? Could you also share how much older you were than your siblings? If you are not that far apart, you and your siblings came from the same starting line, and redistribution is not going to happen in real life, economically or socially, even under a non-libertarian system. (In real life, where we need equity is when the starting line is not the same and cannot be changed by choice. A closer analogy might be: some kids are born with large ears, large ears are favored by society, and the large-eared kids always get more candy.) If you are years apart, with you being a lot older, it may make some limited sense for your parents to redistribute.
I am not quite sure about the writing/examples in computational kindness and responsibility offloading, but I think I get the general idea.
For computational kindness, I think it is really just a difference in how people prefer to communicate or make plans, as in the trip-planning example. I, for example, personally prefer being offered people’s true thoughts: whether they are okay with really anything, or not. Anything is fine as long as it is what they really think or prefer (side note: I generally think communicating real preferences is the most efficient). I do not mind planning the trip myself in the ways I want. There is not really a right or wrong style. If the host offered “anything is okay” but the guest does not like planning, the guest could also simply say, “Any recommendations? I like xxx or xxx generally.” Communication goes both ways. The reason I think we should not say one style is better than another is that if the guest really wants to plan things themselves, and the host has planned a bunch, the guest may feel bad about rejecting the planned activities. Maybe what you really want to see is that the host cares enough to put some effort into planning (just guessing)? And it seems the relationship between the two people in this example is relatively close, or one that calls for showing some effort?
For responsibility offloading, I think some of these examples are not quite similar or parallel situations, but I generally get the proposal: “do not push other people in a pushy manner, and offer a clear option to say no”, as opposed to a fake ask. In my opinion a fake ask is not true kindness: it is fake, so it is not really kind in any way. But at the same time, I have trained myself to take the question literally: okay, if you asked, then you expect my answer could go either way, and I will say no if I am thinking no. In the case the question is genuine: great! In the case it is not: too bad; the smoker should have been consistent with their words.
Most of these seem to be communication-style differences that just require another round of communication to sort out, if the two parties need to communicate frequently.
Ah, thanks. Do you know why these former rationalists became “more accepting” of irrational thinking? And to be extremely clear, does “irrational” here mean not following one’s preferences with one’s actions, and not truth-seeking when forming beliefs?
I don’t understand either. If it means what it says, this is a very biased perception and not very rational (truth-seeking or causality-seeking). There should be better education systems to fix that.
On what evidence do I conclude that what I think I know is correct/factual/true, and how strong is that evidence? To what extent have I verified that view, and just how extensively should I verify the evidence?
For this, aside from traditional paper reading from credible sources, one good approach in my opinion is to actively seek evidence/arguments from, or initiate conversations with, people who have a different perspective from mine (on both sides of the spectrum, if the conclusion space is continuous).
I am interested in learning more about this, but I am not sure what “woo” means; after googling, is it right to interpret it as “unconventional beliefs” of some sort?
I personally agree with your reflection on suffering risks (including factory farming, systemic injustices, and wars) and with the approach of donating to different cause areas. My (maybe unpopular under a “prioritize only one” mindset) thought is: maybe we should avoid prioritizing only a single area (especially collectively), and instead recognize that in reality there are always multiple issues we need to fight against/solve. Personally, we could focus professionally on one issue and volunteer for or donate to another cause area, depending on our knowledge, interests, and abilities; additionally, we could donate to multiple cause areas. Meanwhile, a big step is to be aware of, and open our ears to, the various issues we may be facing as a society, and that will (I hope) translate into multiple types of action. After all, some of these suffering risks involve human actions, and each of us doing something differently could help reduce them in both the short and long term. But there are also many things that I do not know how best to balance.
A side note: I also hope you are not very, very sad from thinking about “missing crucial considerations” (and I appreciate that you are trying to gather more information and learn more quickly; we all should do more of this too)! The key to me might be an open mind and the ability to consider different aspects of things; hopefully we will then be on the path towards something “more complete”. Proactively, one approach I often try is talking to people who are passionate about different areas and who are different from me, and understanding more from there. Also, I sometimes refer to https://www.un.org/en/global-issues for ideas.