zeshen

Karma: 404

Feedback welcomed: www.admonymous.co/zeshen

zeshen Nov 30, 2024, 12:24 PM
6 points
2
on: (The) Lightcone is nothing without its people: LW + Lighthaven’s big fundraiser
Is there any difference between donating through Manifund or directly via Stripe?

zeshen Nov 30, 2024, 6:53 AM
5 points
0
in reply to: Gordon Seidoh Worley’s comment on: Information vs Assurance
This happened all the time at my line of work. Forecasts become targets and you become responsible for meeting them. So whenever I was asked to provide a forecast, I will either i) ask as many questions as I need to know the exact purpose of the request, and produce a forecast that meets exactly that intent, or ii) pick a forecast and provide it, but first list down all the assumptions and caveats behind the forecast that I can possibly think of. With time, I’d also get a sense of who I need to be extra careful with when providing any forecasts because of all sorts of ways that might backfire.

zeshen May 14, 2024, 10:38 AM
5 points
−4
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
Agreed. I’m also pleasantly surprised that your take isn’t heavily downvoted.

zeshen May 10, 2024, 10:10 AM
3 points
−1
on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
There’ll be discussions about how these systems will eventually become dangerous, and safety-concerned groups might even set up testing protocols (“safety evals”).
My impression is that safety evals were deemed irrelevant because a powerful enough AGI, being deceptively aligned, would pass all of them anyway. We didn’t expect the first general-ish AIs to be so dumb, like how GPT-4 was being so blatant and explicit about lying to the TaskRabbit worker.

zeshen May 8, 2024, 4:10 AM
5 points
0
on: Deep Honesty
Scott Alexander talked about explicit honesty (unfortunately paywalled) in contrast with radical honesty. In short, explicit honesty is being completely honest when asked, and radical honesty is being completely honest even without being asked. From what I understand from your post, it feels like deep honesty is about being completely honest about information you perceive to be relevant to the receiver, regardless of whether the information is explicitly being requested.
Scott also links to some cases where radical honesty did not work out well, like this, this, and this. I suspect deep honesty may lead to similar risks, as you have already pointed out.
And with regards to:
“what is kind, true, and useful?”
I think they would form a 3-circle venn diagram. Things that are within the intersection of all three circles would be a no-brainer. But the tricky bits are the things that are either true but not kind/useful, or kind/useful but not true. And I understood this post as a suggestion to venture more into the former.

zeshen May 3, 2024, 3:43 AM
8 points
3
on: Why is AGI/ASI Inevitable?
Can’t people decide simply not to build AGI/ASI?
Yeah, many people, like the majority of users on this forum, have decided to not build AGI. On the other hand, other people have decided to build AGI and are working hard towards it.

Side note: LessWrong has a feature to post posts as Questions, you might want to use it for questions in the future.

zeshen Apr 27, 2024, 12:09 PM
1 point
2
in reply to: JustisMills’s comment on: LLMs seem (relatively) safe
Definitely. Also, my incorrect and exaggerated model of the community is likely based on the minority who have a tendency of expressing those comments publicly, against people who might even genuinely deserve those comments.

zeshen Apr 26, 2024, 10:35 AM
3 points
−2
in reply to: the gears to ascension’s comment on: LLMs seem (relatively) safe
I agree with RL agents being misaligned by default, even more so for the non-imitation-learned ones. I mean, even LLMs trained on human-generated data are misaligned by default, regardless of what definition of ‘alignment’ is being used. But even with misalignment by default, I’m just less convinced that their capabilities would grow fast enough to be able to cause an existential catastrophe in the near-term, if we use LLM capability improvement trends as a reference.

zeshen Apr 26, 2024, 9:00 AM
11 points
0
on: LLMs seem (relatively) safe
Thanks for this post. This is generally how I feel as well, but my (exaggerated) model of the AI aligment community would immediately attack me by saying “if you don’t find AI scary, you either don’t understand the arguments on AI safety or you don’t know how advanced AI has gotten”. In my opinion, a few years ago we were concerned about recursively self improving AIs, and that seemed genuinely plausible and scary. But somehow, they didn’t really happen (or haven’t happened yet) despite people trying all sorts of ways to make it happen. And instead of a intelligence explosion, what we got was an extremely predictable improvement trend which was a function of only two things i.e. data + compute. This made me qualitatively update my p(doom) downwards, and I was genuinely surprised that many people went the other way instead, updating upwards as LLMs got better.
What links here?
- zeshen's comment on Alexander Gietelink Oldenziel’s Shortform by Alexander Gietelink Oldenziel (May 14, 2024, 10:38 AM; 5 points)

zeshen Apr 22, 2024, 9:13 AM
1 point
0
on: Modern Transformers are AGI, and Human-Level
I’ve gotten push-back from almost everyone I’ve spoken with about this
I had also expected this reaction, and I always thought I was the only one who thinks we have basically achieved AGI since ~GPT-3. But looking at the upvotes on this post I wonder if this is a much more common view.

zeshen Mar 19, 2024, 8:56 AM
3 points
0
on: Using axis lines for good or evil
My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist’s visual styleguide and realized they also do something similar, i.e. omit the y-axis line (in fact, they omit the y-axis line on basically all their line / scatter plots, but almost always maintain the gridlines).
Here’s also an article they ran about their errors in data visualization, albeit probably fairly introductory for the median LW reader.

zeshen Mar 7, 2024, 12:53 PM
3 points
2
on: Good taxonomies of all risks (small or large) from AI?
I’m pretty sure you have come across this already, but just in case you haven’t:
https://incidentdatabase.ai/taxonomy/gmf/

zeshen Dec 23, 2023, 10:37 AM
4 points
1
on: Funding case: AI Safety Camp
Strong upvoted. I was a participant of AISC8 in the team that went on to launch AI Standards Lab, which I think counterfactually would not be launched if not for AISC.

zeshen May 11, 2023, 7:02 AM
8 points
0
on: How should we think about the decision relevance of models estimating p(doom)?
Why is this question getting downvoted?

zeshen May 1, 2023, 4:16 PM
7 points
3
on: Support me in a Week-Long Picketing Campaign Near OpenAI’s HQ: Seeking Support and Ideas from the LessWrong Community
This seems to be another one of those instances where I wish there was a dual-voting system to posts. I would’ve liked to strong disagree with the contents of the post without discouraging well-intentioned people from posting here.

zeshen Apr 30, 2023, 5:00 PM
6 points
3
on: No, *You* Need to Write Clearer
I feel like a substantial amount of disagreement between alignment researchers are not object-level but semantic disagreements, and I remember seeing instances where person X writes a post about how he/she disagrees with a point that person Y made, with person Y responding about how that wasn’t even the point at all. In many cases, it appears that simply saying what you don’t mean could have solved a lot of the unnecessary misunderstandings.

zeshen Apr 27, 2023, 5:25 AM
1 point
0
in reply to: M. Y. Zuo’s comment on: Catching the Eye of Sauron
I’m curious if there are specific parts to the usual arguments that you find logically inconsistent.

zeshen Apr 21, 2023, 7:23 AM
2 points
1
on: LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space
I Googled up ‘how are tokens embedded’ and this post came up third in the results—thanks for the post!

zeshen Apr 8, 2023, 4:25 PM
1 point
0
in reply to: Raemon’s comment on: “Carefully Bootstrapped Alignment” is organizationally hard
If this interests you, there is a proposal in the Guideline for Designing Trustworthy Artificial Intelligence by Fraunhofer IAIS which includes the following:
[AC-R-TD-ME-06] Shutdown scenarios
Requirement: Do
Scenarios should be identified, analyzed and evaluated in which the live AI application must be completely or partially shut down in order to maintain the ability of users and affected persons to perceive situations and take action. This includes shutdowns due to potential bodily injury or damage to property and also due to the violation of personal rights or the autonomy of users and affected persons. Thus, depending on the application context, this point involves analyzing scenarios that go beyond the accidents/safety incidents discussed in the Dimension: Safety and Security (S). For example, if it is possible that the AI application causes discrimination that cannot be resolved immediately, this scenario should be considered here. When evaluating the scenarios, the consequences of the shutdown for the humans involved, work processes, organization and company, as well as additional time and costs, should also be documented. This is compared with the potential damage that could arise if the AI application were not shut down. Documentation should be available on the AI application shutdown strategies that were developed based on the identified scenarios – both short-term, mid-term and permanent shutdown. Similarly, scenarios for shutting down subfunctions of the AI application should also be documented. Reference can be made to shutdown scenarios that may have already been covered in the Risk area: functional safety (FS) (see [S-RFS-ME-10]). A shutdown scenario documents
– the setting and the resulting decision-making rationale for the shutdown,
– the priority of the shutdown,
– by which persons or roles the shutdown is implemented and how it is done,
– how the resulting outage can be compensated,
– the expected impact for individuals or for the affected organization.
[AC-R-TD-ME-07] Technical provision of shutdown options
Requirement: Do
Documentation should be available on the technical options for shutting down specific subfunctions of the AI application as well as the entire AI application. Here, reference can be made to [S-R-FS-ME-10] or [S-RFS-ME-12] if necessary. It is outlined that other system components or business processes that use (sub)functionality that can be shutdown have been checked and (technical) measures that compensate for negative effects of shutdowns are prepared. If already covered there, reference can be made to [S-R-FS-ME-10].

zeshen Apr 4, 2023, 5:04 AM
2 points
1
on: “Carefully Bootstrapped Alignment” is organizationally hard
Everyone in any position of power (which includes engineers who are doing a lot of intellectual heavy-lifting, who could take insights with them to another company), thinks of it as one of their primary jobs to be ready to stop
In some industries, Stop Work Authorities are implemented, where any employee at any level in the organisation has the power to stop a work deemed unsafe at any time. I wonder if something similar in spirit would be feasible to be implemented in top AI labs.