Martin Vlach

Karma: 53

If you get an email from aisafetyresearch@gmail.com , that is most likely me. I also read it weekly, so you can pass a message into my mind that way.
Other ~personal contacts: https://linktr.ee/uhuge

Martin Vlach Oct 17, 2024, 1:59 PM
1 point
0
on: Toy Models of Superposition: Simplified by Hand
I have not read your explainer yet, but I’ve noted the title Toy Models of Superposition: Simplified by Hand is a bit misleading in the sense to promise to talk about Toy Models which it is not at all, the article is about Superposition only, which is great but not what I’d expect looking at the title.

Martin Vlach Oct 1, 2024, 4:07 AM
1 point
0
on: How “Pause AI” advocacy could be net harmful

that that first phase of advocacy was net harm

typo

Martin Vlach Sep 5, 2024, 1:46 PM
1 point
0
in reply to: László Lajos Jánszky’s comment on: The Atomic Bomb Considered As Hungarian High School Science Fair Project
Could you please fix your Wikipedia link( currently hiding the word and from your writing) here?

Martin Vlach Aug 28, 2024, 11:14 PM
3 points
0
on: On agentic generalist models: we’re essentially using existing technology the weakest and worst way you can use it

only Claude 3.5 Sonnet attempting to push past GPT4 class

seems missing awareness of Gemini Pro 1.5 Experimental, latest version made available just yesterday.

Martin Vlach Aug 12, 2024, 3:57 PM
1 point
0
on: Martin Vlach’s Shortform
The case insensitivity seems strongly connected to the fairly low interest in longevity throughout (the western/developed) society.
Thought experiment: What are you willing to pay/sacrifice in your 20s,30s to get 50 extra days of life vs. on your dead bed/day?

https://consensus.app/papers/ultraviolet-exposure-associated-mortality-analysis-data-stevenson/69a316ed72fd5296891cd416dbac0988/?utm_source=chatgpt

Martin Vlach Aug 11, 2024, 12:41 PM
2 points
−1
in reply to: Aprillion’s comment on: Unnatural abstractions

But largely to and fro,

*from?

Martin Vlach Jul 24, 2024, 4:05 PM
1 point
0
on: Apply now: Get “unstuck” with the New IFS Self-Care Fellowship Program
Why does the form still seem open today? Couldn’t that be harmful or wasting quite a chunk of time of people?

Martin Vlach Jul 15, 2024, 8:23 PM
1 point
0
on: Some desirable properties of automated wisdom
Please go further towards maximization of clarity. Let’s start by this example:
> Epistemic status: Musings about questioning assumptions and purpose.
Are those your musings about agents questioning their assumptions and word-views?

And like, do you wish to improve your fallacies?

> ability to pursue goals that would not lead to the algorithm’s instability.
higher threshold than ability, like inherent desire/optimisation?
What kind of stability? Any from https://en.wikipedia.org/wiki/Stable_algorithm? I’d focus more on sort of non-fatal influence. Should the property be more about the alg being careful/cautious?

Martin Vlach Jul 14, 2024, 5:27 PM
3 points
0
on: An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
https://neelnanda.io/transformer-tutorial-1 link for YouTube tutorial gives 404.-(

Martin Vlach Jun 19, 2024, 8:22 AM
1 point
0
on: Eight Short Studies On Excuses
> “What, exactly, is the difference between a cult and a religion?”—”The difference is that cults have been formed recently enough, and are small enough, that we are suspicious of them existing for the purpose of taking advantage of the special place we give religion.

now I see why my friends practicing the spiritual path of Falun Dafa have “incorporated” as a religion in my state despite the movement originally denied being classified as a religion as to demonstrate it does not require a fixed set of rituals.

Martin Vlach Jun 4, 2024, 5:00 AM
1 point
0
on: Which skincare products are evidence-based?
Surprised to see nobody mentioned Microneedling yet. I’m not skilled in evaluating scientific evidence, but the takeaway from https://consensus.app/results/?q=Microneedling effectiveness &synthesize=on can hardly be anything else than clearly recommending microneedling.

Martin Vlach May 20, 2024, 9:15 PM
4 points
0
on: Introducing AI Lab Watch
So Alignment program is to be updated to 0 for OpenAI now that Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )

Martin Vlach May 18, 2024, 10:33 AM
1 point
1
in reply to: Adrià Garriga-alonso’s comment on: Language Models Model Us
honestly the code linked is not that complicated..: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11

Martin Vlach May 18, 2024, 10:29 AM
1 point
0
in reply to: eggsyntax’s comment on: Language Models Model Us
To work around the non-top-n you can supply logit_bias list to the API.

Martin Vlach May 18, 2024, 10:27 AM
4 points
1
in reply to: eggsyntax’s comment on: Language Models Model Us
As the Llama3 70B base model is said very clean( unlike base DeepSeek for example, which is instruction-spoiled already) and similarly capable to GPT3.5, you could explore that hypothesis.
Details: Check Groq or TogetherAI for free inference, not sure if test data would fit Llama3 context window.

Martin Vlach May 10, 2024, 9:06 AM
0 points
0
on: You Can Face Reality
a worthy platitude(?)

Martin Vlach Apr 29, 2024, 11:45 AM
1 point
0
in reply to: Wei Dai’s comment on: My views on “doom”
AI-induced problems/risks

Martin Vlach Apr 5, 2024, 10:08 AM
1 point
0
in reply to: Håvard Tveit Ihle’s comment on: ChatGPT can learn indirect control
possibly https://ai.google.dev/docs/safety_setting_gemini would help or just use the technique of https://arxiv.org/html/2404.01833v1

Martin Vlach Apr 5, 2024, 9:57 AM
2 points
5
on: Addressing Accusations of Handholding
people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in their output?
So you’ve just prompted the generator by teasing it with a rhetorical question implying that there are personal opinions evident in the generated text, right?

Martin Vlach Feb 26, 2024, 2:26 PM
1 point
0
on: aisafety.info, the Table of Content
With a quick test, I find their chat interface prototype experience quite satisfying.

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer