Honestly, the code linked is not that complicated: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11
Martin Vlach
To work around the top-n restriction, you can supply a logit_bias map to the API.
As the Llama 3 70B base model is said to be very clean (unlike the DeepSeek base model, for example, which is already instruction-contaminated) and roughly as capable as GPT-3.5, you could explore that hypothesis with it.
Details: check Groq or TogetherAI for free inference; I'm not sure whether the test data would fit Llama 3's context window.
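The logit_bias workaround mentioned above can be sketched roughly like this. Note the token IDs here are placeholder assumptions, not real ones; for an actual model you'd look them up with tiktoken. This only builds the request payload, it doesn't call the API:

```python
# Sketch of the logit_bias workaround: bias sampling so the model can
# effectively only emit a fixed set of answer tokens (multiple-choice
# labels here). The token IDs below are ASSUMED placeholders -- look up
# the real ones with tiktoken for the model you target.
CHOICE_TOKEN_IDS = {"A": 32, "B": 33, "C": 34, "D": 35}  # assumed IDs

def build_request(question: str, model: str = "gpt-3.5-turbo") -> dict:
    """Build a Chat Completions payload; a +100 bias on the choice
    tokens effectively restricts the sampled output to them."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 1,
        "logit_bias": {str(tid): 100 for tid in CHOICE_TOKEN_IDS.values()},
    }

payload = build_request("Is the sky blue? Answer A (yes) or B (no).")
```

The +100 bias is the maximum the API accepts; it makes the biased tokens overwhelmingly likely without technically masking everything else.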
a worthy platitude(?)
AI-induced problems/risks
Possibly https://ai.google.dev/docs/safety_setting_gemini would help, or just use the technique from https://arxiv.org/html/2404.01833v1.
people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in its output?

So you've just prompted the generator by teasing it with a rhetorical question implying that there are personal opinions evident in the generated text, right?
With a quick test, I find their chat interface prototype experience quite satisfying.
Claims about LLMs' views/opinions should exclude sampling (even temperature=0 with a deterministic seed); we should instead look at the distribution over answers in the logits. My thesis on why that is not yet standard practice is that the OpenAI API only supports logit_bias, not reading the probabilities directly.
This should work well with pre-set A/B/C/D choices, and to some extent with chain/tree of thought too; you'd just revert the final token and look at the probabilities in the last (pass-through) step.
Don't dismiss sampling too lightly, though; there is likely an amazing delicacy around it. +)
what happened at Reddit
Could there be any link? From a bit of research, all I have found is that Steve Huffman praised Altman's value to the Reddit board.
makes makes
typo
[Question] Would it be useful to collect the contexts where various LLMs think the same?
Would be cool to have a playground or a daily challenge: a code-golf equivalent for the shortest possible LLM prompt that yields a given answer.
That could help build some neat understanding or intuitions.
in the limit of arbitrary compute, arbitrary data, and arbitrary algorithmic efficiency, because an LLM which perfectly models the internet
seems worth reformulating. On my first and second read I thought: What? If I can have arbitrary training data, the LLM will model that data, not your internet. I guess you meant storage capacity for the model? +)
Would be cool if a link to https://manifund.org/about fit somewhere in the beginning, in case there are more readers like me unfamiliar with the project.
Otherwise a cool write-up. I'm a bit confused by "Grant of the month" vs. "weeks 2-4", which seems a shorter period... not a big deal though.
In the Twitter Spaces two days ago, a lot of emphasis seemed to be put on understanding, which has a more humble connotation to me.
Still, I agree I would not bet on their luck in choosing a single value to build their systems upon. (Although they do have a lucky track record.)
The website seems good, but the buttons in the "sharing" circle at the bottom need fixing.
Some SEO effort should be put into ranking for queries like "Guidelines for safe AI development", "Best practices for ", etc.
So should the Alignment program be updated to 0 for OpenAI, now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )