Just build the good Johns, not the bad Johns.
This argument is empirical, while the orthogonality hypothesis is merely philosophical, which means this is a stronger argument imo.
But this argument does not imply that alignment is easy. It implies that acute goals are easy while orthogonal goals are hard. So a player-of-games agent will be easy to align with power-seeking but hard to align with banter, while a chat agent will be easy to align with banter and hard to align with power-seeking.
We are currently in the chat phase, which seems to imply easier alignment to chatty, huggy human values, but we may soon enter the player-of-long-term-games phase. So this argument implies that alignment is easier now, and will get harder if we enter the era of long-horizon RL planning agents.
Feel free to suggest improvements; this is just what worked for me, and it's limited in format.
If you are using Llama, you can use https://github.com/wassname/prob_jsonformer, or snippets of its code, to get probabilities over a selection of tokens.
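If you just want the core trick without the library, here's a minimal sketch, assuming a HuggingFace causal LM; the model ID, prompt, and candidate tokens are placeholders, and prob_jsonformer builds on this kind of pattern:

```python
# Minimal sketch: next-token probabilities over a chosen set of tokens,
# renormalised over just those candidates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any causal LM should work
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Is the sky blue? Answer:"
candidates = [" yes", " no"]  # note: leading spaces matter for many tokenizers
candidate_ids = [tokenizer.encode(c, add_special_tokens=False)[0] for c in candidates]

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Renormalise over just the candidate tokens
probs = torch.softmax(logits[candidate_ids], dim=-1)
for cand, p in zip(candidates, probs):
    print(f"{cand!r}: {p:.3f}")
```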
That’s true, they are different. But search still provides the closest historical analogue (maybe employees/suppliers provide another). Historical analogues have the benefit of being empirical and grounded, so I prefer them over (or with) pure reasoning or judgement.
When you rephrase this to be about search engines
I think the main reason why we won’t censor search to some abstract conception of “community values” is that users won’t want to rent or purchase search services that are censored to such a broad target
It doesn’t describe reality. Most of us consume search and recommendations that have been censored (e.g. removing porn, piracy, toxicity, racism, taboo politics) in a way that puts cultural values above our preferences or interests.
So perhaps it won’t be true for AI either. At least in the near term, the line between AI and search is blurry, and the same pressures apply to consumers and providers.
A before and after would be even better!
Thanks, but this doesn’t really give insight into whether this is normal or enforceable. So I wanted to point out that we don’t know if it’s enforceable, and we have not seen a single legal opinion.
Thanks, I hadn’t seen that, I find it convincing.
He might have returned to work, but agreed to no external comms.
Interesting! For most of us, this is outside our area of competence, so we appreciate your input.
Are you familiar with US NDAs? I’m sure there are lots of clauses that have been ruled invalid by case law. In many cases, non-lawyers have no idea about these, so you might be able to make a difference with very little effort. There is also the possibility that valuable OpenAI shares could be rescued.
If you haven’t seen it, check out this thread, where one of the OpenAI leavers did not sign the gag order.
It could just be because it reaches a strong conclusion from anecdotal/clustered evidence (e.g. it might say more about her friend group than anything else), along with claims of being better calibrated for weak reasons. Those claims could be true, but they don't seem very epistemically humble.
Full disclosure: I downvoted karma, because I don’t think it should be the top reply, but I did not vote agree or disagree.
But Jen seems cool, I like weird takes, and downvotes are not a big deal—just a part of a healthy contentious discussion.
Notably, there are some lawyers here on LessWrong who might help (possibly even for the lols, you never know). And you can look at case law and guidance to see whether clauses are actually enforceable (many are not). To anyone reading: here’s habryka doing just that.
One is the change to the charter to allow the company to work with the military.
https://news.ycombinator.com/item?id=39020778
I think the board must be thinking about how to get some independence from Microsoft, and there are not many entities that can counterbalance one of the biggest companies in the world. The government’s intelligence and defence industries are some of them (as are Google, Meta, Apple, etc.). But that move would require secrecy, both to avoid fuelling a nationalistic race and to avoid a backlash, and probably by contract too.
EDIT: I’m getting a few disagrees, would someone mind explaining why they disagree with these wild speculations?
Here’s something I’ve been pondering.
Hypothesis: if a transformer has internal concepts, and they are represented in the residual stream, then, because we have access to 100% of that information, it should be possible for a non-linear probe to get 100% out-of-distribution accuracy. The 100% is important because we care about how something like value learning will generalise OOD.
And yet we don’t get 100% (in fact, most reported metrics are on much easier settings than the one we care about: in-distribution, or with carefully constructed setups). Which of the hypothesis’s assumptions is wrong, do you think?
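For concreteness, here is a minimal sketch of the setup I have in mind, assuming you’ve already cached residual-stream activations (e.g. with TransformerLens) into a tensor X of shape [n, d_model] and binary concept labels y; the probe architecture and sizes are arbitrary:

```python
import torch
import torch.nn as nn

d_model = 768  # assumption: residual-stream width of whatever model you probed

# Non-linear probe: a small 2-layer MLP over cached residual-stream activations
probe = nn.Sequential(
    nn.Linear(d_model, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_probe(X_train, y_train, epochs=100):
    # X_train: [n, d_model] cached activations; y_train: [n] binary labels
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(X_train).squeeze(-1), y_train.float())
        loss.backward()
        opt.step()

def ood_accuracy(X_ood, y_ood):
    # The hypothesis predicts ~100% on an out-of-distribution split;
    # empirically we fall short.
    with torch.no_grad():
        preds = (probe(X_ood).squeeze(-1) > 0).long()
    return (preds == y_ood).float().mean().item()
```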
better calibrated than any of these opinions, because most of them don’t seem to focus very much on “hedging” or “thoughtful doubting”
new observations > new thoughts when it comes to calibrating yourself.
The best-calibrated people are those who get lots of interaction with the real world, not those who think a lot or have a complicated inner model. Tetlock’s superforecasters were gamblers and weathermen.
I think this only holds if fine-tunes are composable, which as far as I can tell they aren’t.
Anecdotally, a lot of people are using mergekit to combine fine-tunes.
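For intuition, the simplest merge is roughly just weight averaging. Here’s a sketch under the assumption that both fine-tunes share a base model; the model names are placeholders, and mergekit implements this and much fancier methods:

```python
# Sketch: the simplest kind of fine-tune composition, a 50/50 weight average
# of two fine-tunes of the same base model.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("org/finetune-a")  # placeholder
model_b = AutoModelForCausalLM.from_pretrained("org/finetune-b")  # placeholder

merged = model_a.state_dict()
for name, param_b in model_b.state_dict().items():
    if param_b.dtype.is_floating_point:  # skip integer buffers
        merged[name] = 0.5 * merged[name] + 0.5 * param_b

model_a.load_state_dict(merged)  # model_a now holds the merged weights
```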
it feels less surgical than a single direction everywhere
Agreed, it seems less elegant. But one person on Hugging Face did a rough plot of the cross-correlation, and it seems to show that the direction changes with layer: https://huggingface.co/posts/Undi95/318385306588047#663744f79522541bd971c919. Although perhaps we are missing something.
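The check itself is simple. Roughly, assuming you have one candidate direction per layer (the tensor below is placeholder data), something like:

```python
# Sketch: compare per-layer candidate directions via cosine similarity.
# In practice `directions` would be the directions extracted at each layer,
# shape [n_layers, d_model].
import torch

directions = torch.randn(12, 768)  # placeholder data
directions = directions / directions.norm(dim=-1, keepdim=True)

cos_sim = directions @ directions.T  # [i, j] = cosine sim of layers i and j
print(cos_sim)
```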
Note that you can just do torch.save(model.state_dict(), FILE_PATH) as with any PyTorch model.
omg, I totally missed that, thanks. Let me know if I missed anything else, I just want to learn.
The older versions of the gist use TransformerLens, if anyone wants those. In those versions the interventions work better, since you can target resid_pre, resid_mid, etc.
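For anyone who hasn’t used those hooks, the interventions look roughly like this; the layer index and the direction itself are placeholders, while resid_pre/resid_mid/resid_post are real TransformerLens hook points:

```python
# Sketch: ablate a direction from the residual stream at a chosen layer
# using TransformerLens hooks.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model
direction = torch.randn(model.cfg.d_model)  # placeholder: e.g. a learned direction
direction = direction / direction.norm()

def ablate_direction(resid, hook):
    # Remove the component of the residual stream along `direction`
    proj = (resid @ direction)[..., None] * direction
    return resid - proj

tokens = model.to_tokens("Hello, world")
logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[("blocks.6.hook_resid_pre", ablate_direction)],  # or resid_mid, resid_post
)
```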
Have you considered emphasizing this part of your position:
“We want to shut down AGI research everywhere, including in governments, militaries, and spy agencies, in all countries.”
I think this is an important point that is missed in current regulation, which focuses on slowing down only the private sector. It’s hard to achieve, because policymakers often favor their own institutions, but it’s absolutely needed, so it needs to be said early and often. It will actually win you points with the many people who are cynical about institutions: not just libertarians, but a growing portion of the public.
I don’t think anyone is saying this, but it fits your honest and confrontational communication strategy.