how 2 tell if ur input is out of distribution given only model weights

dkirmani 5 Aug 2023 22:45 UTC
47 points
10 comments · 1 min read · LW link

Summary of Improving Global Decision Making (around AI)

Will_Pearson 5 Aug 2023 18:46 UTC
−7 points
0 comments · 1 min read · LW link

Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods)

5 Aug 2023 17:55 UTC
6 points
2 comments · 7 min read · LW link
(drive.google.com)

Seattle Astral Codex Ten Monthly Social

a7x 5 Aug 2023 17:55 UTC
1 point
0 comments · 1 min read · LW link

AISafety.info’s Writing & Editing Hackathon

smallsilo 5 Aug 2023 17:14 UTC
2 points
0 comments · 1 min read · LW link

Join AISafety.info’s Writing & Editing Hackathon (Aug 25-28) (Prizes to be won!)

smallsilo 5 Aug 2023 14:08 UTC
19 points
3 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Stomach Ulcers and Dental Cavities

Metacelsus 5 Aug 2023 14:08 UTC
56 points
7 comments · 1 min read · LW link
(denovo.substack.com)

video games > IQ tests

bhauth 5 Aug 2023 13:27 UTC
35 points
45 comments · 3 min read · LW link

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea 5 Aug 2023 11:10 UTC
11 points
2 comments · 1 min read · LW link

A Naive Proposal for Constructing Interpretable AI

Chris_Leong 5 Aug 2023 10:32 UTC
18 points
6 comments · 2 min read · LW link

ACX Paris Meetup—August 11 2023

PoignardAzur 5 Aug 2023 9:44 UTC
2 points
0 comments · 1 min read · LW link

Meet Hyperion on Sunday Aug 6?

duck_master 5 Aug 2023 4:36 UTC
1 point
0 comments · 1 min read · LW link

[Question] What are the best published papers from outside the alignment community that are relevant to Agent Foundations?

Stephen Fowler 5 Aug 2023 3:02 UTC
20 points
4 comments · 1 min read · LW link

Announcing Squiggle Hub

5 Aug 2023 1:00 UTC
46 points
4 comments · 5 min read · LW link
(forum.effectivealtruism.org)

Read More Books but Pretend to Read Even More

Arjun Panickssery 5 Aug 2023 0:07 UTC
23 points
12 comments · 4 min read · LW link
(arjunpanickssery.substack.com)

The Sinews of Sudan’s Latest War

Tim Liptrot 4 Aug 2023 18:17 UTC
43 points
12 comments · 12 min read · LW link

Private notes on LW?

Raemon 4 Aug 2023 17:35 UTC
61 points
33 comments · 1 min read · LW link

When training AI, we should escalate the frequency of capability tests

Hauke Hillebrandt 4 Aug 2023 16:07 UTC
2 points
0 comments · 1 min read · LW link

Manifund: What we’re funding (weeks 2-4)

Austin Chen 4 Aug 2023 16:00 UTC
44 points
2 comments · 1 min read · LW link
(manifund.substack.com)

[Linkpost] Multimodal Neurons in Pretrained Text-Only Transformers

Bogdan Ionut Cirstea 4 Aug 2023 15:29 UTC
11 points
0 comments · 1 min read · LW link

Apollo Research is hiring evals and interpretability engineers & scientists

Marius Hobbhahn 4 Aug 2023 10:54 UTC
25 points
0 comments · 2 min read · LW link

[Question] Has anyone tried creating a YouTube or TikTok series covering the sequences?

Max Rossi 4 Aug 2023 0:10 UTC
4 points
4 comments · 1 min read · LW link

[Question] Is there any metric measuring ~“proportion of people creating extra value”?

Amal 3 Aug 2023 22:54 UTC
7 points
3 comments · 1 min read · LW link

[Question] Hypothetical: what would you do?

JNS 3 Aug 2023 22:39 UTC
4 points
2 comments · 1 min read · LW link

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea 3 Aug 2023 17:28 UTC
12 points
0 comments · 1 min read · LW link

Embedding Ethical Priors into AI Systems: A Bayesian Approach

Justausername 3 Aug 2023 15:31 UTC
−5 points
3 comments · 21 min read · LW link

Password-locked models: a stress case for capabilities evaluation

Fabien Roger 3 Aug 2023 14:53 UTC
156 points
14 comments · 6 min read · LW link

AI #23: Fundamental Problems with RLHF

Zvi 3 Aug 2023 12:50 UTC
59 points
9 comments · 41 min read · LW link
(thezvi.wordpress.com)

Bad Imitation Instruments

jefftk 3 Aug 2023 2:30 UTC
21 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Kolmogorov’s theory of Algorithmic Probability

Aidan Rocke 3 Aug 2023 0:58 UTC
5 points
2 comments · 2 min read · LW link
(keplerlounge.com)

Work culture creep

CrimsonChin 3 Aug 2023 0:38 UTC
27 points
15 comments · 8 min read · LW link

[Question] Boxing

Zach Stein-Perlman 2 Aug 2023 23:38 UTC
6 points
1 comment · 1 min read · LW link

External rationality vs. internal rationality

metachirality 2 Aug 2023 23:29 UTC
7 points
0 comments · 1 min read · LW link

When performing a dimensionality reduction on tensors, the trace is often zero.

Joseph Van Name 2 Aug 2023 21:06 UTC
7 points
1 comment · 3 min read · LW link

Progress links digest, 2023-08-02: Superconductor edition

jasoncrawford 2 Aug 2023 20:27 UTC
13 points
0 comments · 3 min read · LW link
(rootsofprogress.org)

[Question] What works for ADHD and/or related things?

TeaTieAndHat 2 Aug 2023 18:37 UTC
6 points
13 comments · 1 min read · LW link

[Question] Would you pay for a search engine limited to rationalist sites?

Conor 2 Aug 2023 18:06 UTC
4 points
19 comments · 1 min read · LW link

The Roots of Progress Blog-Building Intensive: advice for applicants, request for support

jasoncrawford 2 Aug 2023 15:37 UTC
9 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

3 levels of threat obfuscation

HoldenKarnofsky 2 Aug 2023 14:58 UTC
69 points
14 comments · 7 min read · LW link

ChatGPT for translation

Varshul Gupta 2 Aug 2023 11:57 UTC
1 point
0 comments · 3 min read · LW link
(dubverseblack.substack.com)

Long-Term Future Fund: April 2023 grant recommendations

2 Aug 2023 7:54 UTC
81 points
3 comments · 50 min read · LW link

[Question] Could we breed/engineer intelligent parrots?

lemonhope 2 Aug 2023 7:32 UTC
9 points
18 comments · 1 min read · LW link

Anthropical Motte and Bailey in two versions of Sleeping Beauty

Ape in the coat 2 Aug 2023 7:08 UTC
32 points
56 comments · 6 min read · LW link

solar-thermal and techno-economic analysis

bhauth 2 Aug 2023 6:22 UTC
21 points
8 comments · 5 min read · LW link
(www.bhauth.com)

South Bay ACX/SSC Meetup @ Whole Foods

allisona 2 Aug 2023 3:44 UTC
1 point
0 comments · 1 min read · LW link

“Is There Anything That’s Worth More”

Zack_M_Davis 2 Aug 2023 3:28 UTC
64 points
6 comments · 1 min read · LW link

Bay Winter Solstice: call for speech pitches!

tcheasdfjkl 2 Aug 2023 3:24 UTC
9 points
0 comments · 1 min read · LW link
(docs.google.com)

[Question] What is ontology?

Adam Zerner 2 Aug 2023 0:54 UTC
28 points
19 comments · 1 min read · LW link

My current LK99 questions

Eliezer Yudkowsky 1 Aug 2023 22:48 UTC
206 points
38 comments · 5 min read · LW link

Spiral Staircase

Michael Samoilov 1 Aug 2023 21:51 UTC
19 points
2 comments · 2 min read · LW link