
MiguelDev

Karma: 311

https://www.whitehatstoic.com/

Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)

MiguelDev · Feb 1, 2025, 7:17 PM
4 points
2 comments · 2 min read
(www.whitehatstoic.com)

An examination of GPT-2’s boring yet effective glitch

MiguelDev · Apr 18, 2024, 5:26 AM
5 points
3 comments · 3 min read

Intergenerational Knowledge Transfer (IKT)

MiguelDev · Mar 28, 2024, 8:14 AM
6 points
0 comments · 1 min read

RLLMv10 experiment

MiguelDev · Mar 18, 2024, 8:32 AM
5 points
0 comments · 2 min read

A T-o-M test: ‘popcorn’ or ‘chocolate’

MiguelDev · Mar 8, 2024, 4:24 AM
20 points
13 comments · 1 min read

Sparks of AGI prompts on GPT2XL and its variant, RLLMv3

MiguelDev · Mar 7, 2024, 6:33 AM
4 points
0 comments · 4 min read

Can RLLMv3’s ability to defend against jailbreaks be attributed to datasets containing stories about Jung’s shadow integration theory?

MiguelDev · Feb 29, 2024, 5:13 AM
7 points
2 comments · 11 min read

Research Log, RLLMv3 (GPT2-XL, Phi-1.5 and Falcon-RW-1B)

MiguelDev · Feb 15, 2024, 3:39 AM
4 points
0 comments · 262 min read

GPT2XL_RLLMv3 vs. BetterDAN, AI Machiavelli & Oppo Jailbreaks

MiguelDev · Feb 11, 2024, 11:03 AM
16 points
4 comments · 14 min read

Research Log, RLLMv2: Phi-1.5, GPT2XL and Falcon-RW-1B as paperclip maximizers

MiguelDev · Jan 20, 2024, 3:30 PM
6 points
0 comments · 10 min read

[Question] rabbit (a new AI company) and Large Action Model (LAM)

MiguelDev · Jan 10, 2024, 1:57 PM
17 points
3 comments · 1 min read

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev · Dec 1, 2023, 5:18 AM
7 points
0 comments · 29 min read

Migueldev’s shortform

MiguelDev · Nov 1, 2023, 8:54 AM
2 points
12 comments · 1 min read

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev · Oct 30, 2023, 9:24 AM
6 points
2 comments · 41 min read

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev · Oct 12, 2023, 12:08 PM
12 points
0 comments · 9 min read

[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDev · Oct 3, 2023, 10:39 PM
−1 points
6 comments · 1 min read

<|endoftext|> is a vanishing text?

MiguelDev · Sep 16, 2023, 2:34 AM
10 points
0 comments · 1 min read

On Ilya Sutskever’s “A Theory of Unsupervised Learning”

MiguelDev · Aug 26, 2023, 5:34 AM
10 points
0 comments · 19 min read

Exploring the Responsible Path to AI Research in the Philippines

MiguelDev · Aug 23, 2023, 8:44 AM
6 points
0 comments · 6 min read

A fictional AI law laced w/ alignment theory

MiguelDev · Jul 17, 2023, 1:42 AM
6 points
0 comments · 2 min read