
VojtaKovarik

Karma: 730

My original background is in mathematics (analysis, topology, Banach spaces) and game theory (imperfect information games). Nowadays, I do AI alignment research (mostly systemic risks, sometimes pondering “consequentialist reasoning”).

[Question] When is “unfalsifiable implies false” incorrect?

VojtaKovarik · 15 Jun 2024 0:28 UTC
3 points
11 comments · 1 min read · LW link

[Question] What is the purpose and application of AI Debate?

VojtaKovarik · 4 Apr 2024 0:38 UTC
13 points
9 comments · 1 min read · LW link

Extinction Risks from AI: Invisible to Science?

21 Feb 2024 18:07 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Extinction-level Goodhart’s Law as a Property of the Environment

21 Feb 2024 17:56 UTC
23 points
0 comments · 10 min read · LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

21 Feb 2024 17:54 UTC
19 points
0 comments · 9 min read · LW link

Which Model Properties are Necessary for Evaluating an Argument?

21 Feb 2024 17:52 UTC
18 points
2 comments · 7 min read · LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

21 Feb 2024 17:38 UTC
27 points
1 comment · 2 min read · LW link

VojtaKovarik’s Shortform

VojtaKovarik · 4 Feb 2024 20:57 UTC
5 points
5 comments · 1 min read · LW link

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik · 31 Jan 2024 17:03 UTC
24 points
9 comments · 7 min read · LW link

Control vs Selection: Civilisation is best at control, but navigating AGI requires selection

VojtaKovarik · 30 Jan 2024 19:06 UTC
7 points
1 comment · 1 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · 21 Jul 2023 21:03 UTC
12 points
18 comments · 3 min read · LW link

Recursive Middle Manager Hell: AI Edition

VojtaKovarik · 4 May 2023 20:08 UTC
30 points
11 comments · 2 min read · LW link

OpenAI could help X-risk by wagering itself

VojtaKovarik · 20 Apr 2023 14:51 UTC
31 points
16 comments · 1 min read · LW link

Legitimising AI Red-Teaming by Public

VojtaKovarik · 19 Apr 2023 14:05 UTC
10 points
7 comments · 3 min read · LW link

[Question] How do you align your emotions through updates and existential uncertainty?

VojtaKovarik · 17 Apr 2023 20:46 UTC
4 points
10 comments · 1 min read · LW link

Formalizing Objections against Surrogate Goals

VojtaKovarik · 2 Sep 2021 16:24 UTC
16 points
23 comments · 1 min read · LW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments · 8 min read · LW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik · 5 Dec 2020 23:56 UTC
28 points
6 comments · 4 min read · LW link

AI Problems Shared by Non-AI Systems

VojtaKovarik · 5 Dec 2020 22:15 UTC
7 points
2 comments · 4 min read · LW link